“Bilinear Spatiotemporal Basis Models” by Akhter, Simon, Khan, Matthews and Sheikh

  • ©Ijaz Akhter, Tomas Simon, Sohaib Khan, Iain Matthews, and Yaser Sheikh







    A variety of dynamic objects, such as faces, bodies, and cloth, are represented in computer graphics as a collection of moving spatial landmarks. Spatiotemporal data is inherent in a number of graphics applications including animation, simulation, and object and camera tracking. The principal modes of variation in the spatial geometry of objects are typically modeled using dimensionality reduction techniques, while concurrently, trajectory representations like splines and autoregressive models are widely used to exploit the temporal regularity of deformation. In this article, we present the bilinear spatiotemporal basis as a model that simultaneously exploits spatial and temporal regularity while maintaining the ability to generalize well to new sequences. This factorization allows the use of analytical, predefined functions to represent temporal variation (e.g., B-Splines or the Discrete Cosine Transform) resulting in efficient model representation and estimation. The model can be interpreted as representing the data as a linear combination of spatiotemporal sequences consisting of shape modes oscillating over time at key frequencies. We apply the bilinear model to natural spatiotemporal phenomena, including face, body, and cloth motion data, and compare it in terms of compaction, generalization ability, predictive precision, and efficiency to existing models. We demonstrate the application of the model to a number of graphics tasks including labeling, gap-filling, denoising, and motion touch-up.
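    The abstract does not spell out the model's equations, so the following is only a minimal sketch of one way such a bilinear factorization can be written, under stated assumptions. A sequence of F frames with P three-dimensional landmarks is stored as an F x 3P matrix S and approximated as theta @ C @ B.T, where theta is a predefined orthonormal DCT trajectory basis, B is a shape basis learned here by SVD, and C holds the bilinear coefficients. The names S, theta, B, C and all function names are illustrative choices, not taken from the text above.

```python
import numpy as np


def dct_basis(num_frames, num_components):
    """Orthonormal DCT-II basis with one column per temporal frequency."""
    t = np.arange(num_frames)[:, None]        # frame index, shape (F, 1)
    k = np.arange(num_components)[None, :]    # frequency index, shape (1, K_t)
    theta = np.cos(np.pi * (2 * t + 1) * k / (2 * num_frames))
    theta *= np.sqrt(2.0 / num_frames)
    theta[:, 0] /= np.sqrt(2.0)               # constant column needs extra scaling
    return theta                              # shape (F, K_t)


def shape_basis(S, num_components):
    """Leading right singular vectors of S, used as an orthonormal shape basis."""
    _, _, vt = np.linalg.svd(S, full_matrices=False)
    return vt[:num_components].T              # shape (3P, K_s)


def bilinear_fit(S, theta, B):
    """Least-squares coefficients C for S ~ theta @ C @ B.T.

    The closed form below relies on both bases having orthonormal columns.
    """
    return theta.T @ S @ B                    # shape (K_t, K_s)


def bilinear_reconstruct(C, theta, B):
    """Rebuild the F x 3P sequence from the bilinear coefficients."""
    return theta @ C @ B.T


# Hypothetical toy data: 60 frames, 10 landmarks, two smoothly varying shape modes.
rng = np.random.default_rng(0)
F, P = 60, 10
t = np.linspace(0.0, 1.0, F)[:, None]
S = np.hstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)]) @ rng.standard_normal((2, 3 * P))

theta = dct_basis(F, 10)                      # 10 temporal frequencies
B = shape_basis(S, 2)                         # 2 shape modes
C = bilinear_fit(S, theta, B)                 # 10 x 2 coefficients instead of 60 x 30 values
S_hat = bilinear_reconstruct(C, theta, B)
```

    Each entry of C weights one shape mode oscillating at one DCT frequency, which is how the sketch reflects the "shape modes oscillating over time at key frequencies" reading. Because the trajectory basis is analytical, only the shape basis has to be estimated from data in this setup.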


