“Spacetime faces: high resolution capture for modeling and animation” by Zhang, Snavely, Curless and Seitz

  • ©Li Zhang, Noah Snavely, Brian Curless, and Steven Seitz







    We present an end-to-end system that goes from video sequences to high resolution, editable, dynamically controllable face models. The capture system employs synchronized video cameras and structured light projectors to record videos of a moving face from multiple viewpoints. A novel spacetime stereo algorithm is introduced to compute depth maps accurately and overcome over-fitting deficiencies in prior work. A new template fitting and tracking procedure fills in missing data and yields point correspondence across the entire sequence without using markers. We demonstrate a data-driven, interactive method for inverse kinematics that draws on the large set of fitted templates and allows for posing new expressions by dragging surface points directly. Finally, we describe new tools that model the dynamics in the input sequence to enable new animations, created via key-framing or texture-synthesis techniques.


