“Displaced dynamic expression regression for real-time facial tracking and animation” by Cao, Hou and Zhou

Session/Category Title: Faces




    We present a fully automatic approach to real-time facial tracking and animation with a single video camera. Our approach requires no calibration for individual users. It learns a generic regressor from public image datasets that can be applied to any user and arbitrary video cameras to infer accurate 2D facial landmarks as well as the 3D facial shape from 2D video frames. The inferred 2D landmarks are then used to adapt the camera matrix and the user identity to better match the facial expressions of the current user. The regression and adaptation are performed in an alternating manner. As more facial expressions are observed in the video, the whole process quickly converges to accurate facial tracking and animation. In experiments, our approach demonstrates a level of robustness and accuracy on par with state-of-the-art techniques that require a time-consuming calibration step for each individual user, while running at 28 fps on average. We consider our approach to be an attractive solution for wide deployment in consumer-level applications.
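The alternating scheme summarized above (per-frame landmark regression interleaved with camera/identity adaptation) can be sketched in a few lines. Everything below is an illustrative placeholder, not the paper's actual DDE regressor or its adaptation solver: `regress` and `adapt` stand in for the learned regressor and the least-squares refit, and the scalar "camera"/"identity" parameters are toy stand-ins for the camera matrix and identity coefficients.

```python
# Hypothetical sketch of the alternating regression/adaptation loop.
# regress() and adapt() are toy stand-ins for the learned generic
# regressor and the camera/identity refit described in the abstract.

def regress(frame, camera, identity):
    # Stand-in: map a frame to 2D landmarks, conditioned on the
    # current camera and identity estimates (three toy "landmarks").
    return [frame * camera + identity] * 3

def adapt(landmark_history):
    # Stand-in: refit camera and identity to all landmarks observed
    # so far (here reduced to a trivial running mean).
    flat = [p for frame_lms in landmark_history for p in frame_lms]
    mean = sum(flat) / len(flat)
    return 1.0, mean * 0.1  # toy (camera, identity) update

def track(frames, n_passes=3):
    camera, identity = 1.0, 0.0  # generic initial estimates
    history = []
    for _ in range(n_passes):    # regression and adaptation alternate
        history.clear()
        for frame in frames:
            history.append(regress(frame, camera, identity))
        camera, identity = adapt(history)
    return camera, identity
```

In the real system each adaptation step refines the estimates from an ever-growing set of observed expressions, which is why the process converges as more of the user's expression range appears in the video.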


