“Reconstructing detailed dynamic face geometry from monocular video” by Garrido, Valgaerts, Wu, and Theobalt
Session/Category Title: Dynamics for People, Plants and Clothes
Abstract:
Detailed facial performance geometry can be reconstructed using dense camera and light setups in controlled studios. However, a wide range of important applications cannot employ these approaches, including all movie productions shot from a single principal camera. Post-production of such footage requires dynamic monocular face capture for appearance modification. We present a new method for capturing face geometry from monocular video. Our approach captures detailed, dynamic, spatio-temporally coherent 3D face geometry without the need for markers. It works under uncontrolled lighting, and it successfully reconstructs expressive motion, including high-frequency face detail such as folds and laugh lines. After simple manual initialization, the capturing process is fully automatic, which makes it versatile, lightweight, and easy to deploy. Our approach tracks accurate sparse 2D features between automatically selected key frames to animate a parametric blend shape model, which is further refined in pose, expression, and shape by temporally coherent optical flow and photometric stereo. We demonstrate performance-capture results for long and complex face sequences captured indoors and outdoors, and we exemplify the relevance of our approach as an enabling technology for model-based face editing in movies and video, such as adding new facial textures, as well as a step towards enabling everyone to do facial performance capture with a single affordable camera.
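The coarse stage of the pipeline described above fits a parametric blend shape model to tracked 2D features. The following is a minimal, hypothetical sketch of that idea, not the paper's implementation: it assumes a toy orthographic projection, known correspondence between model vertices and tracked 2D points, and no pose estimation or regularization (all of which the actual method handles). Weights are found by linear least squares via the normal equations.

```python
# Hypothetical sketch of coarse blend-shape fitting from tracked 2D features.
# Assumptions (not from the paper): orthographic projection (drop z), fixed
# head pose, known vertex-to-feature correspondence, unconstrained weights.

def fit_blendshape_weights(neutral, deltas, observed_2d):
    """Least-squares fit of weights w so that the projection of
    neutral + sum_k w[k] * deltas[k] matches observed_2d.
    neutral: list of (x, y, z) vertices; deltas: list of blend-shape
    displacement fields, each a list like neutral; observed_2d: list of
    (x, y) tracked feature positions. Returns the weight list w.
    """
    K = len(deltas)
    # Accumulate the K x K normal equations A w = b over all residuals.
    A = [[0.0] * K for _ in range(K)]
    b = [0.0] * K
    for i, (ox, oy) in enumerate(observed_2d):
        rx = ox - neutral[i][0]  # 2D residual w.r.t. the neutral face
        ry = oy - neutral[i][1]
        for j in range(K):
            djx, djy = deltas[j][i][0], deltas[j][i][1]
            b[j] += djx * rx + djy * ry
            for k in range(K):
                dkx, dky = deltas[k][i][0], deltas[k][i][1]
                A[j][k] += djx * dkx + djy * dky
    # Solve the small system by Gaussian elimination with partial pivoting.
    for col in range(K):
        piv = max(range(col, K), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, K):
            f = A[r][col] / A[col][col]
            for c in range(col, K):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * K
    for r in range(K - 1, -1, -1):
        s = b[r] - sum(A[r][c] * w[c] for c in range(r + 1, K))
        w[r] = s / A[r][r]
    return w

# Toy example: a single "smile" blend shape applied at weight 0.5.
neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
smile = [(0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
print(fit_blendshape_weights(neutral, [smile], [(0.0, 0.5), (1.0, -0.5)]))
```

In the actual method this coarse fit is only a starting point; pose, expression, and fine-scale shape are subsequently refined with temporally coherent optical flow and shading-based (photometric) cues.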


