“Reconstruction of Personalized 3D Face Rigs From Monocular Video” by Wang, Shi, Xia and Chai

  • ©Congyi Wang, Fuhao Shi, Shihong Xia, and Jinxiang Chai







    We present a novel approach for automatically creating a personalized, high-quality 3D face rig of an actor from monocular video data alone (e.g., vintage movies). Our rig is based on three distinct layers that allow us to model the actor's facial shape and to capture his person-specific expression characteristics at high fidelity, ranging from coarse-scale geometry to fine-scale static and transient detail on the scale of folds and wrinkles. At the heart of our approach is a parametric shape prior that encodes the plausible subspace of facial identity and expression variations. Based on this prior, a coarse-scale reconstruction is obtained by means of a novel variational fitting approach. We model person-specific idiosyncrasies that cannot be represented in the restricted shape and expression space by learning a set of medium-scale corrective shapes. Fine-scale skin detail, such as wrinkles, is captured from video via shading-based refinement, and a generative detail formation model is learned. Both the medium- and fine-scale detail layers are coupled with the parametric prior by means of a novel sparse linear regression formulation. Once reconstructed, all layers of the face rig can be conveniently controlled by a small number of blendshape expression parameters, as widely used by animation artists. We show captured face rigs and their motions for several actors filmed in different monocular video formats, including legacy footage from YouTube, and demonstrate how they can be used for 3D animation and 2D video editing. Finally, we evaluate our approach qualitatively and quantitatively and compare it to related state-of-the-art methods.
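The abstract does not spell out the paper's sparse regression formulation, so the following is only a generic sketch of the idea that one blendshape weight vector can drive both the coarse layer and a medium-scale corrective layer. It assumes a linear blendshape rig, corrective shapes stored as per-vertex offsets, and an L1-regularized least-squares (lasso) fit solved with ISTA (iterative soft-thresholding). The function names `fit_sparse_regression` and `evaluate_rig`, the array shapes, and the choice of ISTA are hypothetical illustrations, not the authors' method.

```python
import numpy as np


def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (elementwise shrinkage toward zero)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def fit_sparse_regression(W_blend, Y, lam, n_iter=2000):
    """Hypothetical helper: fit R minimizing 0.5*||W_blend @ R - Y||_F^2 + lam*||R||_1 via ISTA.

    W_blend: (n_frames, n_blend) blendshape weights per training frame.
    Y:       (n_frames, n_corr) corrective-shape coefficients per frame.
    Returns R of shape (n_blend, n_corr), a sparse linear map from
    blendshape weights to corrective coefficients.
    """
    # Step size 1/L, where L = sigma_max(W_blend)^2 is the Lipschitz
    # constant of the gradient of the least-squares term.
    L = np.linalg.norm(W_blend, ord=2) ** 2
    step = 1.0 / L
    R = np.zeros((W_blend.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = W_blend.T @ (W_blend @ R - Y)
        R = soft_threshold(R - step * grad, step * lam)
    return R


def evaluate_rig(neutral, blend_deltas, corr_deltas, R, w):
    """Hypothetical helper: evaluate a two-layer rig from blendshape weights w alone.

    neutral:      (n_verts, 3) neutral face.
    blend_deltas: (n_blend, n_verts, 3) coarse blendshape offsets.
    corr_deltas:  (n_corr, n_verts, 3) medium-scale corrective offsets.
    R:            (n_blend, n_corr) learned sparse regression matrix.
    w:            (n_blend,) blendshape weights.
    """
    coarse = neutral + np.tensordot(w, blend_deltas, axes=1)
    c = w @ R  # corrective coefficients predicted from the weights
    return coarse + np.tensordot(c, corr_deltas, axes=1)
```

In this sketch, the L1 penalty drives most entries of `R` to zero, so each corrective shape is triggered by only a few blendshapes. This keeps the learned coupling compact and easy to inspect, and the animator still edits only `w`.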


