“FaceVR: Real-Time Gaze-Aware Facial Reenactment in Virtual Reality”

  • Justus Thies, Michael Zollhöfer, Marc Stamminger, Christian Theobalt, and Matthias Nießner







    We propose FaceVR, a novel image-based method that enables video teleconferencing in VR based on self-reenactment. State-of-the-art face tracking methods in the VR context focus on the animation of rigged 3D avatars (Li et al. 2015; Olszewski et al. 2016). Although they achieve good tracking performance, the results look cartoonish rather than real. In contrast to these model-based approaches, FaceVR enables VR teleconferencing using an image-based technique that produces nearly photo-realistic output. The key component of FaceVR is a robust algorithm for real-time facial motion capture of an actor who is wearing a head-mounted display (HMD), combined with a new data-driven approach for eye tracking from monocular videos. By reenacting a prerecorded stereo video of the person without the HMD, FaceVR performs photo-realistic re-rendering in real time, thus allowing artificial modifications of face and eye appearance. For instance, we can alter facial expressions or change gaze directions in the prerecorded target video. In a live setup, we apply these newly introduced algorithmic components to drive the reenactment in real time.
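    The reenactment idea described above can be sketched at the coefficient level: expression parameters tracked from the HMD-occluded actor drive a parametric model of the same person captured without the HMD. The sketch below is a minimal, hypothetical illustration of this transfer with a linear blendshape model; the model dimensions, variable names, and the `reenact` helper are illustrative assumptions, not the paper's actual implementation.

    ```python
    import numpy as np

    # Toy dimensions -- purely illustrative, not taken from the paper.
    N_VERTS = 5   # vertices in the toy face model
    N_EXPR = 3    # number of expression blendshapes

    rng = np.random.default_rng(0)
    neutral = rng.standard_normal((N_VERTS, 3))            # target's neutral geometry
    expr_basis = rng.standard_normal((N_EXPR, N_VERTS, 3))  # per-blendshape vertex offsets

    def reenact(target_neutral, basis, source_expr_coeffs):
        """Synthesize the target's geometry driven by the source's tracked
        expression coefficients -- the core of coefficient-level reenactment."""
        # Weighted sum of blendshape offsets, added to the neutral face.
        offsets = np.tensordot(source_expr_coeffs, basis, axes=1)
        return target_neutral + offsets

    # Coefficients as they might be tracked live from the HMD-occluded actor.
    coeffs = np.array([0.2, -0.1, 0.5])
    reenacted = reenact(neutral, expr_basis, coeffs)
    ```

    In the full system, the reenacted geometry would then be re-rendered photo-realistically from the prerecorded stereo video, with the eye region composited according to the tracked gaze direction.
    
    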


    1. Oleg Alexander, Mike Rogers, William Lambeth, Matt Chiang, and Paul Debevec. 2009. The digital emily project: Photoreal facial modeling and animation. In Proceedings of ACM SIGGRAPH 2009 Courses. Article 12, 15 pages. 
    2. Thabo Beeler, Fabian Hahn, Derek Bradley, Bernd Bickel, Paul Beardsley, Craig Gotsman, Robert W. Sumner, and Markus Gross. 2011. High-quality passive facial performance capture using anchor frames. ACM Trans. Graph. 30, 4, 75:1–75:10. 
    3. Amit Bermano, Thabo Beeler, Yeara Kozlov, Derek Bradley, Bernd Bickel, and Markus Gross. 2015. Detailed spatio-temporal reconstruction of eyelids. ACM Trans. Graph. 34, 4, Article 44, 11 pages. 
    4. Volker Blanz, Curzio Basso, Tomaso Poggio, and Thomas Vetter. 2003. Reanimating faces in images and video. In Proceedings of EUROGRAPHICS, Vol. 22. 641–650.
    5. Volker Blanz and Thomas Vetter. 1999. A morphable model for the synthesis of 3D faces. In Proceedings of SIGGRAPH. 187–194. 
    6. George Borshukov, Dan Piponi, Oystein Larsen, J. P. Lewis, and Christina Tempelaar-Lietz. 2003. Universal capture: Image-based facial animation for “The Matrix Reloaded.” In Proceedings of SIGGRAPH Sketches. 16:1. 
    7. Sofien Bouaziz, Yangang Wang, and Mark Pauly. 2013. Online modeling for realtime facial animation. ACM Trans. Graph. 32, 4, Article 40, 10 pages. 
    8. Christoph Bregler, Michele Covell, and Malcolm Slaney. 1997. Video rewrite: Driving visual speech with audio. In Proceedings of SIGGRAPH. 353–360. 
    9. Chen Cao, Derek Bradley, Kun Zhou, and Thabo Beeler. 2015. Real-time high-fidelity facial performance capture. ACM Trans. Graph. 34, 4, Article 46, 9 pages. 
    10. Chen Cao, Qiming Hou, and Kun Zhou. 2014a. Displaced dynamic expression regression for real-time facial tracking and animation. ACM Trans. Graph. 33, 4, Article 43, 10 pages. 
    11. Chen Cao, Yanlin Weng, Shun Zhou, Yiying Tong, and Kun Zhou. 2014b. FaceWarehouse: A 3D facial expression database for visual computing. IEEE Trans. Vis. Comput. Graph. 20, 3, 413–425. 
    12. Chen Cao, Hongzhi Wu, Yanlin Weng, Tianjia Shao, and Kun Zhou. 2016. Real-time facial animation with image-based dynamic avatars. ACM Trans. Graph. 35, 4, Article 16. 
    13. Antonio Criminisi, Jamie Shotton, Andrew Blake, and Philip H. S. Torr. 2003. Gaze manipulation for one-to-one teleconferencing. In Proceedings of ICCV. 
    14. Kevin Dale, Kalyan Sunkavalli, Micah K. Johnson, Daniel Vlasic, Wojciech Matusik, and Hanspeter Pfister. 2011. Video face replacement. ACM Trans. Graph. 30, 6, 130:1–130:10. 
    15. Zachary DeVito, Michael Mara, Michael Zollhöfer, Gilbert Bernstein, Jonathan Ragan-Kelley, Christian Theobalt, Pat Hanrahan, Matthew Fisher, and Matthias Nießner. 2016. Opt: A domain specific language for non-linear least squares optimization in graphics and imaging. arXiv:1604.06525.
    16. Chris H. Q. Ding, Ding Zhou, Xiaofeng He, and Hongyuan Zha. 2006. R1-PCA: Rotational invariant L1-norm principal component analysis for robust subspace factorization. In Proceedings of ICML. ACM, New York, NY, 281–288. 
    17. Christian Frueh, Avneesh Sud, and Vivek Kwatra. 2017. Headset removal for virtual and mixed reality. In Proceedings of ACM SIGGRAPH Talks 2017. 
    18. Graham Fyffe, Andrew Jones, Oleg Alexander, Ryosuke Ichikari, and Paul Debevec. 2014. Driving high-resolution facial scans with video performance capture. ACM Trans. Graph. 34, 1, Article 8, 14 pages. 
    19. Pablo Garrido, Levi Valgaerts, Ole Rehmsen, Thorsten Thormaehlen, Patrick Perez, and Christian Theobalt. 2014. Automatic face reenactment. In Proceedings of CVPR. 
    20. Pablo Garrido, Levi Valgaerts, Hamid Sarmadi, Ingmar Steiner, Kiran Varanasi, Patrick Perez, and Christian Theobalt. 2015. VDub—modifying face video of actors for plausible visual alignment to a dubbed audio track. In Proceedings of EUROGRAPHICS.
    21. Pablo Garrido, Levi Valgaerts, Chenglei Wu, and Christian Theobalt. 2013. Reconstructing detailed dynamic face geometry from monocular video. ACM Trans. Graph. 32, 158:1–158:10. 
    22. Pablo Garrido, Michael Zollhöfer, Dan Casas, Levi Valgaerts, Kiran Varanasi, Patrick Pérez, and Christian Theobalt. 2016. Reconstruction of personalized 3D face rigs from monocular video. ACM Trans. Graph. 35, 3, 28. 
    23. Pei-Lun Hsieh, Chongyang Ma, Jihun Yu, and Hao Li. 2015. Unconstrained realtime facial performance capture. In Proceedings of CVPR.
    24. Haoda Huang, Jinxiang Chai, Xin Tong, and Hsiang-Tao Wu. 2011. Leveraging motion capture and 3D scanning for high-fidelity facial performance acquisition. ACM Trans. Graph. 30, 4, Article 74, 10 pages. 
    25. Alexandru Eugen Ichim, Sofien Bouaziz, and Mark Pauly. 2015. Dynamic 3D avatar creation from hand-held video input. ACM Trans. Graph. 34, 4, Article 45, 14 pages.
    26. Masahide Kawai, Tomoyori Iwao, Daisuke Mima, Akinobu Maejima, and Shigeo Morishima. 2014. Data-driven speech animation synthesis focusing on realistic inside of the mouth. J. Inf. Process. 22, 2, 401–409.
    27. Ira Kemelmacher-Shlizerman. 2016. Transfiguring portraits. ACM Trans. Graph. 35, 4, Article 94, 8 pages. 
    28. Ira Kemelmacher-Shlizerman, Aditya Sankar, Eli Shechtman, and Steven M. Seitz. 2010. Being John Malkovich. In Proceedings of ECCV. 341–353. 
    29. Oliver Klehm, Fabrice Rousselle, Marios Papas, Derek Bradley, Christophe Hery, Bernd Bickel, Wojciech Jarosz, and Thabo Beeler. 2015. Recent advances in facial appearance capture. In Proceedings of EUROGRAPHICS STAR Reports.
    30. D. Kononenko and V. Lempitsky. 2015. Learning to look up: Realtime monocular gaze correction using machine learning. In Proceedings of CVPR. 4667–4675.
    31. Claudia Kuster, Tiberiu Popa, Jean-Charles Bazin, Craig Gotsman, and Markus Gross. 2012. Gaze correction for home video conferencing. ACM Trans. Graph. 31, 6, 174:1–174:6. 
    32. Pupil Labs. 2016. Home Page. Retrieved April 4, 2018, from https://pupil-labs.com/pupil/.
    33. J. P. Lewis, Ken Anjyo, Taehyun Rhee, Mengjie Zhang, Fred Pighin, and Zhigang Deng. 2014. Practice and theory of blendshape facial models. In Proceedings of EUROGRAPHICS STAR Reports. 199–218.
    34. Hao Li, Laura Trutoiu, Kyle Olszewski, Lingyu Wei, Tristan Trutna, Pei-Lun Hsieh, Aaron Nicholls, and Chongyang Ma. 2015. Facial performance sensing head-mounted display. ACM Trans. Graph. 34, 4, Article 47. 
    35. Hao Li, Jihun Yu, Yuting Ye, and Chris Bregler. 2013. Realtime facial animation with on-the-fly correctives. ACM Trans. Graph. 32, 4, Article 42. 
    36. Kai Li, Feng Xu, Jue Wang, Qionghai Dai, and Yebin Liu. 2012. A data-driven approach for facial expression synthesis in video. In Proceedings of CVPR. 57–64. 
    37. Kyle Olszewski, Joseph J. Lim, Shunsuke Saito, and Hao Li. 2016. High-fidelity facial and speech animation for VR HMDs. ACM Trans. Graph. 35, 6, Article 221. 
    38. Mustafa Ozuysal, Michael Calonder, Vincent Lepetit, and Pascal Fua. 2010. Fast keypoint recognition using random ferns. IEEE Trans. Pattern Anal. Mach. Intell. 32, 3, 448–461. 
    39. Patrick Pérez, Michel Gangnet, and Andrew Blake. 2003. Poisson image editing. In Proceedings of ACM SIGGRAPH. ACM, New York, NY, 313–318. 
    40. F. Pighin, J. Hecker, D. Lischinski, R. Szeliski, and D. Salesin. 1998. Synthesizing realistic facial expressions from photographs. In Proceedings of SIGGRAPH. 75–84. 
    41. F. Pighin and J. P. Lewis. 2006. Performance-driven facial animation. In Proceedings of ACM SIGGRAPH Courses.
    42. Ravi Ramamoorthi and Pat Hanrahan. 2001. A signal-processing framework for inverse rendering. In Proceedings of SIGGRAPH. ACM, New York, NY, 117–128. 
    43. Shunsuke Saito, Tianye Li, and Hao Li. 2016. Real-time facial segmentation and performance capture from RGB input. In Proceedings of ECCV.
    44. Jason M. Saragih, Simon Lucey, and Jeffrey F. Cohn. 2011. Deformable model fitting by regularized landmark mean-shift. Int. J. Comput. Vis. 91, 2, 200–215. 
    45. Fuhao Shi, Hsiang-Tao Wu, Xin Tong, and Jinxiang Chai. 2014. Automatic acquisition of high-fidelity facial performances using monocular videos. ACM Trans. Graph. 33, 6, Article 222. 
    46. Christian Siegl, Vanessa Lange, Marc Stamminger, Frank Bauer, and Justus Thies. 2017. FaceForge: Markerless non-rigid face multi-projection mapping. IEEE Trans. Vis. Comput. Graph. 23, 11, 2440–2446.
    47. Y. Sugano, Y. Matsushita, and Y. Sato. 2014. Learning-by-synthesis for appearance-based 3D gaze estimation. In Proceedings of CVPR. 1821–1828. 
    48. Supasorn Suwajanakorn, Ira Kemelmacher-Shlizerman, and Steven M. Seitz. 2014. Total moving face reconstruction. In Proceedings of ECCV. 796–812.
    49. Supasorn Suwajanakorn, Steven M. Seitz, and Ira Kemelmacher-Shlizerman. 2015. What makes Tom Hanks look like Tom Hanks. In Proceedings of ICCV. 
    50. Supasorn Suwajanakorn, Steven M. Seitz, and Ira Kemelmacher-Shlizerman. 2017. Synthesizing Obama: Learning lip sync from audio. ACM Trans. Graph. 36, 4, Article 95, 13 pages. 
    51. Sarah L. Taylor, Barry-John Theobald, and Iain A. Matthews. 2015. A mouth full of words: Visually consistent acoustic redubbing. In Proceedings of ICASSP. IEEE, Los Alamitos, CA, 4904–4908.
    52. J. Rafael Tena, Fernando De la Torre, and Iain Matthews. 2011. Interactive region-based linear 3D face models. ACM Trans. Graph. 30, 4, Article 76, 10 pages. 
    53. Justus Thies, Michael Zollhöfer, Matthias Nießner, Levi Valgaerts, Marc Stamminger, and Christian Theobalt. 2015. Real-time expression transfer for facial reenactment. ACM Trans. Graph. 34, 6, Article 183, 14 pages. 
    54. Justus Thies, Michael Zollhöfer, Marc Stamminger, Christian Theobalt, and Matthias Nießner. 2016. Face2Face: Real-time face capture and reenactment of RGB videos. In Proceedings of CVPR.
    55. Levi Valgaerts, Chenglei Wu, Andrés Bruhn, Hans-Peter Seidel, and Christian Theobalt. 2012. Lightweight binocular facial performance capture under uncontrolled lighting. ACM Trans. Graph. 31, 6, Article 187. 
    56. Daniel Vlasic, Matthew Brand, Hanspeter Pfister, and Jovan Popovic. 2005. Face transfer with multilinear models. ACM Trans. Graph. 24, 3, 426–433. 
    57. Congyi Wang, Fuhao Shi, Shihong Xia, and Jinxiang Chai. 2016. Realtime 3D eye gaze animation using a single RGB camera. ACM Trans. Graph. 35, 4, Article 118. 
    58. Yu-Shuen Wang, Chiew-Lan Tai, Olga Sorkine, and Tong-Yee Lee. 2008. Optimized scale-and-stretch for image resizing. ACM Trans. Graph. 27, 5, Article 118. 
    59. Thibaut Weise, Sofien Bouaziz, Hao Li, and Mark Pauly. 2011. Realtime performance-based facial animation. ACM Trans. Graph. 30, 4, Article 77. 
    60. Thibaut Weise, Hao Li, Luc J. Van Gool, and Mark Pauly. 2009. Face/Off: Live facial puppetry. In Proceedings of SCA. 7–16. 
    61. Lance Williams. 1990. Performance-driven facial animation. In Proceedings of SIGGRAPH. 235–242. 
    62. Chenglei Wu, Michael Zollhöfer, Matthias Nießner, Marc Stamminger, Shahram Izadi, and Christian Theobalt. 2014. Real-time shading-based refinement for consumer depth cameras. ACM Trans. Graph. 33, 6, Article 200. 
    63. Li Zhang, Noah Snavely, Brian Curless, and Steven M. Seitz. 2004. Spacetime faces: High resolution capture for modeling and animation. ACM Trans. Graph. 23, 3, 548–558. 
    64. Xucong Zhang, Yusuke Sugano, Mario Fritz, and Andreas Bulling. 2015. Appearance-based gaze estimation in the wild. In Proceedings of CVPR.
    65. Michael Zollhöfer, Angela Dai, Matthias Innmann, Chenglei Wu, Marc Stamminger, Christian Theobalt, and Matthias Nießner. 2015. Shading-based refinement on volumetric signed distance functions. ACM Trans. Graph. 34, 4, Article 96. 
    66. Michael Zollhöfer, Matthias Nießner, Shahram Izadi, Christoph Rehmann, Christopher Zach, Matthew Fisher, Chenglei Wu, Andrew Fitzgibbon, Charles Loop, Christian Theobalt, and Marc Stamminger. 2014. Real-time non-rigid reconstruction using an RGB-D camera. ACM Trans. Graph. 33, 4, Article 156. 
