HeadOn: Real-time Reenactment of Human Portrait Videos

We propose HeadOn, the first real-time source-to-target reenactment approach for complete human portrait videos that enables transfer of torso and head motion, facial expression, and eye gaze. Given a short RGB-D video of the target actor, we automatically construct a personalized geometry proxy that embeds a parametric head, eye, and kinematic torso model. A novel real-time reenactment algorithm employs this proxy to photo-realistically map the captured motion from the source actor to the target actor. On top of the coarse geometric proxy, we propose a video-based rendering technique that composites the modified target portrait video via view- and pose-dependent texturing and creates photo-realistic imagery of the target actor under novel torso and head poses, facial expressions, and gaze directions. To this end, we propose robust tracking of the face and torso of the source actor. We extensively evaluate our approach and show that it enables significantly greater flexibility in creating realistic reenacted output videos.

SIGGRAPH 2018: Technical Papers

“HeadOn: Real-time Reenactment of Human Portrait Videos” by Thies, Zollhöfer, Theobalt, Stamminger and Nießner

Entry Number: 164

Session/Category Title: Portraits & Speech
