Deep incremental learning for efficient high-fidelity face tracking

Session/Category Title: Faces, faces, faces
Abstract:
In this paper, we present an incremental learning framework for efficient and accurate facial performance tracking. Our approach alternates between a modeling step, which takes tracked meshes and texture maps and trains our deep learning-based statistical model, and a tracking step, which takes the geometry and texture our model infers from measured images and optimizes the predicted geometry by minimizing image, geometry, and facial landmark errors. Our Geo-Tex VAE model extends the convolutional variational autoencoder to face tracking, and jointly learns and represents deformations and variations in geometry and texture from tracked meshes and texture maps. To accurately model variations in facial geometry and texture, we introduce a decomposition layer in the Geo-Tex VAE architecture that decomposes the facial deformation into global and local components. We train the global deformation with a fully-connected network and the local deformations with convolutional layers. Despite running this model on each frame independently, which enables a high degree of parallelization, we validate that our framework achieves sub-millimeter accuracy on synthetic data and outperforms existing methods. We also qualitatively demonstrate high-fidelity, long-duration facial performance tracking on several actors.
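The decomposition layer described above, in which a fully-connected branch produces a coarse global deformation and a convolutional branch produces local refinements that are summed, can be sketched as follows. This is a minimal illustration only, with made-up dimensions, random weights, and hypothetical names (`decode`, `W_global`, `W_local`); it is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT = 16  # latent code size (illustrative)
GRID = 8     # UV-space resolution of the deformation map (illustrative)

# Global branch: a fully-connected layer maps the latent code to one
# coarse deformation map covering the whole face.
W_global = rng.standard_normal((LATENT, GRID * GRID * 3)) * 0.01

# Local branch: a single 3x3 convolution kernel refines a feature map
# into per-region deformation offsets.
W_local = rng.standard_normal((3, 3)) * 0.01

def conv2d_same(x, k):
    """Naive 'same'-padded 2D convolution, single channel."""
    p = k.shape[0] // 2
    xp = np.pad(x, p)
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out

def decode(z, feat):
    """Decomposition layer: deformation = global(z) + local(feat)."""
    global_def = (z @ W_global).reshape(GRID, GRID, 3)
    # Apply the same local kernel per xyz channel for simplicity.
    local_def = np.stack([conv2d_same(feat, W_local)] * 3, axis=-1)
    return global_def + local_def

z = rng.standard_normal(LATENT)            # latent code for one frame
feat = rng.standard_normal((GRID, GRID))   # intermediate feature map
deformation = decode(z, feat)
print(deformation.shape)  # (8, 8, 3): per-texel xyz offsets
```

Because the two branches are simply summed, the global branch can absorb low-frequency shape variation while the convolutional branch, with its small receptive field, is restricted to local detail.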

