“Towards Practical Capture of High-Fidelity Relightable Avatars” by Yang, Zheng, Feng, Huang, Lai, et al. …
Session/Category Title: Full-Body Avatar
Abstract:
In this paper, we propose a novel framework, Tracking-free Relightable Avatar (TRAvatar), for capturing and reconstructing high-fidelity 3D avatars. Compared to previous methods, TRAvatar works in a more practical and efficient setting. Specifically, TRAvatar is trained on dynamic image sequences captured in a Light Stage under varying lighting conditions, enabling realistic relighting and real-time animation of avatars in diverse scenes. In addition, TRAvatar supports tracking-free avatar capture, obviating the need for accurate surface tracking under varying illumination. Our contributions are two-fold. First, we propose a novel network architecture that explicitly builds on, and guarantees satisfaction of, the linear nature of lighting. Trained on simple group-light captures, TRAvatar predicts appearance in real time with a single forward pass, achieving high-quality relighting under arbitrary environment maps. Second, we jointly optimize the facial geometry and relightable appearance from scratch on image sequences, with the tracking learned implicitly. This tracking-free approach is robust for establishing temporal correspondences between frames under different lighting conditions. Extensive qualitative and quantitative experiments demonstrate that our framework achieves superior performance for photorealistic avatar animation and relighting.
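The abstract's first contribution rests on a physical fact: light transport is linear in the illumination, so the image of a scene under any lighting is a weighted sum of images captured under individual lights. The sketch below illustrates only this superposition principle, not the paper's network; the function name, array shapes, and toy data are illustrative assumptions.

```python
import numpy as np

def relight_from_basis(basis_images, env_weights):
    """Relight by linear superposition of per-light basis captures.

    basis_images: (L, H, W, 3) array, one image per light in the stage.
    env_weights:  (L,) intensities obtained by sampling an environment
                  map at the stage's light directions.
    """
    # Contract the light axis: output is the weighted sum of basis images.
    return np.tensordot(env_weights, basis_images, axes=1)

# Toy example: 4 lights, tiny 2x2 "images".
rng = np.random.default_rng(0)
basis = rng.random((4, 2, 2, 3))
weights = np.array([0.5, 1.0, 0.0, 2.0])
relit = relight_from_basis(basis, weights)

# Superposition check: relighting with w1 + w2 equals the sum of the
# individual relit images, which is the linearity the network must respect.
w1 = np.array([1.0, 0.0, 0.0, 0.0])
w2 = np.array([0.0, 1.0, 0.0, 0.0])
assert np.allclose(relight_from_basis(basis, w1 + w2),
                   relight_from_basis(basis, w1) + relight_from_basis(basis, w2))
```

This is why training on simple group-light captures can suffice for arbitrary environment-map relighting: once per-light responses are predicted by a model that preserves linearity, any environment map reduces to a set of weights over those responses.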