“VOODOO VR: One-Shot Neural Avatars for Virtual Reality” by Tran, Zakharov, Ho, Karmanov, Kengeskanov, et al. …
Conference:
Type(s):
Entry Number: real_106
Title:
- VOODOO VR: One-Shot Neural Avatars for Virtual Reality
Developer(s):
Description:
We present a complete solution for real-time, immersive face-to-face communication using VR headsets and photorealistic neural head avatars generated instantly (in 60 ms) from a single photo. Each avatar is a view-consistent neural radiance field of the person's head, lifted from a 2D photograph into a disentangled appearance-expression representation by transformer networks.
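The description above implies a two-stage pipeline: a one-time "lifting" step that builds a fixed appearance representation from the single photo, and a per-frame step that drives it with a disentangled expression code. The sketch below illustrates only this control flow; every name (`AvatarState`, `lift_photo`, `render_frame`) and the toy arithmetic are illustrative assumptions, not the authors' actual networks or API.

```python
# Hypothetical sketch of the one-shot avatar pipeline described above:
# appearance is estimated once from a single photo (the ~60 ms step),
# then reused for every driven frame. Toy arithmetic stands in for the
# transformer lifting and NeRF rendering.
from dataclasses import dataclass


@dataclass
class AvatarState:
    appearance: list   # canonical appearance features, fixed after lifting
    frame_count: int = 0


def lift_photo(photo_pixels):
    """One-shot lifting: build the appearance representation once.
    Stands in for the transformer mapping a 2D photo to a 3D-aware code."""
    mean = sum(photo_pixels) / len(photo_pixels)
    return AvatarState(appearance=[p - mean for p in photo_pixels])


def render_frame(state, expression_code):
    """Per-frame reenactment: combine the fixed appearance with a driving
    expression code; appearance is never re-estimated."""
    state.frame_count += 1
    return [a + expression_code for a in state.appearance]


avatar = lift_photo([2, 5, 8])    # runs once per user
frame = render_frame(avatar, 1)   # runs every VR frame
```

The key design point mirrored here is the disentanglement: because appearance and expression are separate inputs to the renderer, the expensive lifting never repeats inside the real-time loop.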
References:
[1]
Sizhe An, Hongyi Xu, Yichun Shi, Guoxian Song, Umit Y. Ogras, and Linjie Luo. 2023. PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360°. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[2]
Apple. 2024. Apple Vision Pro, https://www.apple.com/apple-vision-pro/.
[3]
Qingyan Bai, Yinghao Xu, Zifan Shi, Hao Ouyang, Qiuyu Wang, Ceyuan Yang, Xuan Wang, Gordon Wetzstein, Yujun Shen, and Qifeng Chen. 2024. Real-time 3D-aware Portrait Editing from a Single Image. arXiv preprint arXiv:2402.14000 (2024).
[4]
Ziqian Bai, Feitong Tan, Zeng Huang, Kripasindhu Sarkar, Danhang Tang, Di Qiu, Abhimitra Meka, Ruofei Du, Mingsong Dou, Sergio Orts-Escolano, Rohit Pandey, Ping Tan, Thabo Beeler, Sean Fanello, and Yinda Zhang. 2023. Learning Personalized High Quality Volumetric Head Avatars From Monocular RGB Videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[5]
Shrisha Bharadwaj, Yufeng Zheng, Otmar Hilliges, Michael J. Black, and Victoria Fernandez Abrevaya. 2023. FLARE: Fast learning of Animatable and Relightable Mesh Avatars. ACM Transactions on Graphics 42 (2023), 15.
[6]
Volker Blanz and Thomas Vetter. 1999. A Morphable Model for the Synthesis of 3D Faces. In Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH ’99). Article 18, 8 pages.
[7]
Stella Bounareli, Christos Tzelepis, Vasileios Argyriou, Ioannis Patras, and Georgios Tzimiropoulos. 2024. DiffusionAct: Controllable Diffusion Autoencoder for One-shot Face Reenactment. arXiv preprint arXiv:2403.17217 (2024).
[8]
Adrian Bulat and Georgios Tzimiropoulos. 2017. How Far Are We From Solving the 2D & 3D Face Alignment Problem? (And a Dataset of 230,000 3D Facial Landmarks). In Proceedings of the IEEE International Conference on Computer Vision. 1021–1030.
[9]
Egor Burkov, Igor Pasechnik, Artur Grigorev, and Victor Lempitsky. 2020. Neural head reenactment with latent pose descriptors. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 13786–13795.
[10]
Chen Cao, Tomas Simon, Jin Kyu Kim, Gabe Schwartz, Michael Zollhoefer, Shunsuke Saito, Stephen Lombardi, Shih-En Wei, Danielle Belko, Shoou-I Yu, Yaser Sheikh, and Jason Saragih. 2022. Authentic volumetric avatars from a phone scan. ACM Trans. Graph. 41, 4, Article 163 (2022), 19 pages.
[11]
Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. 2021. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 9650–9660.
[12]
Eric R Chan, Connor Z Lin, Matthew A Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas J Guibas, Jonathan Tremblay, Sameh Khamis, et al. 2022. Efficient geometry-aware 3D generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 16123–16133.
[13]
Eric R. Chan, Marco Monteiro, Petr Kellnhofer, Jiajun Wu, and Gordon Wetzstein. 2021. Pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 5799–5809.
[14]
Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. 2022. TensoRF: Tensorial Radiance Fields. In European Conference on Computer Vision (ECCV).
[15]
Chuhan Chen, Matthew O’Toole, Gaurav Bharaj, and Pablo Garrido. 2023. Implicit Neural Head Synthesis via Controllable Local Deformation Fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 416–426.
[16]
Xuangeng Chu, Yu Li, Ailing Zeng, Tianyu Yang, Lijian Lin, Yunfei Liu, and Tatsuya Harada. 2024. GPAvatar: Generalizable and Precise Head Avatar from Image(s). In The Twelfth International Conference on Learning Representations.
[17]
Radek Daněček, Michael J Black, and Timo Bolkart. 2022. EMOCA: Emotion driven monocular face capture and animation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 20311–20322.
[18]
Jiankang Deng, Jia Guo, Niannan Xue, and Stefanos Zafeiriou. 2019a. Arcface: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 4690–4699.
[19]
Yu Deng, Duomin Wang, Xiaohang Ren, Xingyu Chen, and Baoyuan Wang. 2024b. Portrait4D: Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data. In IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[20]
Yu Deng, Duomin Wang, and Baoyuan Wang. 2024a. Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer. arXiv preprint arXiv:2403.13570 (2024).
[21]
Yu Deng, Jiaolong Yang, Jianfeng Xiang, and Xin Tong. 2022. GRAM: Generative Radiance Manifolds for 3D-Aware Image Generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 10673–10683.
[22]
Yu Deng, Jiaolong Yang, Sicheng Xu, Dong Chen, Yunde Jia, and Xin Tong. 2019b. Accurate 3D face reconstruction with weakly-supervised learning: From single image to image set. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops.
[23]
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. 2021. An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale. In International Conference on Learning Representations.
[24]
Michail Christos Doukas, Stefanos Zafeiriou, and Viktoriia Sharmanska. 2021. HeadGAN: One-shot Neural Head Synthesis and Editing. In IEEE/CVF International Conference on Computer Vision (ICCV).
[25]
Nikita Drobyshev, Antoni Bigata Casademunt, Konstantinos Vougioukas, Zoe Landgraf, Stavros Petridis, and Maja Pantic. 2024. EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[26]
Nikita Drobyshev, Jenya Chelishev, Taras Khakhulin, Aleksei Ivakhnenko, Victor Lempitsky, and Egor Zakharov. 2022. Megaportraits: One-shot megapixel neural head avatars. In Proceedings of the 30th ACM International Conference on Multimedia. 2663–2671.
[27]
P. Ekman and W.V. Friesen. 1978. Facial action coding system: A technique for the measurement of facial movement. Consulting Psychologists Press, Palo Alto, CA.
[28]
Epic Games. 2024a. Unreal Engine, https://www.unrealengine.com/.
[29]
Epic Games. 2024b. Metahuman Creator, https://www.unrealengine.com/en-US/metahuman.
[30]
Yao Feng, Haiwen Feng, Michael J Black, and Timo Bolkart. 2021. Learning an animatable detailed 3D face model from in-the-wild images. ACM Transactions on Graphics (ToG) 40, 4 (2021), 1–13.
[31]
Guy Gafni, Justus Thies, Michael Zollhofer, and Matthias Nießner. 2021. Dynamic neural radiance fields for monocular 4D facial avatar reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8649–8658.
[32]
Xuan Gao, Chenglai Zhong, Jun Xiang, Yang Hong, Yudong Guo, and Juyong Zhang. 2022. Reconstructing Personalized Semantic Facial NeRF Models from Monocular Video. ACM Transactions on Graphics 41, 6 (2022), 1–12.
[33]
Yue Gao, Yuan Zhou, Jinglu Wang, Xiao Li, Xiang Ming, and Yan Lu. 2023. High-Fidelity and Freely Controllable Talking Head Video Generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 5609–5619.
[34]
Stephan J. Garbin, Marek Kowalski, Virginia Estellers, Stanislaw Szymanowicz, Shideh Rezaeifar, Jingjing Shen, Matthew Johnson, and Julien Valentin. 2022. VolTeMorph: Realtime, Controllable and Generalisable Animation of Volumetric Representations. arXiv:2208.00949 [cs.GR]
[35]
Philip-William Grassal, Malte Prinzler, Titus Leistner, Carsten Rother, Matthias Nießner, and Justus Thies. 2022. Neural Head Avatars From Monocular RGB Videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 18653–18664.
[36]
Yuming Gu, Hongyi Xu, You Xie, Guoxian Song, Yichun Shi, Di Chang, Jing Yang, and Lingjie Luo. 2024. DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[37]
Yudong Guo, Keyu Chen, Sen Liang, Yong-Jin Liu, Hujun Bao, and Juyong Zhang. 2021. Ad-nerf: Audio driven neural radiance fields for talking head synthesis. In Proceedings of the IEEE/CVF international conference on computer vision. 5784–5794.
[38]
Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. GANs trained by a two time-scale update rule converge to a local nash equilibrium. In Proceedings of the 31st International Conference on Neural Information Processing Systems (Long Beach, California, USA) (NIPS’17). Curran Associates Inc., Red Hook, NY, USA, 6629–6640.
[39]
Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. Advances in neural information processing systems 33 (2020), 6840–6851.
[40]
Fa-Ting Hong, Longhao Zhang, Li Shen, and Dan Xu. 2022b. Depth-Aware Generative Adversarial Network for Talking Head Video Generation. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[41]
Yang Hong, Bo Peng, Haiyao Xiao, Ligang Liu, and Juyong Zhang. 2022a. Headnerf: A real-time nerf-based parametric head model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 20374–20384.
[42]
Yiyu Huang, Hao Zhu, Xusen Sun, and Xun Cao. 2022. MoFaNeRF: Morphable Facial Neural Radiance Field. In ECCV.
[43]
Xinya Ji, Hang Zhou, Kaisiyuan Wang, Qianyi Wu, Wayne Wu, Feng Xu, and Xun Cao. 2022. EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model. In ACM SIGGRAPH 2022 Conference Proceedings (SIGGRAPH ’22).
[44]
Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. 2018. Progressive Growing of GANs for Improved Quality, Stability, and Variation. In International Conference on Learning Representations.
[45]
Taras Khakhulin, Vanessa Sklyarova, Victor Lempitsky, and Egor Zakharov. 2022. Realistic One-shot Mesh-based Head Avatars. In European Conference on Computer Vision (ECCV). Springer, 345–362.
[46]
Taekyung Ki, Dongchan Min, and Gyeongsu Chae. 2024. Learning to Generate Conditional Tri-plane for 3D-aware Expression Controllable Portrait Animation. arXiv preprint arXiv:2404.00636 (2024).
[47]
Hyeongwoo Kim, Pablo Garrido, Ayush Tewari, Weipeng Xu, Justus Thies, Matthias Niessner, Patrick Pérez, Christian Richardt, Michael Zollhöfer, and Christian Theobalt. 2018. Deep video portraits. ACM Transactions on Graphics (TOG) 37, 4 (2018), 1–14.
[48]
Tobias Kirschstein, Shenhan Qian, Simon Giebenhain, Tim Walter, and Matthias Nießner. 2023. NeRSemble: Multi-view Radiance Field Reconstruction of Human Heads. ACM Transactions on Graphics 42, 4 (2023), 1–14.
[49]
Jaakko Lehtinen, Jacob Munkberg, Jon Hasselgren, Samuli Laine, Tero Karras, Miika Aittala, and Timo Aila. 2018. Noise2Noise: Learning Image Restoration without Clean Data. In Proceedings of the 35th International Conference on Machine Learning.
[50]
Hao Li, Bart Adams, Leonidas J. Guibas, and Mark Pauly. 2009. Robust Single-View Geometry And Motion Reconstruction. ACM Transactions on Graphics (Proceedings SIGGRAPH Asia 2009) 28, 5 (2009).
[51]
Hao Li, Laura Trutoiu, Kyle Olszewski, Lingyu Wei, Tristan Trutna, Pei-Lun Hsieh, Aaron Nicholls, and Chongyang Ma. 2015. Facial Performance Sensing Head-Mounted Display. ACM Transactions on Graphics (Proceedings SIGGRAPH 2015) 34, 4 (2015).
[52]
Tianye Li, Timo Bolkart, Michael J Black, Hao Li, and Javier Romero. 2017. Learning a model of facial shape and expression from 4D scans. ACM Transactions on Graphics, (Proc. SIGGRAPH Asia) 36, 6 (2017), 194:1–194:17.
[53]
Weichuang Li, Longhao Zhang, Dong Wang, Bin Zhao, Zhigang Wang, Mulin Chen, Bang Zhang, Zhongjian Wang, Liefeng Bo, and Xuelong Li. 2023b. One-Shot High-Fidelity Talking-Head Synthesis With Deformable Neural Radiance Field. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 17969–17978.
[54]
Xueting Li, Shalini De Mello, Sifei Liu, Koki Nagano, Umar Iqbal, and Jan Kautz. 2023a. Generalizable One-shot Neural Head Avatar. Advances in Neural Information Processing Systems 36 (2023).
[55]
Stephen Lombardi, Jason Saragih, Tomas Simon, and Yaser Sheikh. 2018a. Deep appearance models for face rendering. ACM Transactions on Graphics (ToG) 37, 4 (2018), 1–13.
[56]
Stephen Lombardi, Jason Saragih, Tomas Simon, and Yaser Sheikh. 2018b. Deep Appearance Models for Face Rendering. ACM Trans. Graph. 37, 4 (2018).
[57]
Stephen Lombardi, Tomas Simon, Jason Saragih, Gabriel Schwartz, Andreas Lehrmann, and Yaser Sheikh. 2019. Neural Volumes: Learning Dynamic Renderable Volumes from Images. ACM Trans. Graph. 38, 4, Article 65 (2019), 14 pages.
[58]
Stephen Lombardi, Tomas Simon, Gabriel Schwartz, Michael Zollhoefer, Yaser Sheikh, and Jason Saragih. 2021. Mixture of volumetric primitives for efficient neural rendering. ACM Transactions on Graphics (ToG) 40, 4 (2021), 1–13.
[59]
Shugao Ma, Tomas Simon, Jason Saragih, Dawei Wang, Yuecheng Li, Fernando De La Torre, and Yaser Sheikh. 2021. Pixel codec avatars. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 64–73.
[60]
Shengjie Ma, Yanlin Weng, Tianjia Shao, and Kun Zhou. 2024. 3D Gaussian Blendshapes for Head Avatar Animation. In ACM SIGGRAPH 2024 Conference Papers. 1–10.
[61]
Zhiyuan Ma, Xiangyu Zhu, Guo-Jun Qi, Zhen Lei, and Lei Zhang. 2023. OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 16901–16910.
[62]
Meta. 2022. Meta Quest Pro, https://www.meta.com/quest/quest-pro/.
[63]
Meta. 2024a. Movement SDK for Unity, https://developer.oculus.com/documentation/unity/move-overview/.
[64]
Meta. 2024b. Meta Quest Headset Tracking, https://www.meta.com/help/quest/articles/headsets-and-accessories/using-your-headset/turn-off-tracking/.
[65]
Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. 2021. Nerf: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 65, 1 (2021), 99–106.
[66]
Michael Niemeyer and Andreas Geiger. 2021. GIRAFFE: Representing Scenes As Compositional Generative Neural Feature Fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 11453–11464.
[67]
Seok-Hwan Oh, Guil Jung, Myeong-Gee Kim, Sang-Yun Kim, Young-Min Kim, Hyeon-Jik Lee, Hyuk-Sool Kwon, and Hyeon-Min Bae. 2024. Key-point Guided Deformable Image Manipulation Using Diffusion Model.
[68]
Kyle Olszewski, Joseph J. Lim, Shunsuke Saito, and Hao Li. 2016. High-fidelity facial and speech animation for VR HMDs. ACM Transactions on Graphics (TOG) 35 (2016), 1–14. https://api.semanticscholar.org/CorpusID:4956725
[69]
Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. 2023. DINOv2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193 (2023).
[70]
Roy Or-El, Xuan Luo, Mengyi Shan, Eli Shechtman, Jeong Joon Park, and Ira Kemelmacher-Shlizerman. 2022. StyleSDF: High-Resolution 3D-Consistent Image and Geometry Generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 13503–13513.
[71]
Pinscreen. 2024. Pinscreen Avatar Neo, https://www.avatarneo.com.
[72]
Shenhan Qian, Tobias Kirschstein, Liam Schoneveld, Davide Davoli, Simon Giebenhain, and Matthias Nießner. 2024. GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[73]
Yurui Ren, Ge Li, Yuanqi Chen, Thomas H. Li, and Shan Liu. 2021. PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 13759–13768.
[74]
Andre Rochow, Max Schwarz, and Sven Behnke. 2024. FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance, Head-pose, and Facial Expression Features.
[75]
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2022. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10684–10695.
[76]
Shunsuke Saito, Gabriel Schwartz, Tomas Simon, Junxuan Li, and Giljoo Nam. 2024. Relightable Gaussian Codec Avatars.
[77]
Axel Sauer, Kashyap Chitta, Jens Müller, and Andreas Geiger. 2021. Projected GANs converge faster. Advances in Neural Information Processing Systems 34 (2021), 17480–17492.
[78]
Katja Schwarz, Yiyi Liao, Michael Niemeyer, and Andreas Geiger. 2020. GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS’20).
[79]
Aliaksandr Siarohin, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, and Nicu Sebe. 2019a. First order motion model for image animation. Advances in Neural Information Processing Systems 32 (2019).
[80]
Aliaksandr Siarohin, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, and Nicu Sebe. 2019b. Animating Arbitrary Objects via Deep Motion Transfer. In CVPR.
[81]
Aliaksandr Siarohin, Oliver Woodford, Jian Ren, Menglei Chai, and Sergey Tulyakov. 2021. Motion Representations for Articulated Animation. In CVPR.
[82]
Ivan Skorokhodov, Sergey Tulyakov, Yiqun Wang, and Peter Wonka. 2022. EpiGRAF: Rethinking training of 3D GANs. In Advances in Neural Information Processing Systems.
[83]
Jiaming Song, Chenlin Meng, and Stefano Ermon. 2021a. Denoising Diffusion Implicit Models. In International Conference on Learning Representations.
[84]
Linsen Song, Wayne Wu, Chaoyou Fu, Chen Qian, Chen Change Loy, and Ran He. 2021c. Pareidolia Face Reenactment. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[85]
Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. 2021b. Score-Based Generative Modeling through Stochastic Differential Equations. In International Conference on Learning Representations.
[86]
Michael Stengel, Koki Nagano, Chao Liu, Matthew Chan, Alex Trevithick, Shalini De Mello, Jonghyun Kim, and David Luebke. 2023. AI-Mediated 3D Video Conferencing. In ACM SIGGRAPH Emerging Technologies.
[87]
Robert W. Sumner and Jovan Popović. 2004. Deformation transfer for triangle meshes. ACM Transactions on Graphics 23, 3 (2004), 399–405.
[88]
Jiale Tao, Shuhang Gu, Wen Li, and Lixin Duan. 2024. Learning Motion Refinement for Unsupervised Face Animation. Advances in Neural Information Processing Systems 36 (2024).
[89]
Jiale Tao, Biao Wang, Borun Xu, Tiezheng Ge, Yuning Jiang, Wen Li, and Lixin Duan. 2022. Structure-Aware Motion Transfer With Deformable Anchor Model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 3637–3646.
[90]
Ayush Tewari, Ohad Fried, Justus Thies, Vincent Sitzmann, Stephen Lombardi, Kalyan Sunkavalli, Ricardo Martin-Brualla, Tomas Simon, Jason Saragih, Matthias Nießner, et al. 2020. State of the art on neural rendering. In Computer Graphics Forum, Vol. 39. Wiley Online Library, 701–727.
[91]
Justus Thies, Michael Zollhöfer, and Matthias Nießner. 2019. Deferred neural rendering: Image synthesis using neural textures. ACM Transactions on Graphics (TOG) 38, 4 (2019), 1–12.
[92]
Phong Tran, Egor Zakharov, Long-Nhat Ho, Anh Tuan Tran, Liwen Hu, and Hao Li. 2024. VOODOO 3D: Volumetric Portrait Disentanglement for One-Shot 3D Head Reenactment. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2024).
[93]
Alex Trevithick, Matthew Chan, Michael Stengel, Eric Chan, Chao Liu, Zhiding Yu, Sameh Khamis, Manmohan Chandraker, Ravi Ramamoorthi, and Koki Nagano. 2023. Real-time radiance fields for single-image portrait view synthesis. ACM Transactions on Graphics (TOG) 42, 4 (2023), 1–15.
[94]
Unity. 2024. Unity Technologies, https://unity.com/.
[95]
Duomin Wang, Yu Deng, Zixin Yin, Heung-Yeung Shum, and Baoyuan Wang. 2022a. Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis. arXiv:2211.14506 [cs.CV]
[96]
Ting-Chun Wang, Arun Mallya, and Ming-Yu Liu. 2021b. One-shot free-view neural talking-head synthesis for video conferencing. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 10039–10049.
[97]
Xintao Wang, Yu Li, Honglun Zhang, and Ying Shan. 2021a. Towards Real-World Blind Face Restoration with Generative Facial Prior. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 9168–9178.
[98]
Xintao Wang, Liangbin Xie, Chao Dong, and Ying Shan. 2021c. Real-esrgan: Training real-world blind super-resolution with pure synthetic data. In Proceedings of the IEEE/CVF international conference on computer vision. 1905–1914.
[99]
Yaohui Wang, Di Yang, Francois Bremond, and Antitza Dantcheva. 2022b. Latent Image Animator: Learning to Animate Images via Latent Space Navigation. In International Conference on Learning Representations.
[100]
Shih-En Wei, Jason Saragih, Tomas Simon, Adam W. Harley, Stephen Lombardi, Michal Perdoch, Alexander Hypes, Dawei Wang, Hernan Badino, and Yaser Sheikh. 2019. VR facial animation via multiview image translation. ACM Trans. Graph. 38, 4, Article 67 (2019), 16 pages.
[101]
O. Wiles, A.S. Koepke, and A. Zisserman. 2018. X2Face: A network for controlling face generation by using images, audio, and pose codes. In European Conference on Computer Vision.
[102]
Jun Xiang, Xuan Gao, Yudong Guo, and Juyong Zhang. 2024. FlashAvatar: High-fidelity Head Avatar with Efficient Gaussian Embedding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1802–1812.
[103]
Jianfeng Xiang, Jiaolong Yang, Yu Deng, and Xin Tong. 2023. GRAM-HD: 3D-Consistent Image Generation at High Resolution with Generative Radiance Manifolds. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2195–2205.
[104]
Sitao Xiang, Yuming Gu, Pengda Xiang, Mingming He, Koki Nagano, Haiwei Chen, and Hao Li. 2020. One-shot identity-preserving portrait reenactment. arXiv preprint arXiv:2004.12452 (2020).
[105]
Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M Alvarez, and Ping Luo. 2021. SegFormer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems 34 (2021), 12077–12090.
[106]
You Xie, Hongyi Xu, Guoxian Song, Chao Wang, Yichun Shi, and Linjie Luo. 2024. X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention. arXiv:2403.15931 [cs.CV]
[107]
Hongyi Xu, Guoxian Song, Zihang Jiang, Jianfeng Zhang, Yichun Shi, Jing Liu, Wanchun Ma, Jiashi Feng, and Linjie Luo. 2023a. OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 12814–12824.
[108]
Yuelang Xu, Benwang Chen, Zhe Li, Hongwen Zhang, Lizhen Wang, Zerong Zheng, and Yebin Liu. 2024. Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[109]
Yuelang Xu, Hongwen Zhang, Lizhen Wang, Xiaochen Zhao, Han Huang, Guojun Qi, and Yebin Liu. 2023c. LatentAvatar: Learning Latent Expression Code for Expressive Neural Head Avatar. In ACM SIGGRAPH 2023 Conference Proceedings (SIGGRAPH ’23). Association for Computing Machinery, Article 86.
[110]
Zhongcong Xu, Jianfeng Zhang, Junhao Liew, Wenqing Zhang, Song Bai, Jiashi Feng, and Mike Zheng Shou. 2023b. PV3D: A 3D Generative Model for Portrait Video Generation. In The Eleventh International Conference on Learning Representations.
[111]
Yang Xue, Yuheng Li, Krishna Kumar Singh, and Yong Jae Lee. 2022. GIRAFFE HD: A High-Resolution 3D-Aware Generative Model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 18440–18449.
[112]
Zhenhui Ye, Tianyun Zhong, Yi Ren, Jiaqi Yang, Weichuang Li, Jiangwei Huang, Ziyue Jiang, Jinzheng He, Rongjie Huang, Jinglin Liu, Chen Zhang, Xiang Yin, Zejun Ma, and Zhou Zhao. 2024. Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis. In International Conference on Learning Representations (ICLR).
[113]
Fei Yin, Yong Zhang, Xiaodong Cun, Mingdeng Cao, Yanbo Fan, Xuan Wang, Qingyan Bai, Baoyuan Wu, Jue Wang, and Yujiu Yang. 2022. StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN. In ECCV.
[114]
Wangbo Yu, Yanbo Fan, Yong Zhang, Xuan Wang, Fei Yin, Yunpeng Bai, Yan-Pei Cao, Ying Shan, Yang Wu, Zhongqian Sun, and Baoyuan Wu. 2023. NOFA: NeRF-Based One-Shot Facial Avatar Reconstruction. In ACM SIGGRAPH 2023 Conference Proceedings (Los Angeles, CA, USA) (SIGGRAPH ’23). Association for Computing Machinery, New York, NY, USA, Article 85, 12 pages.
[115]
Egor Zakharov, Aleksei Ivakhnenko, Aliaksandra Shysheya, and Victor Lempitsky. 2020. Fast bi-layer neural synthesis of one-shot realistic head avatars. In European Conference on Computer Vision. Springer, 524–540.
[116]
Egor Zakharov, Aliaksandra Shysheya, Egor Burkov, and Victor Lempitsky. 2019. Fewshot adversarial learning of realistic neural talking head models. In Proceedings of the IEEE/CVF international conference on computer vision. 9459–9468.
[117]
Bowen Zhang, Chenyang Qi, Pan Zhang, Bo Zhang, HsiangTao Wu, Dong Chen, Qifeng Chen, Yong Wang, and Fang Wen. 2023. MetaPortrait: Identity-Preserving Talking Head Generation With Fast Personalized Adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 22096–22105.
[118]
Jingbo Zhang, Xiaoyu Li, Ziyu Wan, Can Wang, and Jing Liao. 2022. Fdnerf: Few-shot dynamic neural radiance fields for face reconstruction and expression editing. In SIGGRAPH Asia 2022 Conference Papers. 1–9.
[119]
Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. 2018. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In CVPR. 586–595.
[120]
Zhimeng Zhang, Lincheng Li, Yu Ding, and Changjie Fan. 2021. Flow-Guided One-Shot Talking Face Generation With a High-Resolution Audio-Visual Dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3661–3670.
[121]
Jian Zhao and Hui Zhang. 2022. Thin-Plate Spline Motion Model for Image Animation. In CVPR. 3657–3666.
[122]
Xiaochen Zhao, Lizhen Wang, Jingxiang Sun, Hongwen Zhang, Jinli Suo, and Yebin Liu. 2023. Havatar: High-fidelity head avatar via facial model conditioned neural radiance field. ACM Transactions on Graphics 43, 1 (2023), 1–16.
[123]
Yufeng Zheng, Victoria Fernández Abrevaya, Marcel C. Bühler, Xu Chen, Michael J. Black, and Otmar Hilliges. 2022. I M Avatar: Implicit Morphable Head Avatars from Videos. In Computer Vision and Pattern Recognition (CVPR).
[124]
Yufeng Zheng, Wang Yifan, Gordon Wetzstein, Michael J. Black, and Otmar Hilliges. 2023. PointAvatar: Deformable Point-based Head Avatars from Videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[125]
Hao Zhu, Wayne Wu, Wentao Zhu, Liming Jiang, Siwei Tang, Li Zhang, Ziwei Liu, and Chen Change Loy. 2022. CelebV-HQ: A large-scale video facial attributes dataset. In European conference on computer vision. Springer, 650–667.
[126]
Wojciech Zielonka, Timo Bolkart, and Justus Thies. 2023. Instant Volumetric Head Avatars. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).