“AvatarCLIP: zero-shot text-driven generation and animation of 3D avatars” by Hong, Zhang, Pan, Cai, Yang, et al.
Conference:
- SIGGRAPH 2022
Type(s):
Title:
- AvatarCLIP: zero-shot text-driven generation and animation of 3D avatars
Presenter(s)/Author(s):
- Fangzhou Hong, Mingyuan Zhang, Liang Pan, Zhongang Cai, Lei Yang, and Ziwei Liu
Abstract:
3D avatar creation plays a crucial role in the digital age. However, the whole production process is prohibitively time-consuming and labor-intensive. To democratize this technology to a larger audience, we propose AvatarCLIP, a zero-shot text-driven framework for 3D avatar generation and animation. Unlike professional software that requires expert knowledge, AvatarCLIP empowers non-expert users to customize a 3D avatar with the desired shape and texture, and to drive the avatar with described motions, using natural language alone. Our key insight is to take advantage of the powerful vision-language model CLIP for supervising neural human generation, in terms of 3D geometry, texture, and animation. Specifically, driven by natural language descriptions, we initialize 3D human geometry generation with a shape VAE network. Based on the generated 3D human shapes, a volume rendering model is utilized to further facilitate geometry sculpting and texture generation. Moreover, by leveraging the priors learned by a motion VAE, a CLIP-guided reference-based motion synthesis method is proposed for the animation of the generated 3D avatar. Extensive qualitative and quantitative experiments validate the effectiveness and generalizability of AvatarCLIP on a wide range of avatars. Remarkably, AvatarCLIP can generate unseen 3D avatars with novel animations, achieving superior zero-shot capability. Code is available at https://github.com/hongfz16/AvatarCLIP.
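To make the CLIP-supervision idea in the abstract concrete, below is a minimal, self-contained sketch, assuming PyTorch and OpenAI's `clip` package. It is not the authors' pipeline: the paper's differentiable volume renderer of an implicit avatar is replaced here by a directly optimized RGB image tensor so the snippet runs end to end, and the prompt, learning rate, and step count are illustrative placeholders.

```python
# pip install torch git+https://github.com/openai/CLIP.git
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model.float()  # keep weights fp32 so gradients flow cleanly on GPU

# Encode the target prompt once; it stays fixed during optimization.
tokens = clip.tokenize(["a 3D rendering of a tall and muscular man"]).to(device)
with torch.no_grad():
    text_feat = model.encode_text(tokens)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

# Stand-in for the differentiable renderer: a directly optimized image.
# In AvatarCLIP this tensor would instead be produced by volume-rendering
# the implicit avatar from a sampled camera viewpoint.
image = torch.rand(1, 3, 224, 224, device=device, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.05)

# Input normalization constants of the released CLIP models.
mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

for step in range(300):
    img_feat = model.encode_image((image.clamp(0, 1) - mean) / std)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    # Minimize 1 - cosine similarity, i.e. pull the render toward the prompt.
    loss = 1.0 - (img_feat * text_feat).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In the actual method, the optimized tensor is replaced by renders of the implicit human from randomly sampled cameras, so the same CLIP loss shapes 3D geometry and texture rather than raw pixels, and an analogous CLIP score guides the reference-based motion synthesis.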