“A Hybrid Generator Architecture for Controllable Face Synthesis” by Mensah, Kim, Aittala, Laine and Lehtinen

Conference:


Type:


Title:

    A Hybrid Generator Architecture for Controllable Face Synthesis

Session/Category Title: Making Faces With Neural Avatars


Presenter(s)/Author(s):


Moderator(s):



Abstract:


    Modern data-driven image generation models often surpass traditional graphics techniques in quality. However, while traditional modeling and animation tools allow precise control over the image generation process in terms of interpretable quantities (e.g., shapes and reflectances), endowing learned models with such controls is generally difficult. In the context of human faces, we seek a data-driven generator architecture that simultaneously retains the photorealistic quality of modern generative adversarial networks (GANs) and allows explicit, disentangled controls over head shapes, expressions, identity, background, and illumination. While our high-level goal is shared by a large body of previous work, we approach the problem with a different philosophy: We treat the problem as an unconditional synthesis task, and engineer interpretable inductive biases into the model that make it easy for the desired behavior to emerge. Concretely, our generator is a combination of learned neural networks and fixed-function blocks, such as a 3D morphable head model and a texture-mapping rasterizer, and we leave it up to the training process to figure out how they should be used together. This greatly simplifies the training problem by removing the need for labeled training data; we learn the distributions of the independent variables that drive the model instead of requiring that their values be known for each training image. Furthermore, we need no contrastive or imitation learning for correct behavior. We show that our design successfully encourages the generative model to make use of the internal, interpretable representations in a semantically meaningful manner. This allows different aspects of the image to be sampled independently, and allows precise control of the results by manipulating the internal state of the interpretable blocks within the generator. This enables, for instance, facial animation using traditional animation tools.
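
    The idea of learned blocks driving a fixed-function block can be illustrated with a toy sketch. This is not the authors' implementation: the linear morphable model below uses random placeholder bases, the "networks" are single random linear layers standing in for trained ones, and all names and sizes (id_net, expr_net, N_VERTS, and so on) are hypothetical. The rasterizer, the image-synthesis network, and adversarial training are omitted entirely.

```python
import numpy as np

rng = np.random.default_rng(0)
N_VERTS, ID_DIM, EXPR_DIM, Z_DIM = 100, 8, 4, 16  # hypothetical sizes

# Fixed-function block: a linear morphable head model.
# The bases are random placeholders, not a real face model.
mean_shape = rng.normal(size=(N_VERTS, 3))
id_basis = rng.normal(size=(ID_DIM, N_VERTS, 3))
expr_basis = rng.normal(size=(EXPR_DIM, N_VERTS, 3))

def morphable_model(id_coeffs, expr_coeffs):
    """Vertices = mean shape + identity offsets + expression offsets."""
    return (mean_shape
            + np.tensordot(id_coeffs, id_basis, axes=1)
            + np.tensordot(expr_coeffs, expr_basis, axes=1))

# Learned blocks (stand-ins): map independent latent codes to the
# interpretable parameters, so the parameter distributions are learned
# rather than supplied as per-image labels.
W_id = rng.normal(size=(Z_DIM, ID_DIM))
W_expr = rng.normal(size=(Z_DIM, EXPR_DIM))

def id_net(z_id):
    return np.tanh(z_id @ W_id)

def expr_net(z_expr):
    return np.tanh(z_expr @ W_expr)

def sample_head(z_id, z_expr):
    return morphable_model(id_net(z_id), expr_net(z_expr))

# Independent sampling: same identity latent, two expression latents.
z_id = rng.normal(size=Z_DIM)
z_expr_a = rng.normal(size=Z_DIM)
z_expr_b = rng.normal(size=Z_DIM)
head_a = sample_head(z_id, z_expr_a)
head_b = sample_head(z_id, z_expr_b)

# Explicit control: bypass the expression network and set the
# interpretable expression coefficients directly.
id_coeffs = id_net(z_id)
neutral = morphable_model(id_coeffs, np.zeros(EXPR_DIM))
edited_expr = np.zeros(EXPR_DIM)
edited_expr[0] = 1.0
edited = morphable_model(id_coeffs, edited_expr)
```

    In this sketch, head_a and head_b share identity offsets and differ only through the expression basis, and the difference between edited and neutral is exactly the first expression basis vector. That is the kind of direct manipulation of internal state the abstract refers to, here reduced to geometry only.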


