“Rewriting geometric rules of a GAN” by Wang, Bau and Zhu

Title:
- Rewriting geometric rules of a GAN
Presenter(s)/Author(s):
- Wang, Bau and Zhu
Abstract:
Deep generative models make visual content creation more accessible to novice users by automating the synthesis of diverse, realistic content based on a collected dataset. However, the current machine learning approaches miss a key element of the creative process – the ability to synthesize things that go far beyond the data distribution and everyday experience. To begin to address this issue, we enable a user to “warp” a given model by editing just a handful of original model outputs with desired geometric changes. Our method applies a low-rank update to a single model layer to reconstruct edited examples. Furthermore, to combat overfitting, we propose a latent space augmentation method based on style-mixing. Our method allows a user to create a model that synthesizes endless objects with defined geometric changes, enabling the creation of a new generative model without the burden of curating a large-scale dataset. We also demonstrate that edited models can be composed to achieve aggregated effects, and we present an interactive interface to enable users to create new models through composition. Empirical measurements on multiple test cases suggest the advantage of our method against recent GAN fine-tuning methods. Finally, we showcase several applications using the edited models, including latent space interpolation and image editing.
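
The abstract's two technical ingredients can be sketched in a few lines of PyTorch. The listing below is a minimal illustration, not the authors' released implementation: it fits a low-rank delta U·V on one generator weight so that the model reproduces a handful of user-edited (warped) outputs, and it augments the training latents by style mixing. Everything about the generator interface is an assumption made for the sketch: G(w) is taken to map per-layer latents of shape [batch, num_layers, w_dim] to images, G.mapping maps z to those latents, and the parameter name, the warp_fn callable standing in for the user's geometric edit, and the fine-layer indices are placeholders.

    import torch
    import torch.nn.functional as F
    from torch.func import functional_call


    def style_mix(w, G, mix_layers, mix_prob=0.5):
        """Latent-space augmentation by style mixing: with probability mix_prob,
        replace the style codes of mix_layers with those of freshly sampled
        latents, so training sees many variations of each edited example."""
        if torch.rand(()) >= mix_prob:
            return w
        z = torch.randn(w.shape[0], G.z_dim, device=w.device)
        w_rand = G.mapping(z)                      # assumed shape [batch, num_layers, w_dim]
        w_mixed = w.clone()
        w_mixed[:, mix_layers] = w_rand[:, mix_layers]
        return w_mixed


    def rewrite_layer(G, param_name, w_edited, warp_fn,
                      rank=1, steps=1000, lr=0.05, mix_layers=(8, 9, 10, 11)):
        """Fit a rank-`rank` delta U @ V on a single weight tensor of G so that
        G(w) matches warp_fn(G_frozen(w)) for the edited latents and their
        style-mixed augmentations; every other parameter stays frozen."""
        for p in G.parameters():
            p.requires_grad_(False)
        base = dict(G.named_parameters())[param_name]   # e.g. one conv weight

        # The low-rank factors are the only trainable parameters. Zero-initializing
        # U makes the initial delta zero, so training starts from the pretrained model.
        U = torch.zeros(base.shape[0], rank, device=base.device, requires_grad=True)
        V = 1e-3 * torch.randn(rank, base[0].numel(), device=base.device)
        V.requires_grad_(True)
        opt = torch.optim.Adam([U, V], lr=lr)

        for _ in range(steps):
            w = style_mix(w_edited, G, list(mix_layers))

            # Target: the user's geometric edit applied to the frozen model's output.
            with torch.no_grad():
                target = warp_fn(G(w))

            # Forward pass with the single rewritten weight: base + U @ V.
            new_weight = base + (U @ V).view_as(base)
            pred = functional_call(G, {param_name: new_weight}, (w,))

            loss = F.l1_loss(pred, target)
            opt.zero_grad()
            loss.backward()
            opt.step()

        # Bake the learned delta into the model so plain G(w) produces the edit.
        with torch.no_grad():
            base += (U @ V).view_as(base)
        return G

Because the delta starts at zero and touches only one layer, the rewritten model begins exactly at the pretrained generator and drifts only as far as needed to reproduce the edited examples; the style-mixed latents keep the update from overfitting to that handful of images.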