“Rewriting geometric rules of a GAN” by Wang, Bau and Zhu

  • © Sheng-Yu Wang, David Bau, and Jun-Yan Zhu

Conference:


Type:


Title:

    Rewriting geometric rules of a GAN

Presenter(s)/Author(s):

    Sheng-Yu Wang, David Bau, and Jun-Yan Zhu

Abstract:


    Deep generative models make visual content creation more accessible to novice users by automating the synthesis of diverse, realistic content based on a collected dataset. However, the current machine learning approaches miss a key element of the creative process – the ability to synthesize things that go far beyond the data distribution and everyday experience. To begin to address this issue, we enable a user to “warp” a given model by editing just a handful of original model outputs with desired geometric changes. Our method applies a low-rank update to a single model layer to reconstruct edited examples. Furthermore, to combat overfitting, we propose a latent space augmentation method based on style-mixing. Our method allows a user to create a model that synthesizes endless objects with defined geometric changes, enabling the creation of a new generative model without the burden of curating a large-scale dataset. We also demonstrate that edited models can be composed to achieve aggregated effects, and we present an interactive interface to enable users to create new models through composition. Empirical measurements on multiple test cases suggest the advantage of our method against recent GAN fine-tuning methods. Finally, we showcase several applications using the edited models, including latent space interpolation and image editing.
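
    Illustration (not from the paper): the abstract names two mechanisms, a low-rank update to a single layer and latent augmentation by style-mixing. The sketch below shows the generic form of each in NumPy, under loud assumptions: the layer is treated as a plain weight matrix, the factors u and v are random rather than fitted to edited examples, and the latent code is a per-layer array mixed at a fixed cutoff. The function names apply_low_rank_update and style_mix are hypothetical and are not the authors' code.

```python
import numpy as np


def apply_low_rank_update(weight, u, v):
    """Return weight + u @ v.T; the added term has rank at most u.shape[1]."""
    return weight + u @ v.T


def style_mix(w_a, w_b, cutoff):
    """Keep w_a for layers before `cutoff`, take w_b from `cutoff` onward."""
    mixed = w_a.copy()
    mixed[cutoff:] = w_b[cutoff:]
    return mixed


rng = np.random.default_rng(0)

# Assumed toy sizes; a real generator layer is far larger and convolutional.
out_dim, in_dim, rank = 8, 6, 2
weight = rng.standard_normal((out_dim, in_dim))
u = rng.standard_normal((out_dim, rank))  # in practice these would be optimized
v = rng.standard_normal((in_dim, rank))   # to reconstruct the edited examples

edited = apply_low_rank_update(weight, u, v)
delta_rank = np.linalg.matrix_rank(u @ v.T)

# Assumed per-layer latent codes of shape (num_layers, latent_dim).
num_layers, latent_dim = 4, 3
w_a = np.zeros((num_layers, latent_dim))
w_b = np.ones((num_layers, latent_dim))
mixed = style_mix(w_a, w_b, cutoff=2)
```

    The update stores rank × (out_dim + in_dim) numbers instead of a full out_dim × in_dim matrix, which is one reason a low-rank edit can be fitted from only a handful of examples. How the paper fits the factors and chooses the mixing layers is not stated in the abstract.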


