“SWAGAN: a style-based wavelet-driven generative model” by Gal, Hochberg, Bermano and Cohen-Or

  • ©Rinon Gal, Dana Cohen Hochberg, Amit Haim Bermano, and Daniel Cohen-Or







    In recent years, considerable progress has been made in the visual quality of Generative Adversarial Networks (GANs). Even so, these networks still suffer from degraded quality in high-frequency content, stemming from a spectrally biased architecture and similarly unfavorable loss functions. To address this issue, we present a novel general-purpose Style and WAvelet based GAN (SWAGAN) that implements progressive generation in the frequency domain. SWAGAN incorporates wavelets throughout its generator and discriminator architectures, enforcing a frequency-aware latent representation at every step of the way. This approach, designed to directly tackle the spectral bias of neural networks, yields an improved ability to generate medium- and high-frequency content, including structures that other networks fail to learn. We demonstrate the advantage of our method by integrating it into the StyleGAN2 framework and verifying that content generation in the wavelet domain leads to more realistic high-frequency content, even when trained for fewer iterations. Furthermore, we verify that our model’s latent space retains the qualities that allow StyleGAN to serve as a basis for a multitude of editing tasks, and show that our frequency-aware approach also induces improved high-frequency performance in downstream tasks.
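    The frequency-domain representation that the abstract refers to can be illustrated with a single-level 2D Haar transform, which splits an image into a low-frequency approximation and three half-resolution detail sub-bands; generating these sub-bands directly, rather than raw pixels, is what makes the representation frequency-aware. The following is a minimal NumPy sketch under that interpretation, not the paper's implementation (SWAGAN learns the sub-bands inside the network); the function names are ours.

    ```python
    import numpy as np

    def haar_dwt2(x):
        """Single-level 2D Haar transform: split an (H, W) image into
        four half-resolution sub-bands (LL, LH, HL, HH)."""
        a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
        b = x[0::2, 1::2]  # top-right
        c = x[1::2, 0::2]  # bottom-left
        d = x[1::2, 1::2]  # bottom-right
        ll = (a + b + c + d) / 2.0  # low-frequency approximation
        lh = (a + b - c - d) / 2.0  # horizontal detail
        hl = (a - b + c - d) / 2.0  # vertical detail
        hh = (a - b - c + d) / 2.0  # diagonal detail
        return ll, lh, hl, hh

    def haar_idwt2(ll, lh, hl, hh):
        """Inverse single-level 2D Haar transform; exactly reconstructs
        the image from its four sub-bands."""
        h, w = ll.shape
        x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
        x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
        x[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
        x[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
        x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
        return x
    ```

    Because the transform is orthonormal and invertible, a generator can synthesize the four sub-bands at each resolution and recover pixels losslessly via the inverse transform, while the discriminator can operate on the sub-bands and thus penalize errors in the high-frequency bands explicitly.
    
    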


