“Unsupervised K-modal styled content generation” by Sendik, Lischinski and Cohen-Or

  • © Omry Sendik, Daniel (Dani) Lischinski, and Daniel Cohen-Or

Title:

    Unsupervised K-modal styled content generation

Session/Category Title: Artistic Imaging


Presenter(s)/Author(s): Omry Sendik, Daniel (Dani) Lischinski, Daniel Cohen-Or



Abstract:


    The emergence of deep generative models has recently enabled the automatic generation of massive amounts of graphical content, both in 2D and in 3D. Generative Adversarial Networks (GANs) and style control mechanisms, such as Adaptive Instance Normalization (AdaIN), have proved particularly effective in this context, culminating in the state-of-the-art StyleGAN architecture. While such models can learn diverse distributions, provided a sufficiently large training set, they are not well suited for scenarios where the distribution of the training data exhibits multi-modal behavior. In such cases, reshaping a uniform or normal distribution over the latent space into a complex multi-modal distribution in the data domain is challenging, and the generator may fail to sample the target distribution well. Furthermore, existing unsupervised generative models cannot control the mode of the generated samples independently of the other visual attributes, even though these are typically disentangled in the training data.

    In this paper, we introduce uMM-GAN, a novel architecture designed to better model multi-modal distributions in an unsupervised fashion. Building upon the StyleGAN architecture, our network learns multiple modes, in a completely unsupervised manner, and combines them using a set of learned weights. We demonstrate that this approach effectively approximates a complex distribution as a superposition of multiple simple ones. We further show that uMM-GAN disentangles modes from style, thereby providing an independent degree of control over the generated content.
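The core idea in the abstract — approximating a complex, K-modal latent distribution as a superposition of learned mode vectors combined by learned weights, on top of a simple base distribution — can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the variable names, shapes, and the additive mean-shift formulation are assumptions, and the "learned" quantities are random stand-ins here.

```python
# Illustrative sketch (NOT the authors' code): sampling from a latent
# distribution built as a weighted superposition of K learned mode
# vectors plus a simple normal base distribution. All names/shapes
# are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

K, dim = 3, 512                        # assumed: number of modes, latent size
modes = rng.normal(size=(K, dim))      # stand-ins for learned mode vectors
logits = rng.normal(size=K)            # stand-ins for learned mixing logits


def softmax(x):
    """Normalize logits into weights that sum to 1."""
    e = np.exp(x - x.max())
    return e / e.sum()


def sample_latent():
    """Combine the K modes with learned weights, then add a simple sample."""
    w = softmax(logits)                # learned combination weights
    mu = w @ modes                     # weighted superposition of the modes
    z = rng.normal(size=dim)           # simple (unimodal normal) base sample
    return mu + z                      # multi-modal latent fed to the generator


latent = sample_latent()
```

In a StyleGAN-like setting, such a latent would be passed through the mapping network and AdaIN-based style control, so that the mode (which `mu` determines) can in principle be varied independently of style.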


