“TryOnGAN: body-aware try-on via layered interpolation” by Lewis, Varadharajan and Kemelmacher-Shlizerman

  • Kathleen M. Lewis, Srivatsan Varadharajan, and Ira Kemelmacher-Shlizerman


    Given a pair of images, one of a target person and one of a garment worn by another person, we automatically generate an image of the target person wearing the given garment. Previous methods mostly focused on texture transfer learned from paired training data, overlooking body-shape deformation, skin color, and the seamless blending of the garment with the person. This work addresses those three components and, in addition, requires no paired training data. We design a pose-conditioned StyleGAN2 architecture with a clothing-segmentation branch that is trained on images of people wearing garments. Once the model is trained, we propose a new layered latent-space interpolation method that preserves and synthesizes skin color and the target body shape while transferring the garment from a different person. We demonstrate results on high-resolution 512 × 512 images and extensively compare to the state of the art in try-on, on both latent-space-generated and real images.
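    The core idea of layered latent-space interpolation can be sketched as per-layer mixing of two StyleGAN2 W+ style codes: layers associated with the garment take the garment wearer's code, while the remaining layers keep the target person's code (preserving identity, skin color, and body shape). The sketch below is a minimal illustration under assumed shapes, not the paper's implementation; the function name, the `(16, 512)` W+ layout, the choice of `garment_layers`, and the interpolation coefficient `q` are all assumptions for demonstration.

```python
import numpy as np

def layered_interpolation(w_person, w_garment, garment_layers, q=1.0):
    """Hypothetical sketch of layered W+ mixing.

    w_person, w_garment: (num_layers, 512) style codes, e.g. from
    projecting the two input images into the generator's W+ space.
    garment_layers: indices of generator layers assumed to control
    the garment's shape and texture.
    q: interpolation coefficient; q=1.0 fully adopts the garment code
    on the selected layers.
    """
    w_mix = w_person.copy()  # untouched layers keep the target person's code
    for layer in garment_layers:
        # linear interpolation only on garment-controlling layers
        w_mix[layer] = (1.0 - q) * w_person[layer] + q * w_garment[layer]
    return w_mix

# Toy usage: random codes stand in for projected images.
rng = np.random.default_rng(0)
w_p = rng.normal(size=(16, 512))   # target person
w_g = rng.normal(size=(16, 512))   # garment wearer
w_mix = layered_interpolation(w_p, w_g, garment_layers=range(4, 9), q=1.0)
```

Feeding `w_mix` through the generator would then render the target person's identity and body from the unmixed layers and the transferred garment from the mixed ones; in the paper this is further guided by the pose conditioning and the clothing-segmentation branch.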
