“Word-As-Image for Semantic Typography” by Iluz, Vinker, Hertz, Berio, Cohen-Or, et al. …

  • ©Shir Iluz, Yael Vinker, Amir Hertz, Daniel Berio, Daniel Cohen-Or, and Ariel Shamir




    Word-As-Image for Semantic Typography

Session/Category Title: Neural Image Generation and Editing




    A word-as-image is a semantic typography technique where a word illustration presents a visualization of the meaning of the word, while also preserving its readability. We present a method to create word-as-image illustrations automatically. This task is highly challenging as it requires semantic understanding of the word and a creative idea of where and how to depict these semantics in a visually pleasing and legible manner. We rely on the remarkable ability of recent large pretrained language-vision models to distill textual concepts visually. We target simple, concise, black-and-white designs that convey the semantics clearly. We deliberately do not change the color or texture of the letters and do not use embellishments. Our method optimizes the outline of each letter to convey the desired concept, guided by a pretrained Stable Diffusion model. We incorporate additional loss terms to ensure the legibility of the text and the preservation of the style of the font. We show high quality and engaging results on numerous examples and compare to alternative techniques. Code and demo will be available at our project page.


    1. Tomer Amit, Tal Shaharbany, Eliya Nachmani, and Lior Wolf. 2021. SegDiff: Image Segmentation with Diffusion Probabilistic Models.
    2. Omri Avrahami, Dani Lischinski, and Ohad Fried. 2022. Blended Diffusion for Text-Driven Editing of Natural Images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 18208–18218.
    3. Samaneh Azadi, Matthew Fisher, Vladimir G. Kim, Zhaowen Wang, Eli Shechtman, and Trevor Darrell. 2018. Multi-Content GAN for Few-Shot Font Style Transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Salt Lake City, UT, USA, 7564–7573.
    4. Elena Balashova, Amit H. Bermano, Vladimir G. Kim, Stephen DiVerdi, Aaron Hertzmann, and Thomas Funkhouser. 2019. Learning a Stroke-Based Representation for Fonts. Computer Graphics Forum 38, 1 (2019), 429–442.
    5. Brad Barber and Hannu Huhdanpaa. 1995. QHull. The Geometry Center, University of Minnesota, http://www.geom.umn.edu/software/qhull (1995).
    6. Daniel Berio, Frederic Fol Leymarie, Paul Asente, and Jose Echevarria. 2022. StrokeStyles: Stroke-Based Segmentation and Stylization of Fonts. ACM Trans. Graph. 41, 3, Article 28 (apr 2022), 21 pages.
    7. Neill DF Campbell and Jan Kautz. 2014. Learning a Manifold of Fonts. ACM Transactions on Graphics (TOG) 33, 4 (2014). Article no. 91.
    8. Hila Chefer, Shir Gur, and Lior Wolf. 2021. Transformer Interpretability Beyond Attention Visualization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 782–791.
    9. Jooyoung Choi, Sungwon Kim, Yonghyun Jeong, Youngjune Gwon, and Sungroh Yoon. 2021. ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models. CoRR abs/2108.02938 (2021). arXiv:2108.02938 https://arxiv.org/abs/2108.02938
    10. Boris Delaunay et al. 1934. Sur la sphere vide. Izv. Akad. Nauk SSSR, Otdelenie Matematicheskii i Estestvennyka Nauk 7, 793–800 (1934), 1–2.
    11. Noa Fish, Lilach Perry, Amit Bermano, and Daniel Cohen-Or. 2020. SketchPatch: Sketch Stylization via Seamless Patch-Level Synthesis. ACM Trans. Graph. 39, 6, Article 227 (nov 2020), 14 pages.
    12. Kevin Frans, Lisa B Soros, and Olaf Witkowski. 2021. Clipdraw: Exploring text-to-drawing synthesis through language-image encoders. arXiv preprint arXiv:2106.14843 (2021).
    13. Freepik. 2010. Freepik Designs. https://www.freepik.com/
    14. FreeType. 2009. FreeType library. https://freetype.org/
    15. Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. Bermano, Gal Chechik, and Daniel Cohen-Or. 2022. An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion.
    16. Rinon Gal, Moab Arar, Yuval Atzmon, Amit H. Bermano, Gal Chechik, and Daniel Cohen-Or. 2023. Designing an Encoder for Fast Personalization of Text-to-Image Models.
    17. David Ha and Douglas Eck. 2018. A Neural Representation of Sketch Drawings. In Sixth International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1704.03477.
    18. Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. 2022. Prompt-to-prompt image editing with cross attention control. (2022).
    19. Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising Diffusion Probabilistic Models. CoRR abs/2006.11239 (2020). arXiv:2006.11239 https://arxiv.org/abs/2006.11239
    20. Kai Hormann and Günther Greiner. 2000. MIPS: An efficient global parametrization method. Technical Report. Erlangen-Nuernberg Univ (Germany) Computer Graphics Group.
    21. Adobe Systems Inc. 1990. Adobe Type 1 Font Format. Addison Wesley Publishing Company.
    22. Ajay Jain, Amber Xie, and Pieter Abbeel. 2022. VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models. arXiv preprint arXiv:2211.11319 (2022).
    23. Yue Jiang, Zhouhui Lian, Yingmin Tang, and Jianguo Xiao. 2019. SCFont: Structure-Guided Chinese Font Generation via Deep Stacked Networks. Proceedings of the AAAI Conference on Artificial Intelligence 33, 01 (Jul. 2019), 4015–4022.
    24. Ji Lee. 2011. Word As Image. Adams Media, London.
    25. Tzu-Mao Li, Michal Lukáč, Gharbi Michaël, and Jonathan Ragan-Kelley. 2020. Differentiable Vector Graphics Rasterization for Editing and Learning. ACM Trans. Graph. (Proc. SIGGRAPH Asia) 39, 6 (2020), 193:1–193:15.
    26. Zhouhui Lian, Bo Zhao, Xudong Chen, and Jianguo Xiao. 2018. EasyFont: A style learning-based system to easily build your large-scale handwriting fonts. ACM Transactions on Graphics (TOG) 38, 1 (2018), 1–18.
    27. Raphael Gontijo Lopes, David Ha, Douglas Eck, and Jonathon Shlens. 2019. A Learned Representation for Scalable Vector Graphics. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).
    28. Xu Ma, Yuqian Zhou, Xingqian Xu, Bin Sun, Valerii Filev, Nikita Orlov, Yun Fu, and Humphrey Shi. 2022. Towards Layer-wise Image Vectorization.
    29. Wendong Mao, Shuai Yang, Huihong Shi, Jiaying Liu, and Zhongfeng Wang. 2022. Intelligent Typography: Artistic Text Style Transfer for Complex Texture and Structure. IEEE Transactions on Multimedia (2022), 1–15.
    30. Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. 2022. SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations. In International Conference on Learning Representations.
    31. Gal Metzer, Elad Richardson, Or Patashnik, Raja Giryes, and Daniel Cohen-Or. 2022. Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures.
    32. Oscar Michel, Roi Bar-On, Richard Liu, Sagie Benaim, and Rana Hanocka. 2021. Text2Mesh: Text-Driven Neural Stylization for Meshes. arXiv preprint arXiv:2112.03221 (2021).
    33. Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. 2021. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741 (2021).
    34. Laurence Penney. 1996. A History of TrueType. https://www.truetype-typography.com/.
    35. Huy Quoc Phan, Hongbo Fu, and Antoni B Chan. 2015. Flexyfont: Learning Transferring Rules for Flexible Typeface Synthesis. Computer Graphics Forum 34, 7 (2015), 245–256.
    36. Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall. 2022. Dreamfusion: Text-to-3d using 2d diffusion. arXiv preprint arXiv:2209.14988 (2022).
    37. Lakshman Prasad. 1997. Morphological analysis of shapes. CNLS newsletter 139, 1 (1997), 1997–07.
    38. Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning Transferable Visual Models From Natural Language Supervision. CoRR abs/2103.00020 (2021). arXiv:2103.00020 https://arxiv.org/abs/2103.00020
    39. Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. 2022. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125 (2022).
    40. Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2021. High-Resolution Image Synthesis with Latent Diffusion Models. arXiv:2112.10752 [cs.CV]
    41. Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention. Springer, 234–241.
    42. Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. 2022. DreamBooth: Fine Tuning Text-to-image Diffusion Models for Subject-Driven Generation. (2022).
    43. Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, and Mohammad Norouzi. 2022. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding.
    44. Kunpeng Song, Ligong Han, Bingchen Liu, Dimitris Metaxas, and Ahmed Elgammal. 2022. Diffusion Guided Domain Adaptation of Image Generators.
    45. Rapee Suveeranont and Takeo Igarashi. 2010. Example-Based Automatic Font Generation. In Smart Graphics. Number LNCS 6133 in Lecture Notes in Computer Science. 127–138.
    46. Purva Tendulkar, Kalpesh Krishna, Ramprasaath R. Selvaraju, and Devi Parikh. 2019. Trick or TReAT: Thematic Reinforcement for Artistic Typography.
    47. Guy Tevet, Brian Gordon, Amir Hertz, Amit H Bermano, and Daniel Cohen-Or. 2022. Motionclip: Exposing human motion generation to clip space. In Computer Vision-ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXII. Springer, 358–374.
    48. Yingtao Tian and David Ha. 2021. Modern Evolution Strategies for Creativity: Fitting Concrete Images and Abstract Conceptst. arXiv:2109.08857 [cs.NE]
    49. Narek Tumanyan, Michal Geyer, Shai Bagon, and Tali Dekel. 2022a. Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation.
    50. Narek Tumanyan, Michal Geyer, Shai Bagon, and Tali Dekel. 2022b. Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation.
    51. Yael Vinker, Yuval Alaluf, Daniel Cohen-Or, and Ariel Shamir. 2022a. CLIPascene: Scene Sketching with Different Types and Levels of Abstraction.
    52. Yael Vinker, Ehsan Pajouheshgar, Jessica Y. Bo, Roman Christian Bachmann, Amit Haim Bermano, Daniel Cohen-Or, Amir Zamir, and Ariel Shamir. 2022b. CLIPasso: Semantically-Aware Object Sketching. ACM Trans. Graph. 41, 4, Article 86 (jul 2022), 11 pages.
    53. Patrick von Platen, Suraj Patil, Anton Lozhkov, Pedro Cuenca, Nathan Lambert, Kashif Rasul, Mishig Davaadorj, and Thomas Wolf. 2022. Diffusers: State-of-the-art diffusion models. https://github.com/huggingface/diffusers.
    54. Wenjing Wang, Jiaying Liu, Shuai Yang, and Zongming Guo. 2019. Typography With Decor: Intelligent Text Style Transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
    55. Yizhi Wang and Zhouhui Lian. 2021. DeepVecFont: Synthesizing High-Quality Vector Fonts via Dual-Modality Learning. ACM Transactions on Graphics 40, 6 (Dec. 2021), 1–15.
    56. Yizhi Wang, Guo Pu, Wenhan Luo, Yexin Wang, Pengfei Xiong, Hongwen Kang, and Zhouhui Lian. 2022. Aesthetic Text Logo Synthesis via Content-Aware Layout Inferring. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2436–2445.
    57. Jie Xu and Craig S. Kaplan. 2007. Calligraphic Packing. In Proceedings of Graphics Interface 2007 on – GI ’07. ACM Press, Montreal, Canada, 43.
    58. Shuai Yang, Jiaying Liu, Zhouhui Lian, and Zongming Guo. 2017. Awesome Typography: Statistics-Based Text Effects Transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
    59. Shuai Yang, Jiaying Liu, Wenhan Yang, and Zongming Guo. 2018. Context-Aware Un-supervised Text Stylization. In Proceedings of the 26th ACM International Conference on Multimedia (Seoul, Republic of Korea) (MM ’18). Association for Computing Machinery, New York, NY, USA, 1688–1696.
    60. Shuai Yang, Zhangyang Wang, and Jiaying Liu. 2022. Shape-Matching GAN++: Scale Controllable Dynamic Artistic Text Style Transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 7 (2022), 3807–3820.
    61. Junsong Zhang, Yu Wang, Weiyi Xiao, and Zhenshan Luo. 2017. Synthesizing Ornamental Typefaces: Synthesizing Ornamental Typefaces. Computer Graphics Forum 36, 1 (Jan. 2017), 64–75.
    62. Renrui Zhang, Ziyu Guo, Wei Zhang, Kunchang Li, Xupeng Miao, Bin Cui, Yu Qiao, Peng Gao, and Hongsheng Li. 2021. PointCLIP: Point Cloud Understanding by CLIP.
    63. Changqing Zou, Junjie Cao, Warunika Ranaweera, Ibraheem Alhashim, Ping Tan, Alla Sheffer, and Hao Zhang. 2016. Legible Compact Calligrams. ACM Transactions on Graphics 35, 4 (July 2016), 1–12.
    64. Ju Jia Zou, Hung-Hsin Chang, and Hong Yan. 2001. Shape skeletonization by identifying discrete local symmetries. Pattern Recognition 34, 10 (2001), 1895–1905.

ACM Digital Library Publication:

Overview Page: