“CLIPasso: Semantically-Aware Abstract Object Sketching” by Vinker, Pajouheshgar, Bo, Bachmann, Bermano, et al. …

Title:


    CLIPasso: Semantically-Aware Abstract Object Sketching

Program Title:


    Labs Demo

Description:


    Abstraction is at the heart of sketching due to the simple and minimal nature of line drawings. Abstraction entails identifying the essential visual properties of an object or scene, which requires semantic understanding and prior knowledge of high-level concepts. Abstract depictions are therefore challenging for artists, and even more so for machines. We present CLIPasso, an object sketching method that can achieve different levels of abstraction, guided by geometric and semantic simplifications. While sketch generation methods often rely on explicit sketch datasets for training, we utilize the remarkable ability of CLIP (Contrastive-Language-Image-Pretraining) to distill semantic concepts from sketches and images alike. We define a sketch as a set of Bézier curves and use a differentiable rasterizer to optimize the parameters of the curves directly with respect to a CLIP-based perceptual loss. The abstraction degree is controlled by varying the number of strokes. The generated sketches demonstrate multiple levels of abstraction while maintaining recognizability, underlying structure, and essential visual components of the subject drawn.
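    The optimization loop described above can be illustrated with a short, self-contained sketch. This is not the presented system's code: the Gaussian "soft" rasterizer below is only a toy stand-in for the differentiable vector-graphics rasterizer (diffvg) used in CLIPasso, the loss keeps a single CLIP cosine-distance term (the paper combines a geometric term on intermediate CLIP activations with the semantic term), the saliency-based stroke initialization is omitted, and "target.png" is a hypothetical input path. It assumes PyTorch and OpenAI's clip package are installed.

        # Minimal sketch of CLIPasso-style optimization (toy rasterizer, single CLIP loss term).
        import torch
        import clip
        from PIL import Image

        device = "cuda" if torch.cuda.is_available() else "cpu"
        model, preprocess = clip.load("ViT-B/32", device=device)
        model = model.float()                       # keep CLIP in fp32 so gradients are well-behaved
        for p in model.parameters():
            p.requires_grad_(False)                 # CLIP stays frozen; only the strokes are optimized

        H = W = 224
        n_strokes = 16                              # fewer strokes -> higher abstraction
        # One cubic Bezier curve per stroke: 4 control points in [0, 1]^2.
        ctrl = torch.rand(n_strokes, 4, 2, device=device, requires_grad=True)

        # Pixel-center grid used by the toy rasterizer, shape (H, W, 2), coordinates in [0, 1]^2.
        ys, xs = torch.meshgrid(torch.linspace(0, 1, H, device=device),
                                torch.linspace(0, 1, W, device=device), indexing="ij")
        grid = torch.stack([xs, ys], dim=-1)

        def rasterize(ctrl, sigma=0.005, samples=32):
            """Toy differentiable rasterizer: splat points sampled along each Bezier curve
            with a Gaussian kernel onto a white canvas. Returns a (1, 3, H, W) image."""
            t = torch.linspace(0, 1, samples, device=ctrl.device).view(1, samples, 1)
            p0, p1, p2, p3 = ctrl[:, 0:1], ctrl[:, 1:2], ctrl[:, 2:3], ctrl[:, 3:4]
            pts = ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
                   + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)        # (n_strokes, samples, 2)
            d2 = ((grid.view(1, 1, H, W, 2) - pts.view(-1, samples, 1, 1, 2)) ** 2).sum(-1)
            ink = torch.exp(-d2 / (2 * sigma ** 2)).amax(dim=(0, 1))  # darkest stroke wins per pixel
            img = (1.0 - ink).clamp(0, 1)                             # black strokes on white canvas
            return img.expand(3, -1, -1).unsqueeze(0)

        # CLIP's published input normalization constants.
        mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
        std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

        # Hypothetical target image path; preprocess resizes, crops, and normalizes for CLIP.
        target = preprocess(Image.open("target.png")).unsqueeze(0).to(device)
        with torch.no_grad():
            t_feat = model.encode_image(target)
            t_feat = t_feat / t_feat.norm(dim=-1, keepdim=True)

        opt = torch.optim.Adam([ctrl], lr=1e-2)
        for step in range(2000):
            sketch = rasterize(ctrl)                                  # differentiable w.r.t. ctrl
            feat = model.encode_image((sketch - mean) / std)
            feat = feat / feat.norm(dim=-1, keepdim=True)
            loss = 1.0 - (feat * t_feat).sum()                        # cosine distance in CLIP space
            opt.zero_grad()
            loss.backward()
            opt.step()

    Varying n_strokes is what controls the abstraction level: with fewer curves the optimizer is forced to spend its "ink" on the most semantically salient parts of the object, which is the behavior the method is designed to expose.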
