“Pixelor: a competitive sketching AI agent. so you think you can sketch?” by Bhunia, Das, Muhammad, Yang, Hospedales, et al. …
Title:
- Pixelor: a competitive sketching AI agent. so you think you can sketch?
Session/Category Title: All About Sketches
Abstract:
We present the first competitive drawing agent, Pixelor, that exhibits human-level performance at a Pictionary-like sketching game, where the participant whose sketch is recognized first is the winner. Our AI agent can autonomously sketch a given visual concept and achieve a recognizable rendition as quickly as or faster than a human competitor. The key to victory is for the agent to learn the optimal stroke sequencing strategies that generate the most recognizable and distinguishable strokes first. Training Pixelor is done in two steps. First, we infer the stroke order that maximizes early recognizability of human training sketches. Second, this order is used to supervise the training of a sequence-to-sequence stroke generator. Our key technical contributions are a tractable search of the exponential space of orderings using neural sorting, and an improved Seq2Seq Wasserstein (S2S-WAE) generator that uses an optimal-transport loss to accommodate the multi-modal nature of the optimal stroke distribution. Our analysis shows that Pixelor is better than the human players of the Quick, Draw! game, under both AI and human judging of early recognition. To analyze the impact of human competitors' strategies, we conducted a further human study in which participants were given unlimited thinking time and were trained in early recognizability through feedback from an AI judge. The study shows that humans do gradually improve their strategies with training, but overall Pixelor still matches human performance. The code and the dataset are available at http://sketchx.ai/pixelor.
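The first training step, finding the stroke order that maximizes early recognizability, can be illustrated with a toy sketch. This is not the authors' method: it searches every permutation by brute force, which is only feasible for a handful of strokes and is exactly the exponential cost that the abstract says neural sorting avoids. The names `early_recognizability`, `best_order`, `toy_judge`, and the `INFO` scores are hypothetical, chosen only for illustration.

```python
from itertools import permutations
from typing import Callable, Sequence, Tuple

# Hypothetical stand-in: a stroke is just a label here, not real pen data.
Stroke = str
Judge = Callable[[Sequence[Stroke]], int]


def early_recognizability(order: Sequence[Stroke], judge: Judge) -> int:
    """Sum the judge's score over every prefix of the stroke order.

    Orders that place informative strokes first accumulate a higher total.
    """
    return sum(judge(order[:k]) for k in range(1, len(order) + 1))


def best_order(strokes: Sequence[Stroke], judge: Judge) -> Tuple[Stroke, ...]:
    """Brute-force search over all n! orderings (assumption: n is tiny)."""
    return max(permutations(strokes),
               key=lambda order: early_recognizability(order, judge))


# Toy judge (assumption): each stroke has a fixed informativeness score and
# a partial sketch scores the sum of the strokes drawn so far.
INFO = {"outline": 1, "whiskers": 3, "ears": 6}


def toy_judge(prefix: Sequence[Stroke]) -> int:
    return sum(INFO[s] for s in prefix)


order = best_order(list(INFO), toy_judge)
print(order)
```

With this toy judge the search places the most informative stroke first. A real judge would be a sketch classifier scoring rendered partial sketches, and the number of orderings grows factorially with the stroke count, so exhaustive search quickly becomes impractical.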

