“How do humans sketch objects?” by Eitz, Hays and Alexa

  • ©Mathias Eitz, James Hays, and Marc Alexa




    How do humans sketch objects?



    Humans have used sketching to depict our visual world since prehistoric times. Even today, sketching is possibly the only rendering technique readily available to all humans. This paper is the first large-scale exploration of human sketches. We analyze the distribution of non-expert sketches of everyday objects such as ‘teapot’ or ‘car’. We ask humans to sketch objects of a given category and gather 20,000 unique sketches evenly distributed over 250 object categories. With this dataset we perform a perceptual study and find that humans can correctly identify the object category of a sketch 73% of the time. We compare human performance against computational recognition methods. We develop a bag-of-features sketch representation and use multi-class support vector machines, trained on our sketch dataset, to classify sketches. The resulting recognition method is able to identify unknown sketches with 56% accuracy (chance is 0.4%). Based on the computational model, we demonstrate an interactive sketch recognition system. We release the complete crowd-sourced dataset of sketches to the community.
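
    The bag-of-features idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes hard nearest-neighbor assignment of local descriptors to a visual vocabulary, and the function name `bag_of_features`, the 2-D toy descriptors, and the 3-word vocabulary are all hypothetical. The classifier stage (multi-class support vector machines) is not shown. The last two lines only restate the dataset arithmetic from the abstract: 20,000 sketches over 250 categories gives 80 sketches per category and a chance accuracy of 1/250 = 0.4%.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def bag_of_features(descriptors, vocabulary):
    """Return an L1-normalized histogram of visual-word counts.

    Assumption: each descriptor is assigned to its single nearest
    visual word (hard assignment); the paper may use another scheme.
    """
    counts = [0] * len(vocabulary)
    for d in descriptors:
        # Index of the visual word closest to this descriptor.
        nearest = min(range(len(vocabulary)),
                      key=lambda i: dist(d, vocabulary[i]))
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [c / total for c in counts]


# Hypothetical toy data: 2-D "descriptors" and a 3-word vocabulary.
vocabulary = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8)]
histogram = bag_of_features(descriptors, vocabulary)

# Dataset arithmetic taken from the abstract.
sketches_per_category = 20000 // 250  # evenly distributed sketches
chance_accuracy = 1 / 250             # uniform guess over 250 categories
```

    In a full pipeline, one such histogram per sketch would serve as the feature vector passed to the classifier; the vocabulary itself would typically be learned by clustering descriptors from training sketches.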


