“The sketchy database: learning to retrieve badly drawn bunnies”
Conference:
Type(s):
Title:
- The sketchy database: learning to retrieve badly drawn bunnies
Abstract:
We present the Sketchy database, the first large-scale collection of sketch-photo pairs. We ask crowd workers to sketch particular photographic objects sampled from 125 categories and acquire 75,471 sketches of 12,500 objects. The Sketchy database gives us fine-grained associations between particular photos and sketches, and we use this to train cross-domain convolutional networks which embed sketches and photographs in a common feature space. We use our database as a benchmark for fine-grained retrieval and show that our learned representation significantly outperforms both hand-crafted features as well as deep features trained for sketch or photo classification. Beyond image retrieval, we believe the Sketchy database opens up new opportunities for sketch and image understanding and synthesis.
References:
1. Antol, S., Zitnick, C. L., and Parikh, D. 2014. Zero-Shot Learning via Visual Abstraction. In ECCV.
2. Bansal, A., Kowdle, A., Parikh, D., Gallagher, A., and Zitnick, L. 2013. Which edges matter? In Computer Vision Workshops (ICCVW), 2013 IEEE International Conference on, 578–585.
3. Bell, S., and Bala, K. 2015. Learning visual similarity for product design with convolutional neural networks. ACM Trans. Graph. 34, 4 (July).
4. Berger, I., Shamir, A., Mahler, M., Carter, E., and Hodgins, J. 2013. Style and abstraction in portrait sketching. ACM Trans. Graph. 32, 4 (July), 55:1–55:12.
5. Brady, T. F., Konkle, T., Alvarez, G. A., and Oliva, A. 2008. Visual long-term memory has a massive storage capacity for object details. Proceedings of the National Academy of Sciences 105, 38, 14325–14329.
6. Brady, T. F., Konkle, T., Gill, J., Oliva, A., and Alvarez, G. A. 2013. Visual long-term memory has the same limit on fidelity as visual working memory. Psychological Science 24, 6.
7. Cao, Y., Wang, C., Zhang, L., and Zhang, L. 2011. Edgel index for large-scale sketch-based image search. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, IEEE, 761–768.
8. Cao, X., Zhang, H., Liu, S., Guo, X., and Lin, L. 2013. Sym-fish: A symmetry-aware flip invariant sketch histogram shape descriptor. In Computer Vision (ICCV), 2013 IEEE International Conference on, 313–320.
9. Chen, T., Cheng, M.-M., Tan, P., Shamir, A., and Hu, S.-M. 2009. Sketch2photo: internet image montage. ACM SIGGRAPH Asia.
10. Chen, T., Tan, P., Ma, L.-Q., Cheng, M.-M., Shamir, A., and Hu, S.-M. 2013. Poseshop: Human image database construction and personalized content synthesis. IEEE Transactions on Visualization and Computer Graphics 19, 5 (May), 824–837.
11. Chopra, S., Hadsell, R., and LeCun, Y. 2005. Learning a similarity metric discriminatively, with application to face verification. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 1, 539–546.
12. Cole, F., Golovinskiy, A., Limpaecher, A., Barros, H. S., Finkelstein, A., Funkhouser, T., and Rusinkiewicz, S. 2008. Where do people draw lines? ACM Transactions on Graphics (Proc. SIGGRAPH) 27, 3 (Aug.).
13. Del Bimbo, A., and Pala, P. 1997. Visual image retrieval by elastic matching of user sketches. Pattern Analysis and Machine Intelligence, IEEE Transactions on 19, 2 (Feb), 121–132.
14. Dosovitskiy, A., Springenberg, J. T., and Brox, T. 2014. Learning to generate chairs with convolutional neural networks. CoRR abs/1411.5928.
15. Eitz, M., Hildebrand, K., Boubekeur, T., and Alexa, M. 2010. An evaluation of descriptors for large-scale image retrieval from sketched feature lines. Computers & Graphics 34, 5, 482–498.
16. Eitz, M., Hildebrand, K., Boubekeur, T., and Alexa, M. 2011. Sketch-based image retrieval: Benchmark and bag-of-features descriptors. IEEE Transactions on Visualization and Computer Graphics 17, 11, 1624–1636.
17. Eitz, M., Richter, R., Hildebrand, K., Boubekeur, T., and Alexa, M. 2011. Photosketcher: interactive sketch-based image synthesis. IEEE Computer Graphics and Applications.
18. Eitz, M., Hays, J., and Alexa, M. 2012. How do humans sketch objects? ACM Trans. Graph. (Proc. SIGGRAPH) 31, 4, 44:1–44:10.
19. Eitz, M., Richter, R., Boubekeur, T., Hildebrand, K., and Alexa, M. 2012. Sketch-based shape retrieval. ACM Transactions on Graphics (Proceedings SIGGRAPH) 31, 4, 31:1–31:10.
20. Everingham, M., Gool, L., Williams, C. K., Winn, J., and Zisserman, A. 2010. The pascal visual object classes (voc) challenge. Int. J. Comput. Vision 88, 2 (June), 303–338.
21. Felzenszwalb, P. F., Girshick, R. B., McAllester, D., and Ramanan, D. 2010. Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32, 9 (Sept.), 1627–1645.
22. Grill-Spector, K., and Kanwisher, N. 2005. Visual recognition: as soon as you see it, you know what it is. Psychological Science 16, 2, 152–160.
23. Hadsell, R., Chopra, S., and LeCun, Y. 2006. Dimensionality reduction by learning an invariant mapping. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, vol. 2, 1735–1742.
24. Han, X., Leung, T., Jia, Y., Sukthankar, R., and Berg, A. 2015. Matchnet: Unifying feature and metric learning for patch-based matching. In Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on, 3279–3286.
25. Hu, R., and Collomosse, J. 2013. A performance evaluation of gradient field hog descriptor for sketch based image retrieval. Computer Vision and Image Understanding 117, 7, 790–806.
26. Jacobs, C. E., Finkelstein, A., and Salesin, D. H. 1995. Fast multiresolution image querying. In Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques, ACM, SIGGRAPH ’95, 277–286.
27. Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., and Darrell, T. 2014. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093.
28. Jun, X., Aaron, H., Wilmot, L., and Holger, W. 2014. Portraitsketch: Face sketching assistance for novices. In Proceedings of the 27th Annual ACM Symposium on User Interface Software and Technology, ACM.
29. Kato, T., Kurita, T., Otsu, N., and Hirata, K. 1992. A sketch retrieval method for full color image database-query by visual example. In Pattern Recognition, 1992. Vol. I. Conference A: Computer Vision and Applications, Proceedings., 11th IAPR International Conference on, 530–533.
30. Krizhevsky, A., Sutskever, I., and Hinton, G. E. 2012. Imagenet classification with deep convolutional neural networks. In 26th Annual Conference on Neural Information Processing Systems (NIPS), 1106–1114.
31. Lee, D., and Chun, M. M. What are the units of visual short-term memory, objects or spatial locations? Perception & Psychophysics 63, 2, 253–257.
32. Li, Y., Hospedales, T. M., Song, Y.-Z., and Gong, S. 2014. Fine-grained sketch-based image retrieval by matching deformable part models. In British Machine Vision Conference (BMVC).
33. Li, Y., Su, H., Qi, C. R., Fish, N., Cohen-Or, D., and Guibas, L. J. 2015. Joint embeddings of shapes and images via cnn image purification. ACM Trans. Graph. 34, 6 (Oct.), 234:1–234:12.
34. Limpaecher, A., Feltman, N., Treuille, A., and Cohen, M. 2013. Real-time drawing assistance through crowdsourcing. ACM Trans. Graph. 32, 4 (July), 54:1–54:8.
35. Lin, T., Maire, M., Belongie, S. J., Bourdev, L. D., Girshick, R. B., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. 2014. Microsoft COCO: common objects in context. CoRR abs/1405.0312.
36. Lin, T.-Y., Cui, Y., Belongie, S., and Hays, J. 2015. Learning deep representations for ground-to-aerial geolocalization. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
37. Mainelli, T., Chau, M., Reith, R., and Shirer, M., 2015. Idc worldwide quarterly smart connected device tracker. http://www.idc.com/getdoc.jsp?containerId=prUS25500515, March 20, 2015.
38. Martin, D., Fowlkes, C., Tal, D., and Malik, J. 2001. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proc. 8th Int’l Conf. Computer Vision, vol. 2, 416–423.
39. Nieuwenstein, M., and Wyble, B. 2014. Beyond a mask and against the bottleneck: Retroactive dual-task interference during working memory consolidation of a masked visual target. Journal of Experimental Psychology: General 143, 1409–1427.
40. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., and Fei-Fei, L. 2015. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV) 115, 3, 211–252.
41. Saavedra, J. M., and Barrios, J. M. 2015. Sketch based image retrieval using learned keyshapes (lks). In Proceedings of the British Machine Vision Conference (BMVC), 164.1–164.11.
42. Schneider, R. G., and Tuytelaars, T. 2014. Sketch classification and classification-driven analysis using fisher vectors. ACM Trans. Graph. 33, 6 (Nov.), 174:1–174:9.
43. Sclaroff, S. 1997. Deformable prototypes for encoding shape categories in image databases. Pattern Recognition 30, 4, 627–641.
44. Shrivastava, A., Malisiewicz, T., Gupta, A., and Efros, A. A. 2011. Data-driven visual similarity for cross-domain image matching. In ACM Transactions on Graphics (TOG), vol. 30, ACM, 154.
45. Smeulders, A., Worring, M., Santini, S., Gupta, A., and Jain, R. 2000. Content-based image retrieval at the end of the early years. Pattern Analysis and Machine Intelligence, IEEE Transactions on 22, 12 (Dec), 1349–1380.
46. Su, H., Maji, S., Kalogerakis, E., and Learned-Miller, E. G. 2015. Multi-view convolutional neural networks for 3d shape recognition. In Proc. ICCV.
47. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. 2014. Going deeper with convolutions. arXiv preprint arXiv:1409.4842.
48. Taigman, Y., Yang, M., Ranzato, M., and Wolf, L. 2014. Deepface: Closing the gap to human-level performance in face verification. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, 1701–1708.
49. van der Maaten, L., and Hinton, G. 2008. Visualizing high-dimensional data using t-sne. Journal of Machine Learning Research 9, 3 (Nov.), 2579–2605.
50. Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., Chen, B., and Wu, Y. 2014. Learning fine-grained image similarity with deep ranking. CoRR abs/1404.4661.
51. Wang, F., Kang, L., and Li, Y. 2015. Sketch-based 3d shape retrieval using convolutional neural networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
52. Xiao, J., Ehinger, K. A., Hays, J., Torralba, A., and Oliva, A. 2014. Sun database: Exploring a large collection of scene categories. International Journal of Computer Vision, 1–20.
53. Yu, Q., Yang, Y., Song, Y.-Z., Xiang, T., and Hospedales, T. 2015. Sketch-a-net that beats humans. In British Machine Vision Conference (BMVC).
54. Yu, Q., Liu, F., Song, Y., Xiang, T., Hospedales, T., and Loy, C. C. 2016. Sketch me that shoe. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
55. Zeiler, M. D., and Fergus, R. 2014. Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014. Springer, 818–833.
56. Zhou, T., Jae Lee, Y., Yu, S. X., and Efros, A. A. 2015. Flowweb: Joint image set alignment by weaving consistent, pixel-wise correspondences. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
57. Zhu, J.-Y., Lee, Y. J., and Efros, A. A. 2014. Averageexplorer: Interactive exploration and alignment of visual data collections. ACM Transactions on Graphics (SIGGRAPH 2014) 33, 4.