“Learning visual similarity for product design with convolutional neural networks”

  • ©Sean Bell and Kavita Bala





Session/Category Title: Image Similarity & Search




    Popular sites like Houzz, Pinterest, and LikeThatDecor have communities of users helping each other answer questions about products in images. In this paper we learn an embedding for visual search in interior design. Our embedding contains two different domains of product images: products cropped from internet scenes, and products in their iconic form. With such a multi-domain embedding, we demonstrate several applications of visual search, including identifying products in scenes and finding stylistically similar products. To obtain the embedding, we train a convolutional neural network on pairs of images. We explore several training architectures, including re-purposing object classifiers, using siamese networks, and using multitask learning. We evaluate our search quantitatively and qualitatively and demonstrate high-quality results for search across multiple visual domains, enabling new applications in interior design.
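    The abstract does not state the training objective, so the following is only a minimal sketch of how pair-based training and embedding search could fit together. It assumes a margin-based contrastive loss, which is a common choice for siamese networks; the function names (`euclidean`, `contrastive_loss`, `nearest`), the margin value, and the toy embeddings are all hypothetical and are not taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def contrastive_loss(pairs, margin=1.0):
    """Average margin-based contrastive loss over embedding pairs.

    pairs  : list of (emb_a, emb_b, same), where same is True when both
             images show the same product (assumed labeling scheme)
    margin : hypothetical margin; non-matching pairs that are already
             farther apart than this contribute zero loss
    """
    total = 0.0
    for emb_a, emb_b, same in pairs:
        d = euclidean(emb_a, emb_b)
        if same:
            total += d ** 2                      # pull matching pairs together
        else:
            total += max(0.0, margin - d) ** 2   # push non-matching pairs apart
    return total / len(pairs)


def nearest(query, catalog, k=3):
    """Indices of the k catalog embeddings closest to the query embedding."""
    order = sorted(range(len(catalog)), key=lambda i: euclidean(query, catalog[i]))
    return order[:k]
```

    In this sketch, a product cropped from a scene and its iconic catalog image would form a matching pair. Once both domains share one embedding space, visual search reduces to a nearest-neighbor lookup such as `nearest(scene_crop_embedding, catalog_embeddings)`, where both arguments are illustrative names.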


