“Shape2Vec: semantic-based descriptors for 3D shapes, sketches and images”
Session/Category Title: Shape Semantics
Abstract:
Convolutional neural networks have been successfully used to compute shape descriptors, or to jointly embed shapes and sketches in a common vector space. We propose a novel approach that leverages both labeled 3D shapes and the semantic information contained in their labels to generate semantically meaningful shape descriptors. A neural network is trained to generate shape descriptors that lie close to the vector representation of the shape's class in a given vector space of words. The method is easily extended to range scans, hand-drawn sketches and images, which makes cross-modal retrieval possible without the need to design a different method for each query type. We show that sketch-based shape retrieval using semantic-based descriptors outperforms the state of the art by large margins, and that mesh-based retrieval returns results more relevant to the query than current deep shape descriptors.
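The core idea of the abstract (regress descriptors from each input modality onto the word vector of the class label, then retrieve by similarity in that shared word space) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: a linear ridge regression on precomputed features stands in for the CNN, random vectors stand in for trained word embeddings such as word2vec, the data are synthetic, and all function names (`fit_semantic_projection`, `embed`, `retrieve`, `toy_modality`) are hypothetical.

```python
import numpy as np

# Hypothetical sketch: a linear ridge regression stands in for the CNN,
# and random vectors stand in for real word embeddings of the class labels.


def fit_semantic_projection(features, labels, word_vectors, ridge=1e-3):
    """Fit W minimizing ||features @ W - word_vectors[labels]||^2 + ridge * ||W||^2."""
    targets = word_vectors[labels]  # (n, d_word): class word vector for each sample
    gram = features.T @ features + ridge * np.eye(features.shape[1])
    return np.linalg.solve(gram, features.T @ targets)  # (d_feat, d_word)


def embed(features, projection):
    """Map features into the word-vector space and L2-normalize them."""
    z = features @ projection
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def retrieve(query_desc, gallery_desc, k=5):
    """Return indices of the k gallery descriptors most cosine-similar to the query."""
    sims = gallery_desc @ query_desc  # unit-norm descriptors, so dot product = cosine
    return np.argsort(-sims)[:k]


def toy_modality(rng, n_classes, per_class, dim, noise=0.1):
    """Synthetic features: one random prototype per class plus Gaussian noise."""
    protos = rng.normal(size=(n_classes, dim))
    labels = np.repeat(np.arange(n_classes), per_class)
    feats = protos[labels] + noise * rng.normal(size=(labels.size, dim))
    return feats, labels, protos


rng = np.random.default_rng(0)
n_classes = 3
word_vectors = rng.normal(size=(n_classes, 8))  # stand-in class-label embeddings

# Two modalities with different raw feature spaces ("shapes" and "sketches").
shape_x, shape_y, _ = toy_modality(rng, n_classes, per_class=20, dim=16)
sketch_x, sketch_y, sketch_protos = toy_modality(rng, n_classes, per_class=20, dim=12)

# One projection per modality, both trained toward the same word vectors.
w_shape = fit_semantic_projection(shape_x, shape_y, word_vectors)
w_sketch = fit_semantic_projection(sketch_x, sketch_y, word_vectors)
shape_desc = embed(shape_x, w_shape)

# Cross-modal query: a new "sketch" of class 1 retrieves shapes from the gallery.
query = embed(sketch_protos[1:2] + 0.1 * rng.normal(size=(1, 12)), w_sketch)[0]
top = retrieve(query, shape_desc, k=5)
print(shape_y[top])
```

Because both projections target the same word vectors, a sketch descriptor can be compared directly with shape descriptors. Supporting another query type (for example range scans or images) would, under these assumptions, only require fitting one more projection toward those same vectors, which mirrors the abstract's point that no query-specific retrieval method is needed.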