Learning Local Shape Descriptors from Part Correspondences with Multiview Convolutional Networks

We present a new local descriptor for 3D shapes, directly applicable to a wide range of shape analysis problems such as point correspondence, semantic segmentation, affordance prediction, and shape-to-scan matching. The descriptor is produced by a convolutional network trained to embed geometrically and semantically similar points close to one another in descriptor space. The network processes surface neighborhoods around points on a shape. These neighborhoods are captured at multiple scales by a succession of progressively zoomed-out views, taken from carefully selected camera positions. We leverage two extremely large sources of data to train our network. First, since our network processes rendered views in the form of 2D images, we repurpose architectures pretrained on massive image datasets. Second, we automatically generate a synthetic dense point correspondence dataset by nonrigid alignment of corresponding shape parts in a large collection of segmented 3D models. As a result of these design choices, our network effectively encodes multiscale local context and fine-grained surface detail. Our network can be trained to produce either category-specific descriptors or, by learning from multiple shape categories, more generic descriptors. Once trained, the network extracts local descriptors for shapes without requiring any part segmentation as input. Our method can produce effective local descriptors even for shapes whose category is unknown or different from those used during training. We demonstrate through several experiments that our learned local descriptors are more discriminative than state-of-the-art alternatives and are effective in a variety of shape analysis applications.
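The central idea of the descriptor can be illustrated with a small sketch. Training pulls the descriptors of corresponding points together and pushes non-corresponding ones apart. At test time, candidate correspondences are found by nearest-neighbor search in descriptor space. The sketch below makes several assumptions. Descriptors are compared with Euclidean distance. Training uses a contrastive loss in the style of Hadsell et al. (2006). The abstract does not specify the exact training objective, so this loss is an illustrative stand-in, not the network's actual loss. The function names (`euclidean`, `contrastive_loss`, `nearest_neighbor_correspondence`) and the `margin` parameter are hypothetical, and the descriptors are plain lists rather than network outputs.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two descriptor vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same dimensionality")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(d1: Sequence[float], d2: Sequence[float],
                     is_match: bool, margin: float = 1.0) -> float:
    """Contrastive loss in the style of Hadsell et al. (2006); an assumed stand-in.

    Corresponding points are penalized by 0.5 * D^2, pulling them together.
    Non-corresponding points are penalized by 0.5 * max(0, margin - D)^2,
    which pushes them apart only while they are closer than `margin`.
    """
    dist = euclidean(d1, d2)
    if is_match:
        return 0.5 * dist ** 2
    return 0.5 * max(0.0, margin - dist) ** 2


def nearest_neighbor_correspondence(query: Sequence[float],
                                    targets: List[Sequence[float]]) -> int:
    """Index of the target descriptor closest to `query` (raises on empty targets)."""
    return min(range(len(targets)), key=lambda i: euclidean(query, targets[i]))


if __name__ == "__main__":
    # Toy 2D "descriptors" for points on a second shape (hypothetical values).
    targets = [[1.0, 1.0], [0.1, 0.0], [3.0, 0.0]]
    print(nearest_neighbor_correspondence([0.0, 0.0], targets))  # prints 1
```

The margin is the key design choice in this kind of loss. Once a non-corresponding pair is already farther apart than `margin`, it contributes nothing, so training effort concentrates on pairs that are still confusable.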
