“O-CNN: octree-based convolutional neural networks for 3D shape analysis”

Presenter(s)/Author(s):

    Peng-Shuai Wang, Yang Liu, Yu-Xiao Guo, Chun-Yu Sun, and Xin Tong


Abstract:


    We present O-CNN, an Octree-based Convolutional Neural Network (CNN) for 3D shape analysis. Built upon the octree representation of 3D shapes, our method takes the average normal vectors of a 3D model sampled in the finest leaf octants as input and performs 3D CNN operations on the octants occupied by the 3D shape surface. We design a novel octree data structure to efficiently store the octant information and CNN features into the graphics memory and execute the entire O-CNN training and evaluation on the GPU. O-CNN supports various CNN structures and works for 3D shapes in different representations. By restraining the computations on the octants occupied by 3D surfaces, the memory and computational costs of the O-CNN grow quadratically as the depth of the octree increases, which makes the 3D CNN feasible for high-resolution 3D models. We compare the performance of the O-CNN with other existing 3D CNN solutions and demonstrate the efficiency and efficacy of O-CNN in three shape analysis tasks, including object classification, shape retrieval, and shape segmentation.
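
    The two ideas in the abstract, averaging sample normals inside the finest leaf octants and keeping only the octants that the surface occupies, can be sketched in a few lines of Python. This is an illustrative sketch only, not the GPU data structure the abstract refers to: the function names, the dictionary-based storage, the unit-cube normalization, and the sphere test shape are all assumptions made for the example.

```python
import math
from collections import defaultdict


def octant_key(point, depth):
    """Integer (x, y, z) index of the octant containing a point of [0, 1)^3."""
    n = 1 << depth  # octants per axis at this depth
    return tuple(min(int(c * n), n - 1) for c in point)


def average_leaf_normals(points, normals, depth):
    """Average the normals of all samples that fall into each occupied leaf octant."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for p, nrm in zip(points, normals):
        key = octant_key(p, depth)
        for i in range(3):
            sums[key][i] += nrm[i]
        counts[key] += 1
    # Plain arithmetic mean; the result is not renormalized to unit length.
    return {key: tuple(s / counts[key] for s in sums[key]) for key in sums}


def occupied_per_depth(leaf_keys, depth):
    """Count occupied octants at every depth by merging children into parents."""
    result = {}
    keys = set(leaf_keys)
    for d in range(depth, 0, -1):
        result[d] = len(keys)
        keys = {(x >> 1, y >> 1, z >> 1) for x, y, z in keys}
    return result


def sample_sphere(n, center=(0.5, 0.5, 0.5), radius=0.4):
    """Hypothetical test shape: n roughly uniform surface samples with unit normals."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points, normals = [], []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        theta = golden_angle * i
        direction = (r * math.cos(theta), r * math.sin(theta), z)
        points.append(tuple(c + radius * u for c, u in zip(center, direction)))
        normals.append(direction)
    return points, normals
```

    Calling `average_leaf_normals(*sample_sphere(20000), 5)` and passing the resulting keys to `occupied_per_depth` gives the number of surface octants at each depth. For a densely sampled surface this count is expected to grow by roughly a factor of four per level, whereas a full voxel grid grows by a factor of eight, which is the saving the abstract attributes to restricting computation to occupied octants.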


