Semantic object reconstruction via casual handheld scanning

We introduce a learning-based method to reconstruct objects acquired in a casual handheld scanning setting with a depth camera. Our method is based on two core components. First, a deep network that provides a semantic segmentation and labeling of the frames of an input RGBD sequence. Second, an alignment and reconstruction method that employs the semantic labeling to reconstruct the acquired object from the frames. We demonstrate that the use of a semantic labeling improves the reconstructions of the objects, when compared to methods that use only the depth information of the frames. Moreover, since training a deep network requires a large amount of labeled data, a key contribution of our work is an active self-learning framework to simplify the creation of the training data. Specifically, we iteratively predict the labeling of frames with the neural network, reconstruct the object from the labeled frames, and evaluate the confidence of the labeling, to incrementally train the neural network while requiring only a small amount of user-provided annotations. We show that this method enables the creation of data for training a neural network with high accuracy, while requiring only little manual effort.

References:

1. Dror Aiger, Niloy J Mitra, and Daniel Cohen-Or. 2008. 4-points congruent sets for robust pairwise surface registration. In ACM Transactions on Graphics (TOG), Vol. 27. ACM, 85. Google ScholarDigital Library
2. Yuri Boykov and Vladimir Kolmogorov. 2004. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Trans. Pattern Analysis & Machine Intelligence 26, 9 (2004), 1124–1137. Google ScholarDigital Library
3. Kang Chen, Yu-Kun Lai, and Shi-Min Hu. 2015. 3D indoor scene modeling from RGB-D data: a survey. Computational Visual Media 1, 4 (2015), 267–278.Google ScholarCross Ref
4. Sungjoon Choi, Q. Y. Zhou, and V. Koltun. 2015. Robust reconstruction of indoor scenes. In Proc. CVPR. IEEE, 5556–5565.Google Scholar
5. Sungjoon Choi, Qian-Yi Zhou, Stephen Miller, and Vladlen Koltun. 2016. A Large Dataset of Object Scans. Technical Report. arXiv:1602.02481.Google Scholar
6. Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. 2017. ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. In Proc. CVPR. IEEE.Google ScholarCross Ref
7. M. Dou, J. Taylor, H. Fuchs, A. Fitzgibbon, and S. Izadi. 2015. 3D scanning deformable objects with a single RGBD sensor. In Proc. CVPR. IEEE, 493–501.Google Scholar
8. Noa Fish, Oliver van Kaick, Amit Bermano, and Daniel Cohen-Or. 2016. Structure-oriented Networks of Shape Collections. ACM Trans. on Graphics (Proc. SIGGRAPH Asia) 35, 6 (2016), 171:1–14. Google ScholarDigital Library
9. Jorge Fuentes-Pacheco, José Ruiz-Ascencio, and Juan Manuel Rendón-Mancha. 2015. Visual simultaneous localization and mapping: a survey. Artificial Intelligence Review 43, 1 (2015), 55–81. Google ScholarDigital Library
10. Kan Guo, Dongqing Zou, and Xiaowu Chen. 2015. 3D Mesh Labeling via Deep Convolutional Neural Networks. ACM Trans. on Graphics 35, 1 (2015), 3:1–12. Google ScholarDigital Library
11. C. Häne, C. Zach, A. Cohen, and M. Pollefeys. 2016. Dense Semantic 3D Reconstruction. IEEE Trans. Pattern Analysis & Machine Intelligence PP, 99 (2016), 1–14.Google Scholar
12. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. (2016).Google Scholar
13. Qixing Huang, Hai Wang, and Vladlen Koltun. 2015. Single-view Reconstruction via Joint Analysis of Image and Shape Collections. ACM Trans. on Graphics (Proc. SIGGRAPH) 34, 4 (2015), 87:1–10. Google ScholarDigital Library
14. O. Kähler, V. Adrian Prisacariu, C. Yuheng Ren, X. Sun, P. Torr, and D. Murray. 2015. Very High Frame Rate Volumetric Integration of Depth Images on Mobile Devices. IEEE TVCG 21, 11 (2015), 1241–1250. Google ScholarDigital Library
15. E. Kalogerakis, A. Hertzmann, and K. Singh. 2010. Learning 3D Mesh Segmentation and Labeling. ACM Trans. on Graphics (Proc. SIGGRAPH) 29, 3 (2010), 102:1–12. Google ScholarDigital Library
16. C. Kerl, J. Sturm, and D. Cremers. 2013. Robust odometry estimation for RGB-D cameras. In Proc. Int. Conf. on Robotics & Automation. IEEE, 3748–3754.Google Scholar
17. Young Min Kim, Niloy J. Mitra, Dong-Ming Yan, and Leonidas Guibas. 2012. Acquiring 3D Indoor Environments with Variability and Repetition. ACM Trans. on Graphics (Proc. SIGGRAPH Asia) 31, 6 (2012), 138:1–11. Google ScholarDigital Library
18. A. Kundu, Y. Li, F. Dellaert, F. Li, and J. M. Rehg. 2014. Joint Semantic Segmentation and 3D Reconstruction from Monocular Video. LNCS (Proc. ECCV) 8694 (2014), 703–718.Google Scholar
19. Minmin Lin, Tianjia Shao, Youyi Zheng, Niloy Jyoti Mitra, and Kun Zhou. 2018. Recovering Functional Mechanical Assemblies from Raw Scans. IEEE transactions on visualization and computer graphics 24, 3 (2018), 1354–1367.Google Scholar
20. Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2015. Fully convolutional networks for semantic segmentation. In Proc. CVPR. IEEE, 3431–3440.Google ScholarCross Ref
21. John McCormac, Ankur Handa, Andrew Davison, and Stefan Leutenegger. 2017. SemanticFusion: Dense 3D Semantic Mapping with Convolutional Neural Networks. In Proc. Int. Conf. on Robotics & Automation. IEEE.Google ScholarCross Ref
22. V. Morell-Gimenez, M. Saval-Calvo, J. Azorin-Lopez, J. Garcia-Rodriguez, M. Cazorla, S. Orts-Escolano, and A. Fuster-Guillo. 2014. A comparative study of registration methods for RGB-D video of static scenes. Sensors 14, 5 (2014), 8547–8576.Google ScholarCross Ref
23. Liangliang Nan, Ke Xie, and Andrei Sharf. 2012. A Search-classify Approach for Cluttered Indoor Scene Understanding. ACM Trans. on Graphics (Proc. SIGGRAPH Asia) 31, 6 (2012), 137:1–10. Google ScholarDigital Library
24. Richard A. Newcombe, Shahram Izadi, Otmar Hilliges, David Molyneaux, David Kim, Andrew J. Davison, Pushmeet Kohli, Jamie Shotton, Steve Hodges, and Andrew Fitzgibbon. 2011. KinectFusion: Real-Time Dense Surface Mapping and Tracking. In Proc. Int. Symp. on mixed and augmented reality. IEEE. Google ScholarDigital Library
25. Matthias Nießner, Michael Zollhöfer, Shahram Izadi, and Marc Stamminger. 2013. Real-time 3D Reconstruction at Scale Using Voxel Hashing. ACM Trans. on Graphics (Proc. SIGGRAPH Asia) 32, 6 (2013), 169:1–11. Google ScholarDigital Library
26. Jeremie Papon, Alexey Abramov, Markus Schoeler, and Florentin Worgotter. 2013. Voxel cloud connectivity segmentation-supervoxels for point clouds. In Proc. CVPR. 2027–2034. Google ScholarDigital Library
27. M. Rünz and L. Agapito. 2017. Co-fusion: Real-time segmentation, tracking and fusion of multiple objects. In Proc. Int. Conf. on Robotics & Automation. 4471–4478.Google Scholar
28. R. F. Salas-Moreno, R. A. Newcombe, H. Strasdat, P. H. J. Kelly, and A. J. Davison. 2013. SLAM++: Simultaneous Localisation and Mapping at the Level of Objects. In Proc. CVPR. IEEE, 1352–1359. Google ScholarDigital Library
29. Ariel Shamir. 2008. A survey on Mesh Segmentation Techniques. Computer Graphics Forum 27, 6 (2008), 1539–1556.Google ScholarCross Ref
30. Tianjia Shao, Weiwei Xu, Kun Zhou, Jingdong Wang, Dongping Li, and Baining Guo. 2012. An Interactive Approach to Semantic Modeling of Indoor Scenes with an RGBD Camera. ACM Trans. on Graphics (Proc. SIGGRAPH Asia) 31, 6 (2012), 136:1–11. Google ScholarDigital Library
31. Chao-Hui Shen, Hongbo Fu, Kang Chen, and Shi-Min Hu. 2012. Structure Recovery by Part Assembly. ACM Trans. on Graphics (Proc. SIGGRAPH Asia) 31, 6 (2012), 180:1–11. Google ScholarDigital Library
32. Xiaoyong Shen, Xin Tao, Hongyun Gao, Chao Zhou, and Jiaya Jia. 2016. Deep Automatic Portrait Matting. In Proc. Euro. Conf. on Computer Vision. Springer, 92–107.Google ScholarCross Ref
33. Zhenyu Shu, Chengwu Qi, Shiqing Xin, Chao Hu, Li Wang, Yu Zhang, and Ligang Liu. 2016. Unsupervised 3D shape segmentation and co-segmentation via deep learning. Computer Aided Geometric Design (Proc. Geometric Modeling and Processing) 43 (2016), 39–52. Google ScholarDigital Library
34. Oana Sidi, Oliver van Kaick, Yanir Kleiman, Hao Zhang, and Daniel Cohen-Or. 2011. Unsupervised co-segmentation of a set of shapes via descriptor-space spectral clustering. ACM Trans. on Graphics (Proc. SIGGRAPH Asia) 30, 6 (2011), 126:1–10. Google ScholarDigital Library
35. Jörg Stückler, Benedikt Waldvogel, Hannes Schulz, and Sven Behnke. 2015. Dense Real-time Mapping of Object-class Semantics from RGB-D Video. J. Real-Time Image Process. 10, 4 (2015), 599–609. Google ScholarDigital Library
36. S. Thrun. 2002. Robotic Mapping: A Survey. In Exploring Artificial Intelligence in the New Millenium, G. Lakemeyer and B. Nebel (Eds.). Morgan Kaufmann, 1–35. Google ScholarDigital Library
37. Stavros Tsogkas, Iasonas Kokkinos, George Papandreou, and Andrea Vedaldi. 2015. Semantic part segmentation with deep learning. arXiv preprint arXiv:1505.02438 (2015).Google Scholar
38. Oliver van Kaick, Hao Zhang, Ghassan Hamarneh, and Daniel Cohen-Or. 2011. A Survey on Shape Correspondence. Computer Graphics Forum 30, 6 (2011), 1681–1707.Google ScholarCross Ref
39. Yunhai Wang, Shmulik Asafi, Oliver van Kaick, Hao Zhang, Daniel Cohen-Or, and Baoquan Chen. 2012. Active Co-Analysis of a Set of Shapes. ACM Trans. on Graphics (Proc. SIGGRAPH Asia) 31, 6 (2012), 165:1–10. Google ScholarDigital Library
40. Thomas Whelan, Stefan Leutenegger, Renato Salas Moreno, Ben Glocker, and Andrew Davison. 2015. ElasticFusion: Dense SLAM Without A Pose Graph. In Proc. of Robotics: Science and Systems.Google ScholarCross Ref
41. Yu-Shiang Wong, Hung-Kuo Chu, and Niloy J. Mitra. 2015. SmartAnnotator: An Interactive Tool for Annotating Indoor RGBD Images. Computer Graphics Forum (Proc. Eurographics) 34, 2 (2015), 447–457. Google ScholarDigital Library
42. Fangting Xia, Peng Wang, Liang-Chieh Chen, and Alan L Yuille. 2015. Zoom Better to See Clearer: Human and Object Parsing with Hierarchical Auto-Zoom Net. In Proc. Euro. Conf. on Computer Vision, Vol. 9909.Google Scholar
43. J. Xiao, A. Owens, and A. Torralba. 2013. SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels. In Proc. CVPR. IEEE, 1625–1632. Google ScholarDigital Library
44. Kai Xu, Hanlin Zheng, Hao Zhang, Daniel Cohen-Or, Ligang Liu, and Yueshan Xiong. 2011. Photo-Inspired Model-Driven 3D Object Modeling. ACM Trans. on Graphics (Proc. SIGGRAPH) 30, 4 (2011), 80:1–10. Google ScholarDigital Library
45. Mingliang Xu, Mingyuan Li, Weiwei Xu, Zhigang Deng, Yin Yang, and Kun Zhou. 2016. Interactive mechanism modeling from multi-view images. ACM Transactions on Graphics (TOG) 35, 6 (2016), 236. Google ScholarDigital Library
46. Feilong Yan, Andrei Sharf, Wenzhen Lin, Hui Huang, and Baoquan Chen. 2014. Proactive 3D Scanning of Inaccessible Parts. ACM Trans. on Graphics (Proc. SIGGRAPH) 33, 4 (2014), 157:1–8. Google ScholarDigital Library
47. Li Yi, Vladimir G. Kim, Duygu Ceylan, I-Chao Shen, Mengyan Yan, Hao Su, Cewu Lu, Qixing Huang, Alla Sheffer, and Leonidas Guibas. 2016. A Scalable Active Framework for Region Annotation in 3D Shape Collections. ACM Trans. on Graphics (Proc. SIGGRAPH Asia) 35, 6 (2016), 210:1–12. Google ScholarDigital Library

ACM Digital Library Publication:

Overview Page:

SIGGRAPH Asia 2018: Technical Papers

Submit a story:

If you would like to submit a story about this presentation, please contact us: historyarchives@siggraph.org

ACM SIGGRAPH HISTORY ARCHIVES

“Semantic object reconstruction via casual handheld scanning”

Conference:

Type(s):

Title:

Session/Category Title:

Presenter(s)/Author(s):

Moderator(s):

Abstract:

References:

ACM Digital Library Publication:

Overview Page:

Submit a story:

Sponsored by: