Automatic semantic modeling of indoor scenes from low-quality RGB-D data using contextual information

We present a novel solution to automatic semantic modeling of indoor scenes from a sparse set of low-quality RGB-D images. Such data is challenging because of noise, low resolution, occlusion, and missing depth information. We exploit the knowledge in a scene database containing hundreds of indoor scenes with over 10,000 manually segmented and labeled mesh models of objects. In seconds, we output a visually plausible 3D scene, adapting these models and their parts to fit the input scans. Contextual relationships learned from the database constrain the reconstruction, ensuring semantic compatibility between object models and between parts. Small objects and objects with incomplete depth information, which are difficult to recover reliably, are handled with a two-stage approach: major objects are recognized first, providing a known scene structure, and 2D contour-based model retrieval is then used to recover the smaller objects. Evaluations using our own data and two public datasets show that our approach can model typical real-world indoor scenes efficiently and robustly.
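To make the idea of context-constrained recognition concrete, here is a minimal sketch in Python. It is not the authors' method: the abstract does not specify the contextual model, so this sketch assumes a simple label co-occurrence statistic as a stand-in. The names `learn_cooccurrence`, `context_score`, `rerank`, the `weight` parameter, and the toy scene list are all hypothetical. It learns how often object labels appear together in a labeled scene database, then uses the objects already recognized in a first stage to rerank ambiguous candidates for a smaller object.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stand-in for a labeled scene database: one label list per scene.
TOY_SCENES = [
    ["desk", "chair", "monitor", "keyboard"],
    ["desk", "chair", "monitor", "lamp"],
    ["bed", "nightstand", "lamp"],
    ["sofa", "coffee table", "tv"],
]


def learn_cooccurrence(scenes):
    """Count scenes per label and scenes per unordered label pair."""
    pair_counts = Counter()
    label_counts = Counter()
    for labels in scenes:
        unique = set(labels)
        label_counts.update(unique)
        pair_counts.update(
            frozenset(pair) for pair in combinations(sorted(unique), 2)
        )
    return pair_counts, label_counts


def context_score(candidate, known_labels, pair_counts, label_counts):
    """Mean smoothed frequency of `candidate` appearing with each known label."""
    if not known_labels:
        return 0.0
    total = 0.0
    for known in known_labels:
        together = pair_counts[frozenset((candidate, known))]
        # Add-one smoothing so unseen pairs are unlikely rather than impossible.
        total += (together + 1) / (label_counts[known] + 2)
    return total / len(known_labels)


def rerank(candidates, known_labels, pair_counts, label_counts, weight=0.5):
    """Blend a per-label shape-match score in [0, 1] with the context score.

    `candidates` maps label -> shape-match score; returns labels, best first.
    `weight` (an assumed tuning parameter) is the share given to context.
    """
    combined = {
        label: (1 - weight) * shape
        + weight * context_score(label, known_labels, pair_counts, label_counts)
        for label, shape in candidates.items()
    }
    return sorted(combined, key=combined.get, reverse=True)


if __name__ == "__main__":
    pair_counts, label_counts = learn_cooccurrence(TOY_SCENES)
    # Stage one is assumed to have recognized the major objects.
    major_objects = ["desk", "chair"]
    # Stage two: a small object whose shape match alone slightly favors "tv".
    ambiguous = {"tv": 0.55, "monitor": 0.50}
    print(rerank(ambiguous, major_objects, pair_counts, label_counts))
```

In this toy setting the shape score alone slightly prefers "tv", but the recognized desk and chair shift the ranking toward "monitor", which is the kind of semantic compatibility constraint the abstract describes at a high level.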

Overview Page:

SIGGRAPH Asia 2014: Technical Papers

“Automatic semantic modeling of indoor scenes from low-quality RGB-D data using contextual information” by Chen, Lai, Wu, Martin and Hu
