“An interactive approach to semantic modeling of indoor scenes with an RGBD camera” by Shao, Xu, Zhou, Wang, Li, et al.
Session/Category Title: Acquiring and Synthesizing Indoor Scenes
Abstract:
We present an interactive approach to semantic modeling of indoor scenes with a consumer-level RGBD camera. Using our approach, the user first takes an RGBD image of an indoor scene, which is automatically segmented into a set of regions with semantic labels. If the segmentation is not satisfactory, the user can draw some strokes to guide the algorithm to achieve better results. After the segmentation is finished, the depth data of each semantic region is used to retrieve a matching 3D model from a database. Each model is then transformed according to the image depth to yield the scene. For large scenes where a single image can only cover one part of the scene, the user can take multiple images to construct other parts of the scene. The 3D models built for all images are then transformed and unified into a complete scene. We demonstrate the efficiency and robustness of our approach by modeling several real-world scenes.
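The pipeline the abstract describes — segment an RGBD image into semantically labeled regions, retrieve a matching 3D model per region from a database, then place each model according to the observed depth — can be outlined roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the `Region`/`Placement` types, the depth-spread similarity used for retrieval, and the mean-depth placement are all assumptions standing in for the paper's shape-based matching and full rigid alignment.

```python
# Hypothetical sketch of the abstract's pipeline. All names and the
# similarity measure are illustrative assumptions, not the paper's method.
from dataclasses import dataclass
from statistics import mean

@dataclass
class Region:
    label: str           # semantic label from segmentation, e.g. "chair"
    depths: list         # depth samples (meters) inside the region

@dataclass
class Placement:
    model_id: str        # id of the retrieved database model
    depth: float         # translation of the model along the camera axis

def retrieve_model(region, database):
    """Pick the database model whose stored depth extent best matches the
    region's observed depth spread (a crude stand-in for the paper's
    shape-based retrieval over the depth data of each region)."""
    spread = max(region.depths) - min(region.depths)
    candidates = database[region.label]
    return min(candidates, key=lambda m: abs(m["extent"] - spread))

def build_scene(regions, database):
    """For every segmented region, retrieve a model and place it at the
    region's mean depth (full alignment would estimate a rigid transform)."""
    placements = []
    for r in regions:
        model = retrieve_model(r, database)
        placements.append(Placement(model["id"], mean(r.depths)))
    return placements
```

For example, with a toy database `{"chair": [{"id": "chair_a", "extent": 0.5}, {"id": "chair_b", "extent": 0.9}]}`, a chair region whose depth samples span 0.6 m would retrieve `chair_a` and be placed at the region's mean depth.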