Activity-centric scene synthesis for functional 3D scene modeling

We present a novel method to generate 3D scenes that allow the same activities as real environments captured through noisy and incomplete 3D scans. As robust object detection and instance retrieval from low-quality depth data are challenging, our algorithm aims to model semantically correct rather than geometrically accurate object arrangements. Our core contribution is a new scene synthesis technique which, conditioned on a coarse geometric scene representation, models functionally similar scenes using prior knowledge learned from a scene database. The key insight underlying our scene synthesis approach is that many real-world environments are structured to facilitate specific human activities, such as sleeping or eating. We represent scene functionalities through virtual agents that associate object arrangements with the activities for which they are typically used. When modeling a scene, we first identify the activities supported by a scanned environment. We then determine semantically plausible arrangements of virtual objects, retrieved from a shape database and constrained by the observed scene geometry. For a given 3D scan, our algorithm produces a variety of synthesized scenes that support the activities of the captured real environment. In a perceptual evaluation study, we demonstrate that our results are judged to be visually appealing and functionally comparable to manually designed scenes.
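The abstract outlines two stages: identify which activities a scan supports, then place database objects consistent with the observed geometry. The toy Python sketch below mirrors only that control flow under strong simplifying assumptions: the scan is reduced to unlabeled boxes, each activity is a hand-written template with a single anchor object, and matching is a plain size comparison. `Box`, `ACTIVITIES`, `SHAPE_DB`, `detect_activities`, `synthesize`, the sizes, and the tolerance are all hypothetical. The paper instead learns priors from a scene database and uses virtual agents, which this sketch does not model.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Unlabeled coarse box from the scan: floor position (x, y) and size (w, d, h) in metres."""
    x: float
    y: float
    w: float
    d: float
    h: float


# Hypothetical activity templates: the anchor object an activity needs and its typical size (w, d, h).
ACTIVITIES = {
    "sleeping": ("bed", (1.6, 2.0, 0.5)),
    "eating": ("table", (1.2, 0.8, 0.75)),
    "sitting": ("chair", (0.5, 0.5, 0.9)),
}

# Hypothetical shape database: candidate model ids per object category.
SHAPE_DB = {
    "bed": ["bed_01", "bed_02"],
    "table": ["table_01", "table_02", "table_03"],
    "chair": ["chair_01", "chair_02"],
}


def size_error(box, size):
    """L1 distance between the observed box dimensions and an expected (w, d, h)."""
    w, d, h = size
    return abs(box.w - w) + abs(box.d - d) + abs(box.h - h)


def detect_activities(scan, tol=0.5):
    """Stage 1: map each activity to the best-fitting observed box, if one fits within tol."""
    found = {}
    for activity, (_, size) in ACTIVITIES.items():
        best = min(scan, key=lambda b: size_error(b, size), default=None)
        if best is not None and size_error(best, size) <= tol:
            found[activity] = best
    return found


def synthesize(scan, rng):
    """Stage 2: place one randomly drawn database model per supported activity at its matched box."""
    scene = []
    for activity, box in detect_activities(scan).items():
        category = ACTIVITIES[activity][0]
        scene.append((activity, rng.choice(SHAPE_DB[category]), (box.x, box.y)))
    return scene


if __name__ == "__main__":
    # A bed-sized box and a table-sized box; no box is chair-sized.
    scan = [Box(0.0, 0.0, 1.6, 2.1, 0.5), Box(3.0, 1.0, 1.2, 0.8, 0.7)]
    for seed in range(3):
        print(synthesize(scan, random.Random(seed)))
```

Drawing models with different seeds yields different but equally "functional" scenes for the same scan, a crude stand-in for the variety of synthesized results the abstract describes.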

SIGGRAPH Asia 2015: Technical Papers

ACM SIGGRAPH HISTORY ARCHIVES

“Activity-centric scene synthesis for functional 3D scene modeling” by Fisher, Savva, Li, Hanrahan and Nießner
