“Shape2Pose: human-centric shape analysis” by Kim, Chaudhuri, Guibas and Funkhouser

  • ©Vladimir G. Kim, Siddhartha Chaudhuri, Leonidas (Leo) J. Guibas, and Thomas (Tom) A. Funkhouser



Session Title:

    Shape Analysis






    As 3D acquisition devices and modeling tools become widely available, there is a growing need for automatic algorithms that analyze the semantics and functionality of digitized shapes. Most recent research has focused on analyzing the geometric structure of shapes. Our work is motivated by the observation that the majority of man-made shapes are designed to be used by people. Thus, to fully understand their semantics, one needs to answer a fundamental question: “how do people interact with these objects?” As an initial step towards this goal, we offer a novel algorithm for automatically predicting a static pose that a person would need to adopt in order to use an object. Specifically, given an input 3D shape, the goal of our analysis is to predict a corresponding human pose, including contact points and kinematic parameters. This is especially challenging for man-made objects, which commonly exhibit a lot of variance in their geometric structure. We address this challenge by observing that contact points usually share consistent local geometric features related to the anthropometric properties of the corresponding body parts, and that the human body is subject to kinematic constraints and priors. Accordingly, our method combines local region classification with a global kinematically-constrained search to predict poses for various objects. We evaluate our algorithm on six diverse collections of 3D polygonal models (chairs, gym equipment, cockpits, carts, bicycles, and bipedal devices) containing a total of 147 models. Finally, we demonstrate that the poses predicted by our algorithm can be used in several shape analysis problems, such as establishing correspondences between objects, detecting salient regions, finding informative viewpoints, and retrieving functionally similar shapes.


