“Real-time non-rigid reconstruction using an RGB-D camera” by Zollhöfer, Nießner, Izadi, Rhemann, Zach, et al. …

  • ©Michael Zollhöfer, Matthias Nießner, Shahram Izadi, Christoph Rhemann, Christopher Zach, Matthew Fisher, Chenglei Wu, Andrew Fitzgibbon, Charles Loop, Christian Theobalt, and Marc Stamminger



Session Title:

    Depth for All Occasions


    Real-time non-rigid reconstruction using an RGB-D camera




    We present a combined hardware and software solution for markerless reconstruction of non-rigidly deforming physical objects with arbitrary shape in real-time. Our system uses a single self-contained stereo camera unit built from off-the-shelf components and consumer graphics hardware to generate spatio-temporally coherent 3D models at 30 Hz. A new stereo matching algorithm estimates real-time RGB-D data. We start by scanning a smooth template model of the subject as they move rigidly. This geometric surface prior avoids strong scene assumptions, such as a kinematic human skeleton or a parametric shape model. Next, a novel GPU pipeline performs non-rigid registration of live RGB-D data to the smooth template using an extended non-linear as-rigid-as-possible (ARAP) framework. High-frequency details are fused onto the final mesh using a linear deformation model. The system is an order of magnitude faster than state-of-the-art methods, while matching the quality and robustness of many offline algorithms. We show precise real-time reconstructions of diverse scenes, including: large deformations of users’ heads, hands, and upper bodies; fine-scale wrinkles and folds of skin and clothing; and non-rigid interactions performed by users on flexible objects such as toys. We demonstrate how acquired models can be used for many interactive scenarios, including re-texturing, online performance capture and preview, and real-time shape and motion re-targeting.
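The abstract names an extended non-linear ARAP framework but does not spell out its energy. As a rough illustration only, the sketch below evaluates the classic as-rigid-as-possible energy of Sorkine and Alexa, E = Σ_i Σ_{j∈N(i)} w · ‖(p'_i − p'_j) − R_i (p_i − p_j)‖², not the extended formulation used in the paper. The uniform edge weight, the function names (`arap_energy`, `mat_vec`, `sub`) and the toy triangle are assumptions made for this example, and the per-vertex rotations are supplied by the caller rather than optimized.

```python
from typing import Dict, List, Sequence

Vec = Sequence[float]
Mat = Sequence[Sequence[float]]


def mat_vec(R: Mat, v: Vec) -> List[float]:
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(R[r][c] * v[c] for c in range(3)) for r in range(3)]


def sub(a: Vec, b: Vec) -> List[float]:
    """Component-wise difference a - b of two 3-vectors."""
    return [a[k] - b[k] for k in range(3)]


def arap_energy(
    rest: Sequence[Vec],
    deformed: Sequence[Vec],
    neighbors: Dict[int, List[int]],
    rotations: Sequence[Mat],
    weight: float = 1.0,  # assumption: uniform weights instead of cotangent weights
) -> float:
    """Classic ARAP energy: how far each deformed edge is from its rotated rest edge."""
    energy = 0.0
    for i, nbrs in neighbors.items():
        for j in nbrs:
            rest_edge = sub(rest[i], rest[j])
            deformed_edge = sub(deformed[i], deformed[j])
            # A perfectly rigid local motion maps rest_edge exactly onto deformed_edge.
            residual = sub(deformed_edge, mat_vec(rotations[i], rest_edge))
            energy += weight * sum(d * d for d in residual)
    return energy


# Hypothetical toy mesh: one triangle in which every vertex neighbors the other two.
rest = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rot_z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

# A rigid 90-degree rotation about z costs nothing when R_i matches it.
rotated = [mat_vec(rot_z_90, p) for p in rest]
print(arap_energy(rest, rotated, neighbors, [rot_z_90] * 3))  # 0.0

# Uniform scaling by 2 is not rigid, so it is penalized.
stretched = [[2.0 * c for c in p] for p in rest]
print(arap_energy(rest, stretched, neighbors, [identity] * 3))  # 8.0
```

A registration method of this kind would minimize such a regularizer together with data terms that pull the template toward the live RGB-D input, solving for both vertex positions and rotations. The sketch only evaluates the rigidity term for fixed inputs.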


