“Real-Time Geometry, Albedo, and Motion Reconstruction Using a Single RGB-D Camera” by Guo, Xu, Yu, Liu, Dai, et al. …

  • ©Kaiwen Guo, Feng Xu, Tao Yu, Xiaoyang Liu, Qionghai Dai, and Yebin Liu



Session Title:

    Get More Out of Your Photo


    Real-Time Geometry, Albedo, and Motion Reconstruction Using a Single RGB-D Camera




    This article proposes a real-time method that uses single-view RGB-D input (a depth sensor integrated with a color camera) to simultaneously reconstruct a casual scene with a detailed geometry model, surface albedo, per-frame non-rigid motion, and per-frame low-frequency lighting, without requiring any template or motion priors. The key observation is that accurate scene motion can be used to integrate temporal information and recover precise appearance, whereas the intrinsic appearance can help establish true correspondences in the temporal domain and thereby recover motion. Based on this observation, we first propose a shading-based scheme that leverages appearance information for motion estimation. Then, using the reconstructed motion, we propose a volumetric albedo fusion scheme that completes and refines the intrinsic appearance of the scene by incorporating information from multiple frames. Because the two schemes are applied iteratively during recording, the reconstructed appearance and motion become increasingly accurate. Since geometry, appearance, and motion are all reconstructed by our technique, our experiments also demonstrate additional applications, such as relighting, albedo editing, and free-viewpoint rendering of a dynamic scene.
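The interplay between low-frequency lighting and multi-frame albedo fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Lambertian surface, lighting represented by nine second-order spherical-harmonics coefficients (a common choice for low-frequency lighting, not stated in the abstract), and a per-voxel weighted running average for fusion. The function names `sh_basis`, `estimate_albedo`, and `fuse_albedo`, along with the `max_weight` cap, are hypothetical and introduced only for illustration.

```python
import numpy as np


def sh_basis(normals):
    """Evaluate the 9 real spherical-harmonics basis functions (bands 0-2)
    at unit normals of shape (N, 3). Returns an (N, 9) array."""
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    return np.stack([
        0.282095 * np.ones_like(x),
        0.488603 * y,
        0.488603 * z,
        0.488603 * x,
        1.092548 * x * y,
        1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z,
        0.546274 * (x * x - y * y),
    ], axis=1)


def estimate_albedo(observed_rgb, normals, light_coeffs, eps=1e-6):
    """Assumed Lambertian model: observed colour = albedo * shading,
    so a per-sample albedo estimate is observed colour / shading."""
    shading = sh_basis(normals) @ light_coeffs      # (N,)
    shading = np.maximum(shading, eps)              # avoid division by zero
    return observed_rgb / shading[:, None]          # (N, 3)


def fuse_albedo(voxel_albedo, voxel_weight, new_albedo,
                new_weight=1.0, max_weight=64.0):
    """Blend a new per-voxel albedo observation into the stored value
    with a weighted running average (assumed fusion rule)."""
    total = voxel_weight + new_weight
    fused = (voxel_albedo * voxel_weight[:, None]
             + new_albedo * new_weight) / total[:, None]
    # Capping the weight keeps the average responsive to later frames.
    return fused, np.minimum(total, max_weight)
```

In this sketch, each new frame would call `estimate_albedo` on the samples it observes (after warping them into the canonical volume with the estimated motion, which is omitted here) and then `fuse_albedo` to refine the stored values. Averaging over frames is what lets noisy single-frame albedo estimates converge, which in turn gives the shading-based motion term a more reliable appearance model to match against.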



ACM Digital Library Publication: