“BundleFusion: Real-Time Globally Consistent 3D Reconstruction Using On-the-Fly Surface Reintegration”

  • ©Angela Dai, Matthias Nießner, Michael Zollhöfer, Shahram Izadi, and Christian Theobalt

Session/Category Title: Reconstructing 3D Surfaces From Points, Lines, Images & Water


Abstract:


    Real-time, high-quality, 3D scanning of large-scale scenes is key to mixed reality and robotic applications. However, scalability brings challenges of drift in pose estimation, introducing significant errors in the accumulated model. Approaches often require hours of offline processing to globally correct model errors. Recent online methods demonstrate compelling results but suffer from (1) needing minutes to perform online correction, preventing true real-time use; (2) brittle frame-to-frame (or frame-to-model) pose estimation, resulting in many tracking failures; or (3) supporting only unstructured point-based representations, which limit scan quality and applicability. We systematically address these issues with a novel, real-time, end-to-end reconstruction framework. At its core is a robust pose estimation strategy, optimizing per frame for a global set of camera poses by considering the complete history of RGB-D input with an efficient hierarchical approach. We remove the heavy reliance on temporal tracking and continually localize to the globally optimized frames instead. We contribute a parallelizable optimization framework, which employs correspondences based on sparse features and dense geometric and photometric matching. Our approach estimates globally optimized (i.e., bundle adjusted) poses in real time, supports robust tracking with recovery from gross tracking failures (i.e., relocalization), and re-estimates the 3D model in real time to ensure global consistency, all within a single framework. Our approach outperforms state-of-the-art online systems with quality on par to offline methods, but with unprecedented speed and scan completeness. Our framework leads to a comprehensive online scanning solution for large indoor environments, enabling ease of use and high-quality results.


