BundleFusion: Real-Time Globally Consistent 3D Reconstruction Using On-the-Fly Surface Reintegration

Angela Dai; Matthias Nießner; Michael Zollhöfer; Shahram Izadi; Christian Theobalt

Conference:

SIGGRAPH 2017

Type(s):

Technical Papers

Title:

BundleFusion: Real-Time Globally Consistent 3D Reconstruction Using On-the-Fly Surface Reintegration

Session/Category Title: Reconstructing 3D Surfaces From Points, Lines, Images & Water

Presenter(s)/Author(s):

Angela Dai

Matthias Nießner

Michael Zollhöfer

Shahram Izadi

Christian Theobalt

Moderator(s):

Alla Sheffer

Abstract:

Real-time, high-quality, 3D scanning of large-scale scenes is key to mixed reality and robotic applications. However, scalability brings challenges of drift in pose estimation, introducing significant errors in the accumulated model. Approaches often require hours of offline processing to globally correct model errors. Recent online methods demonstrate compelling results but suffer from (1) needing minutes to perform online correction, preventing true real-time use; (2) brittle frame-to-frame (or frame-to-model) pose estimation, resulting in many tracking failures; or (3) supporting only unstructured point-based representations, which limit scan quality and applicability. We systematically address these issues with a novel, real-time, end-to-end reconstruction framework. At its core is a robust pose estimation strategy, optimizing per frame for a global set of camera poses by considering the complete history of RGB-D input with an efficient hierarchical approach. We remove the heavy reliance on temporal tracking and continually localize to the globally optimized frames instead. We contribute a parallelizable optimization framework, which employs correspondences based on sparse features and dense geometric and photometric matching. Our approach estimates globally optimized (i.e., bundle adjusted) poses in real time, supports robust tracking with recovery from gross tracking failures (i.e., relocalization), and re-estimates the 3D model in real time to ensure global consistency, all within a single framework. Our approach outperforms state-of-the-art online systems with quality on par to offline methods, but with unprecedented speed and scan completeness. Our framework leads to a comprehensive online scanning solution for large indoor environments, enabling ease of use and high-quality results.
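
The "surface reintegration" in the title and the abstract's claim that the 3D model is re-estimated when poses change can be illustrated with a small sketch. It assumes the model is a volumetric signed distance field fused with the weighted running average of Curless and Levoy (1996), so that a frame's contribution can be subtracted again once its pose has been refined. The names `Voxel`, `integrate`, `deintegrate`, and `reintegrate` are hypothetical and chosen for illustration; this is not the authors' implementation, and it handles a single voxel rather than a full volume.

```python
from dataclasses import dataclass


@dataclass
class Voxel:
    sdf: float = 0.0     # running weighted average of signed distance samples
    weight: float = 0.0  # accumulated integration weight


def integrate(v: Voxel, d: float, w: float) -> None:
    """Fuse one signed distance sample d with weight w into the voxel."""
    if w <= 0.0:
        return  # nothing to add
    total = v.weight + w
    v.sdf = (v.sdf * v.weight + d * w) / total
    v.weight = total


def deintegrate(v: Voxel, d: float, w: float, eps: float = 1e-9) -> None:
    """Remove a previously fused sample (exact inverse of integrate)."""
    if w <= 0.0:
        return  # nothing to remove
    total = v.weight - w
    if total <= eps:
        # No observations remain: reset the voxel to its empty state.
        v.sdf, v.weight = 0.0, 0.0
        return
    v.sdf = (v.sdf * v.weight - d * w) / total
    v.weight = total


def reintegrate(v: Voxel, d_old: float, d_new: float, w: float) -> None:
    """Replace a sample taken under a stale pose with one under the new pose."""
    deintegrate(v, d_old, w)
    integrate(v, d_new, w)


# Usage: two frames observe the voxel, then the second frame's pose is
# corrected, which changes the signed distance it contributes here.
voxel = Voxel()
integrate(voxel, 0.10, 1.0)
integrate(voxel, 0.30, 1.0)
reintegrate(voxel, d_old=0.30, d_new=0.10, w=1.0)
```

Because the update is a weighted average, removing a sample only requires the distance and weight it originally contributed, so a corrected frame can be swapped in without rebuilding the voxel from all other frames.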

