“Tanks and temples: benchmarking large-scale scene reconstruction” by Knapitsch, Park, Zhou and Koltun

  • ©Arno Knapitsch, Jaesik Park, Qian-Yi Zhou, and Vladlen Koltun



Session Title:

    Reconstructing 3D Surfaces From Points, Lines, Images & Water





    We present a benchmark for image-based 3D reconstruction. The benchmark sequences were acquired outside the lab, in realistic conditions. Ground-truth data was captured using an industrial laser scanner. The benchmark includes both outdoor scenes and indoor environments. High-resolution video sequences are provided as input, supporting the development of novel pipelines that take advantage of video input to increase reconstruction fidelity. We report the performance of many image-based 3D reconstruction pipelines on the new benchmark. The results point to exciting challenges and opportunities for future work.


