“ROSEFusion: random optimization for online dense reconstruction under fast camera motion”
Conference:
Type(s):
Title:
- ROSEFusion: random optimization for online dense reconstruction under fast camera motion
Session/Category Title: Summary and Q&A: Volumetric Modeling and Reconstruction
Presenter(s)/Author(s):
Abstract:
Online reconstruction based on RGB-D sequences has thus far been restricted to relatively slow camera motions (<1 m/s). Under very fast camera motion (e.g., 3 m/s), the reconstruction can easily break down, even for state-of-the-art methods. Fast motion brings two challenges to depth fusion: 1) the high nonlinearity of camera pose optimization due to large inter-frame rotations and 2) the lack of reliably trackable features due to motion blur. We propose to tackle the difficulties of fast-motion camera tracking in the absence of inertial measurements using random optimization, in particular, Particle Filter Optimization (PFO). To overcome the computation-intensive particle sampling and update of standard PFO, we propose to accelerate the randomized search by updating a particle swarm template (PST). The PST is a set of particles pre-sampled uniformly within the unit sphere in the 6D space of camera pose. By moving and rescaling the pre-sampled PST guided by swarm intelligence, our method is able to drive tens of thousands of particles to locate and cover a good local optimum extremely fast and robustly. The particles, representing candidate poses, are evaluated with a fitness function defined based on depth-model conformance. Our method, being depth-only and correspondence-free, therefore mitigates the motion blur impediment, as (ToF-based) depths are often resilient to motion blur. Thanks to the efficient template-based particle set evolution and the effective fitness function, our method attains good-quality pose tracking under fast camera motion (up to 4 m/s) at a real-time frame rate without including loop closure or global pose optimization. Through extensive evaluations on public datasets of RGB-D sequences, especially on a newly proposed benchmark of fast camera motion, we demonstrate the significant advantage of our method over the state of the art.
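The template idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names (`sample_unit_ball`, `toy_fitness`, `track_with_template`), the single isotropic scale, the "move to the best particle, otherwise shrink" rule, and the quadratic toy fitness are all assumptions made for illustration. In particular, the toy fitness merely stands in for the depth-model conformance measure, which the abstract does not specify.

```python
import math
import random


def sample_unit_ball(n, dim=6, seed=0):
    """Pre-sample n points uniformly inside the unit ball in `dim` dimensions."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        # A normalized Gaussian vector gives a uniformly random direction.
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in g))
        # A radius of U^(1/dim) makes the points uniform in volume.
        radius = rng.random() ** (1.0 / dim)
        points.append([radius * x / norm for x in g])
    return points


def toy_fitness(pose, target):
    """Hypothetical stand-in fitness (higher is better): negative squared
    distance to a known target. NOT the depth-model conformance measure."""
    return -sum((p - t) ** 2 for p, t in zip(pose, target))


def track_with_template(template, fitness, start, scale=1.0, iters=60, shrink=0.7):
    """Search by moving and rescaling one fixed, pre-sampled template.

    Simplified assumption: move the template center to the best particle
    when it improves on the current pose, otherwise shrink the template.
    """
    center = list(start)
    best = fitness(center)
    for _ in range(iters):
        # Reuse the same offsets every iteration; only center and scale change.
        candidates = [
            [c + scale * t for c, t in zip(center, offset)] for offset in template
        ]
        scores = [fitness(p) for p in candidates]
        i = max(range(len(scores)), key=scores.__getitem__)
        if scores[i] > best:
            center, best = candidates[i], scores[i]  # move the template
        else:
            scale *= shrink  # rescale the template around the current pose
    return center, best


if __name__ == "__main__":
    template = sample_unit_ball(2000)
    target = [0.3, -0.2, 0.1, 0.05, -0.1, 0.2]  # hypothetical 6D pose offset
    pose, score = track_with_template(
        template, lambda p: toy_fitness(p, target), [0.0] * 6
    )
    print("estimated pose:", [round(v, 3) for v in pose])
    print("distance to target:", math.sqrt(-score))
```

The point of the sketch is that the expensive random sampling happens once, up front; each iteration only translates and scales the stored offsets and re-evaluates the fitness, which is the kind of work that parallelizes well across many particles.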