X-SLAM: Scalable Dense SLAM for Task-aware Optimization Using CSFD

X-SLAM is a real-time, dense, differentiable SLAM system. It uses the complex-step finite difference (CSFD) method to compute accurate and robust numerical derivatives, including high-order ones. Because CSFD does not require a computational graph, the memory footprint is substantially reduced. X-SLAM is compatible with most SLAM algorithms. Evaluations on real-world applications show clear improvements over existing solutions.
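The core idea behind CSFD can be illustrated on a scalar function. The sketch below is only a toy example, not the X-SLAM implementation: the function names (`csfd_derivative`, `central_difference`, `toy_function`) and the step sizes are assumptions chosen for illustration. It perturbs the input along the imaginary axis and reads the derivative from the imaginary part of the output, so no subtraction of nearly equal values occurs and no computational graph has to be stored.

```python
import cmath
import math


def csfd_derivative(f, x, h=1e-20):
    """Approximate f'(x) for real x as Im(f(x + i*h)) / h.

    f must accept complex input and be analytic near x.
    There is no subtraction, so h can be extremely small.
    """
    return f(complex(x, h)).imag / h


def central_difference(f, x, h=1e-6):
    """Classic real-valued central difference, shown for comparison."""
    return (f(x + h) - f(x - h)).real / (2.0 * h)


def toy_function(x):
    # Hypothetical test function; cmath handles real and complex input.
    return cmath.exp(x) * cmath.sin(x)


def exact_derivative(x):
    # Closed-form derivative of exp(x) * sin(x).
    return math.exp(x) * (math.sin(x) + math.cos(x))


x0 = 1.5
csfd_value = csfd_derivative(toy_function, x0)
fd_value = central_difference(toy_function, x0)
exact_value = exact_derivative(x0)

print("CSFD error:              ", abs(csfd_value - exact_value))
print("Central difference error:", abs(fd_value - exact_value))
```

The complex-step estimate typically agrees with the closed-form derivative to near machine precision, whereas the real-valued central difference is limited by subtractive cancellation as the step shrinks.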

ACM Digital Library Publication:

X-SLAM: Scalable Dense SLAM for Task-aware Optimization Using CSFD

Overview Page:

SIGGRAPH 2024: Technical Papers
