Egocentric scene reconstruction from an omnidirectional video

Omnidirectional videos capture environmental scenes effectively, but they have rarely been used for geometry reconstruction. In this work, we propose an egocentric 3D reconstruction method that can acquire scene geometry with high accuracy from a short egocentric omnidirectional video. To this end, we first estimate per-frame depth using a spherical disparity network. We then fuse per-frame depth estimates into a novel spherical binoctree data structure that is specifically designed to tolerate spherical depth estimation errors. By adaptively subdividing the spherical space into binary tree and octree nodes that represent spherical frustums, the spherical binoctree effectively enables egocentric surface geometry reconstruction for environmental scenes while simultaneously assigning high-resolution nodes to closely observed surfaces. This allows an entire scene to be reconstructed from a short video captured with a small camera trajectory. Experimental results validate the effectiveness and accuracy of our approach for reconstructing the 3D geometry of environmental scenes from short egocentric omnidirectional video inputs. We further demonstrate various applications using a conventional omnidirectional camera, including novel-view synthesis, object insertion, and relighting of scenes using reconstructed 3D models with texture.
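To make the idea of mixing binary and octree nodes over spherical frustums more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state the actual subdivision criterion, so the rule used here (split only along the radius when a frustum is much longer radially than it is wide, otherwise split along all three spherical axes) is an assumption, as are the names `Frustum`, `halves`, `subdivide` and the threshold `elongation`.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Frustum:
    """A node covering a spherical frustum (hypothetical layout, for illustration)."""
    theta: tuple  # (min, max) azimuth in radians
    phi: tuple    # (min, max) polar angle in radians
    r: tuple      # (min, max) radial distance from the camera center
    children: list = field(default_factory=list)

    def radial_extent(self):
        return self.r[1] - self.r[0]

    def angular_extent(self):
        # Arc length of the wider angular side at the mid radius.
        # Simplification: ignores the sin(phi) shrinkage of azimuth arcs near the poles.
        r_mid = 0.5 * (self.r[0] + self.r[1])
        return r_mid * max(self.theta[1] - self.theta[0], self.phi[1] - self.phi[0])


def halves(interval):
    """Split an interval (lo, hi) at its midpoint."""
    lo, hi = interval
    mid = 0.5 * (lo + hi)
    return [(lo, mid), (mid, hi)]


def subdivide(node, elongation=2.0):
    """Assumed rule: binary split along the radius for radially elongated
    frustums, octree split along theta, phi and r otherwise."""
    if node.radial_extent() > elongation * node.angular_extent():
        # Binary node: two children that differ only in radial range.
        node.children = [Frustum(node.theta, node.phi, r) for r in halves(node.r)]
    else:
        # Octree node: eight children, one per combination of half-intervals.
        node.children = [
            Frustum(t, p, r)
            for t in halves(node.theta)
            for p in halves(node.phi)
            for r in halves(node.r)
        ]
    return node.children


# Usage: a full-sphere root and a thin, radially long frustum.
root = Frustum((0.0, 2.0 * math.pi), (0.0, math.pi), (0.1, 100.0))
thin = Frustum((0.0, 0.01), (0.0, 0.01), (1.0, 10.0))
print(len(subdivide(root)), len(subdivide(thin)))
```

Under this assumed rule, the full-sphere root is split into eight children, while the thin frustum is split into two along the radius only. Applying `subdivide` repeatedly to the nodes that contain fused depth samples would concentrate small nodes around closely observed surfaces, which is the adaptive behavior the abstract describes.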

Overview Page:

SIGGRAPH 2022: Technical Papers

“Egocentric scene reconstruction from an omnidirectional video” by Jang, Meuleman, Kang, Kim, Richardt, et al. …
