“Modeling Ambient Scene Dynamics for Free-view Synthesis” by Shih, Huang, Kim, Shah and Kopf – ACM SIGGRAPH HISTORY ARCHIVES

Title:

    Modeling Ambient Scene Dynamics for Free-view Synthesis

Presenter(s)/Author(s):

    Shih, Huang, Kim, Shah, and Kopf

Abstract:


    We introduce a method for free-view synthesis of dynamic scenes from a single monocular video, overcoming limitations of prior 3D Gaussian Splatting approaches. By exploiting the periodicity of ambient motion and applying motion regularization, our approach reconstructs detailed natural scenes with ambient motion at higher quality than prior work, enabling immersive, photorealistic viewing experiences.
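The abstract's idea of exploiting ambient-motion periodicity can be illustrated with a minimal sketch (the paper's exact formulation is not reproduced here; all names, shapes, and frequencies below are hypothetical assumptions): each Gaussian center's time-varying displacement is modeled as a small sum of sinusoids over a shared frequency basis, so the reconstructed motion repeats with the basis period.

```python
import numpy as np

def periodic_displacement(base_pos, amps, phases, freqs, t):
    """Displace Gaussian centers with a sinusoidal ambient-motion basis.

    base_pos: (N, 3) static Gaussian centers
    amps:     (N, K, 3) per-Gaussian amplitude for each of K frequency bands
    phases:   (N, K) per-Gaussian phase offsets
    freqs:    (K,) shared angular frequencies
    t:        scalar time in seconds
    """
    # (N, K) sinusoids, broadcast against per-band amplitudes
    s = np.sin(freqs[None, :] * t + phases)
    return base_pos + (amps * s[:, :, None]).sum(axis=1)

rng = np.random.default_rng(0)
N, K = 4, 2
base = rng.normal(size=(N, 3))
amps = 0.01 * rng.normal(size=(N, K, 3))          # small ambient displacements
phases = rng.uniform(0.0, 2 * np.pi, size=(N, K))
freqs = 2 * np.pi * np.array([0.5, 1.0])          # periods of 2 s and 1 s

p0 = periodic_displacement(base, amps, phases, freqs, 0.0)
p2 = periodic_displacement(base, amps, phases, freqs, 2.0)  # one full period later
print(np.allclose(p0, p2))  # prints True: the motion repeats with the period
```

A shared low-dimensional frequency basis like this is one natural way to regularize per-Gaussian motion, since it forbids drift and keeps all trajectories periodic by construction.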

References:


    [1]
    Aayush Bansal, Minh Vo, Yaser Sheikh, Deva Ramanan, and Srinivasa Narasimhan. 2020. 4D Visualization of Dynamic Events from Unconstrained Multi-View Videos. In CVPR.

    [2]
    Jonathan T Barron, Ben Mildenhall, Matthew Tancik, Peter Hedman, Ricardo Martin-Brualla, and Pratul P Srinivasan. 2021. Mip-NeRF: A multiscale representation for anti-aliasing neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 5855–5864.

    [3]
    Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. 2022. Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields. CVPR (2022).

    [4]
    Jonathan T Barron, Ben Mildenhall, Dor Verbin, Pratul P Srinivasan, and Peter Hedman. 2023. Zip-NeRF: Anti-aliased grid-based neural radiance fields. arXiv preprint arXiv:2304.06706 (2023).

    [5]
    Michael Broxton, John Flynn, Ryan Overbeck, Daniel Erickson, Peter Hedman, Matthew Duvall, Jason Dourgarian, Jay Busch, Matt Whalen, and Paul Debevec. 2020. Immersive light field video with a layered mesh representation. ACM TOG 39, 4 (2020), 86–1.

    [6]
    Chris Buehler, Michael Bosse, Leonard McMillan, Steven Gortler, and Michael Cohen. 2001. Unstructured lumigraph rendering. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques. 425–432.

    [7]
    Ang Cao and Justin Johnson. 2023. HexPlane: A Fast Representation for Dynamic Scenes. CVPR (2023).

    [8]
    Rodrigo Ortiz Cayon, Abdelaziz Djelouah, and George Drettakis. 2015. A bayesian approach for selective image-based rendering using superpixels. In 2015 International Conference on 3D Vision. IEEE, 469–477.

    [9]
    Alvaro Collet, Ming Chuang, Pat Sweeney, Don Gillett, Dennis Evseev, David Calabrese, Hugues Hoppe, Adam Kirk, and Steve Sullivan. 2015. High-quality streamable free-viewpoint video. ACM Transactions on Graphics (ToG) 34, 4 (2015), 1–13.

    [10]
    Devikalyan Das, Christopher Wewer, Raza Yunus, Eddy Ilg, and Jan Eric Lenssen. 2023. Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction. arXiv preprint arXiv:2312.01196 (2023).

    [11]
    Abe Davis, Justin G Chen, and Frédo Durand. 2015. Image-space modal bases for plausible manipulation of objects in video. ACM Transactions on Graphics (TOG) 34, 6 (2015), 1–7.

    [12]
    Paul E. Debevec, Camillo J. Taylor, and Jitendra Malik. 1996. Modeling and Rendering Architecture from Photographs. Technical Report. USA.

    [13]
    Yilun Du, Yinan Zhang, Hong-Xing Yu, Joshua B Tenenbaum, and Jiajun Wu. 2021. Neural radiance flow for 4d view synthesis and video processing. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE Computer Society, 14304–14314.

    [14]
    Jiemin Fang, Taoran Yi, Xinggang Wang, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Matthias Nießner, and Qi Tian. 2022. Fast Dynamic Radiance Fields with Time-Aware Neural Voxels. In SIGGRAPH Asia 2022 Conference Papers.

    [15]
    John Flynn, Michael Broxton, Paul Debevec, Matthew DuVall, Graham Fyffe, Ryan Overbeck, Noah Snavely, and Richard Tucker. 2019. DeepView: View synthesis with learned gradient descent. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2367–2376.

    [16]
    Sara Fridovich-Keil, Giacomo Meanti, Frederik Rahbæk Warburg, Benjamin Recht, and Angjoo Kanazawa. 2023. K-planes: Explicit radiance fields in space, time, and appearance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 12479–12488.

    [17]
    Sara Fridovich-Keil, Alex Yu, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. 2022. Plenoxels: Radiance fields without neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5501–5510.

    [18]
    Chen Gao, Ayush Saraf, Johannes Kopf, and Jia-Bin Huang. 2021. Dynamic view synthesis from dynamic monocular video. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 5712–5721.

    [19]
    Steven J. Gortler, Radek Grzeszczuk, Richard Szeliski, and Michael F. Cohen. 1996. The lumigraph. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '96). Association for Computing Machinery, New York, NY, USA, 43–54. https://doi.org/10.1145/237170.237200

    [20]
    Xiang Guo, Guanying Chen, Yuchao Dai, Xiaoqing Ye, Jiadai Sun, Xiao Tan, and Errui Ding. 2022. Neural Deformable Voxel Grid for Fast Optimization of Dynamic View Synthesis. In Proceedings of the Asian Conference on Computer Vision (ACCV).

    [21]
    Marc Levoy and Pat Hanrahan. 1996. Light field rendering. In SIGGRAPH '96, Computer Graphics Proceedings (1996).

    [22]
    Peter Hedman, Julien Philip, True Price, Jan-Michael Frahm, George Drettakis, and Gabriel Brostow. 2018. Deep blending for free-viewpoint image-based rendering. ACM Transactions on Graphics (ToG) 37, 6 (2018), 1–15.

    [23]
    B. Heigl, R. Koch, M. Pollefeys, J. Denzler, and L. Van Gool. 1999. Plenoptic Modeling and Rendering from Image Sequences Taken by a Hand-Held Camera. In Mustererkennung 1999, Wolfgang Förstner, Joachim M. Buhmann, Annett Faber, and Petko Faber (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 94–101.

    [24]
    Yi-Hua Huang, Yang-Tian Sun, Ziyi Yang, Xiaoyang Lyu, Yan-Pei Cao, and Xiaojuan Qi. 2023. SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes. arXiv preprint arXiv:2312.14937 (2023).

    [25]
    Mustafa Işık, Martin Rünz, Markos Georgopoulos, Taras Khakhulin, Jonathan Starck, Lourdes Agapito, and Matthias Nießner. 2023. HumanRF: High-fidelity neural radiance fields for humans in motion. arXiv preprint arXiv:2305.06356 (2023).

    [26]
    Ramesh Jain and Koji Wakimoto. 1995. Multiple perspective interactive video. In Proceedings of the international conference on multimedia computing and systems. IEEE, 202–211.

    [27]
    Takeo Kanade, Peter Rander, and PJ Narayanan. 1997. Virtualized reality: Constructing virtual worlds from real scenes. IEEE Multimedia 4, 1 (1997), 34–47.

    [28]
    Kai Katsumata, Duc Minh Vo, and Hideki Nakayama. 2023. An Efficient 3D Gaussian Representation for Monocular/Multi-view Dynamic Scenes. arXiv preprint arXiv:2311.12897 (2023).

    [29]
    Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 2023a. 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Transactions on Graphics 42, 4 (July 2023). https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/

    [30]
    Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 2023b. 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Transactions on Graphics 42, 4 (2023).

    [31]
    Johannes Kopf, Michael F Cohen, and Richard Szeliski. 2014. First-person hyper-lapse videos. ACM Transactions on Graphics (TOG) 33, 4 (2014), 1–10.

    [32]
    Agelos Kratimenos, Jiahui Lei, and Kostas Daniilidis. 2023. DynMF: Neural Motion Factorization for Real-time Dynamic View Synthesis with 3D Gaussian Splatting. arXiv preprint arXiv:2312.00112 (2023).

    [33]
    Joo Chan Lee, Daniel Rho, Xiangyu Sun, Jong Hwan Ko, and Eunbyung Park. 2023. Compact 3D Gaussian Representation for Radiance Field. arXiv preprint arXiv:2311.13681 (2023).

    [34]
    Lingzhi Li, Zhen Shen, Zhongshu Wang, Li Shen, and Ping Tan. 2022a. Streaming radiance fields for 3d video synthesis. Advances in Neural Information Processing Systems 35 (2022), 13485–13498.

    [35]
    Tianye Li, Mira Slavcheva, Michael Zollhoefer, Simon Green, Christoph Lassner, Changil Kim, Tanner Schmidt, Steven Lovegrove, Michael Goesele, Richard Newcombe, et al. 2022b. Neural 3d video synthesis from multi-view video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5521–5531.

    [36]
    Zhan Li, Zhang Chen, Zhong Li, and Yi Xu. 2023a. Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis. arXiv preprint arXiv:2312.16812 (2023).

    [37]
    Zhong Li, Yu Ji, Wei Yang, Jinwei Ye, and Jingyi Yu. 2017. Robust 3D human motion reconstruction via dynamic template construction. In 2017 International Conference on 3D Vision (3DV). IEEE, 496–505.

    [38]
    Zhengqi Li, Simon Niklaus, Noah Snavely, and Oliver Wang. 2021. Neural scene flow fields for space-time view synthesis of dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6498–6508.

    [39]
    Zhengqi Li, Richard Tucker, Noah Snavely, and Aleksander Holynski. 2023b. Generative image dynamics. arXiv preprint arXiv:2309.07906 (2023).

    [40]
    Zhengqi Li, Qianqian Wang, Forrester Cole, Richard Tucker, and Noah Snavely. 2023c. DynIBaR: Neural dynamic image-based rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4273–4284.

    [41]
    Zhong Li, Minye Wu, Wangyiteng Zhou, and Jingyi Yu. 2018. 4D Human Body Correspondences from Panoramic Depth Maps. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2877–2886.

    [42]
    Yiqing Liang, Numair Khan, Zhengqin Li, Thu Nguyen-Phuoc, Douglas Lanman, James Tompkin, and Lei Xiao. 2023. GauFRe: Gaussian Deformation Fields for Real-time Dynamic Novel View Synthesis. arXiv preprint arXiv:2312.11458 (2023).

    [43]
    Haotong Lin, Sida Peng, Zhen Xu, Tao Xie, Xingyi He, Hujun Bao, and Xiaowei Zhou. 2023b. Im4D: High-Fidelity and Real-Time Novel View Synthesis for Dynamic Scenes. arXiv preprint arXiv:2310.08585 (2023).

    [44]
    Youtian Lin, Zuozhuo Dai, Siyu Zhu, and Yao Yao. 2023a. Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle. arXiv preprint arXiv:2312.03431 (2023).

    [45]
    Yu-Lun Liu, Chen Gao, Andreas Meuleman, Hung-Yu Tseng, Ayush Saraf, Changil Kim, Yung-Yu Chuang, Johannes Kopf, and Jia-Bin Huang. 2023. Robust Dynamic Radiance Fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

    [46]
    Jonathon Luiten, Georgios Kopanas, Bastian Leibe, and Deva Ramanan. 2024. Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis. In 3DV.

    [47]
    Li Ma, Xiaoyu Li, Jing Liao, and Pedro V. Sander. 2023. 3D Video Loops from Asynchronous Input. https://doi.org/10.48550/ARXIV.2303.05312

    [48]
    Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. 2020. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. In ECCV.

    [49]
    Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. 2022. Instant Neural Graphics Primitives with a Multiresolution Hash Encoding. ACM Trans. Graph. 41, 4, Article 102 (July 2022), 15 pages. https://doi.org/10.1145/3528223.3530127

    [50]
    Sergio Orts-Escolano, Christoph Rhemann, Sean Fanello, Wayne Chang, Adarsh Kowdle, Yury Degtyarev, David Kim, Philip L Davidson, Sameh Khamis, Mingsong Dou, et al. 2016. Holoportation: Virtual 3d teleportation in real-time. In Proceedings of the 29th annual symposium on user interface software and technology. 741–754.

    [51]
    Keunhong Park, Utkarsh Sinha, Jonathan T Barron, Sofien Bouaziz, Dan B Goldman, Steven M Seitz, and Ricardo Martin-Brualla. 2021a. Nerfies: Deformable neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 5865–5874.

    [52]
    Keunhong Park, Utkarsh Sinha, Peter Hedman, Jonathan T Barron, Sofien Bouaziz, Dan B Goldman, Ricardo Martin-Brualla, and Steven M Seitz. 2021b. Hypernerf: A higher-dimensional representation for topologically varying neural radiance fields. arXiv preprint arXiv:2106.13228 (2021).

    [53]
    Automne Petitjean, Yohan Poirier-Ginter, Ayush Tewari, Guillaume Cordonnier, and George Drettakis. 2023. ModalNeRF: Neural Modal Analysis and Synthesis for Free-Viewpoint Navigation in Dynamically Vibrating Scenes. In Computer Graphics Forum, Vol. 42.

    [54]
    Johannes L Schonberger and Jan-Michael Frahm. 2016. Structure-from-motion revisited. In Proceedings of the IEEE conference on computer vision and pattern recognition. 4104–4113.

    [55]
    Ruizhi Shao, Jingxiang Sun, Cheng Peng, Zerong Zheng, Boyao Zhou, Hongwen Zhang, and Yebin Liu. 2023a. Control4D: Dynamic Portrait Editing by Learning 4D GAN from 2D Diffusion-based Editor. arXiv preprint arXiv:2305.20082 (2023).

    [56]
    Ruizhi Shao, Zerong Zheng, Hanzhang Tu, Boning Liu, Hongwen Zhang, and Yebin Liu. 2023b. Tensor4d: Efficient neural 4d decomposition for high-fidelity dynamic reconstruction and rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 16632–16642.

    [57]
    Liangchen Song, Anpei Chen, Zhong Li, Zhang Chen, Lele Chen, Junsong Yuan, Yi Xu, and Andreas Geiger. 2023. NeRFPlayer: A streamable dynamic scene representation with decomposed neural radiance fields. IEEE Transactions on Visualization and Computer Graphics 29, 5 (2023), 2732–2742.

    [58]
    Pratul P Srinivasan, Richard Tucker, Jonathan T Barron, Ravi Ramamoorthi, Ren Ng, and Noah Snavely. 2019. Pushing the boundaries of view extrapolation with multiplane images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 175–184.

    [59]
    Cheng Sun, Min Sun, and Hwann-Tzong Chen. 2022. Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5459–5469.

    [60]
    Théo Thonat, Yagiz Aksoy, Miika Aittala, Sylvain Paris, Frédo Durand, and George Drettakis. 2021. Video-Based Rendering of Dynamic Stationary Environments from Unsynchronized Inputs. In Computer Graphics Forum, Vol. 40. Wiley Online Library, 73–86.

    [61]
    Chaoyang Wang, Ben Eckart, Simon Lucey, and Orazio Gallo. 2021. Neural trajectory fields for dynamic novel view synthesis. arXiv preprint arXiv:2105.05994 (2021).

    [62]
    Feng Wang, Sinan Tan, Xinghang Li, Zeyue Tian, Yafei Song, and Huaping Liu. 2023. Mixed neural voxels for fast multi-view video synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 19706–19716.

    [63]
    Liao Wang, Jiakai Zhang, Xinhang Liu, Fuqiang Zhao, Yanshun Zhang, Yingliang Zhang, Minye Wu, Jingyi Yu, and Lan Xu. 2022. Fourier plenoctrees for dynamic radiance field rendering in real-time. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 13524–13534.

    [64]
    Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, and Xinggang Wang. 2023. 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering. arXiv preprint arXiv:2310.08528 (2023).

    [65]
    Qiangeng Xu, Zexiang Xu, Julien Philip, Sai Bi, Zhixin Shu, Kalyan Sunkavalli, and Ulrich Neumann. 2022. Point-NeRF: Point-based neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5438–5448.

    [66]
    Jason C Yang, Matthew Everett, Chris Buehler, and Leonard McMillan. 2002. A real-time distributed light field camera. Rendering Techniques 2002, 77–86 (2002), 2.

    [67]
    Ziyi Yang, Xinyu Gao, Wen Zhou, Shaohui Jiao, Yuqing Zhang, and Xiaogang Jin. 2023a. Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction. arXiv preprint arXiv:2309.13101 (2023).

    [68]
    Zeyu Yang, Hongye Yang, Zijie Pan, Xiatian Zhu, and Li Zhang. 2023b. Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting. arXiv preprint arXiv:2310.10642 (2023).

    [69]
    Zeyu Yang, Hongye Yang, Zijie Pan, Xiatian Zhu, and Li Zhang. 2023c. Real-time photorealistic dynamic scene representation and rendering with 4d gaussian splatting. arXiv preprint arXiv:2310.10642 (2023).

    [70]
    Alex Yu, Ruilong Li, Matthew Tancik, Hao Li, Ren Ng, and Angjoo Kanazawa. 2021. PlenOctrees for real-time rendering of neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 5752–5761.

    [71]
    Heng Yu, Joel Julin, Zoltán Á Milacski, Koichiro Niinuma, and László A Jeni. 2023. CoGS: Controllable Gaussian Splatting. arXiv preprint arXiv:2312.05664 (2023).

    [72]
    Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. 2018. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In CVPR.

    [73]
    Tinghui Zhou, Richard Tucker, John Flynn, Graham Fyffe, and Noah Snavely. 2018. Stereo magnification: Learning view synthesis using multiplane images. arXiv preprint arXiv:1805.09817 (2018).


