“Scalable neural indoor scene rendering” by Wu, Xu, Zhu, Bao, Huang, et al. …
Title:
- Scalable neural indoor scene rendering
Presenter(s)/Author(s):
- Wu, Xu, Zhu, Bao, Huang, et al.
Abstract:
We propose a scalable neural scene reconstruction and rendering method that supports distributed training and interactive rendering of large indoor scenes. Our representation is tile-based. Tile appearances are trained in parallel through a background sampling strategy that augments each tile with distant-scene information via a proxy global mesh. Each tile carries two low-capacity MLPs: one for view-independent appearance (diffuse color and shading) and one for view-dependent appearance (specular highlights and reflections). We leverage the phenomenon that complex view-dependent scene reflections can be attributed to virtual lights underneath surfaces, placed at the total ray distance to the source. This lets us handle sparse sampling of the input scene, where reflection highlights do not appear consistently across input images. We show interactive free-viewpoint rendering results on five scenes, one of which covers an area of more than 100 m². Experimental results show that our method produces higher-quality renderings than a single large-capacity MLP and five recent neural proxy-geometry- and voxel-based baseline methods. Our code and data are available at the project webpage: https://xchaowu.github.io/papers/scalable-nisr.
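To make the per-tile decomposition concrete, below is a minimal PyTorch sketch (not the authors' released code) of the two low-capacity MLPs the abstract describes: one view-independent network for diffuse color and one view-dependent network for the specular/reflective residual. The layer sizes, sigmoid activations, and simple additive composition are illustrative assumptions; positional encoding, density/opacity prediction, tile lookup, the background sampling strategy, and the virtual-light reflection parameterization are all omitted.

```python
import torch
import torch.nn as nn


class TinyMLP(nn.Module):
    """Small fully connected network; width and depth are illustrative choices."""

    def __init__(self, in_dim, out_dim, width=64, depth=4):
        super().__init__()
        layers, dim = [], in_dim
        for _ in range(depth - 1):
            layers += [nn.Linear(dim, width), nn.ReLU(inplace=True)]
            dim = width
        layers.append(nn.Linear(dim, out_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


class TileRadiance(nn.Module):
    """One tile's appearance model: a view-independent MLP for diffuse color
    plus a view-dependent MLP for specular highlights and reflections."""

    def __init__(self, pos_dim=3, dir_dim=3):
        super().__init__()
        self.diffuse = TinyMLP(pos_dim, 3)             # f_d(x)    -> diffuse RGB
        self.specular = TinyMLP(pos_dim + dir_dim, 3)  # f_s(x, d) -> view-dependent RGB

    def forward(self, x, d):
        # x: sample positions (..., 3); d: viewing directions (..., 3)
        c_d = torch.sigmoid(self.diffuse(x))
        c_s = torch.sigmoid(self.specular(torch.cat([x, d], dim=-1)))
        # Additive composition is an assumption; any clamping or tone mapping
        # applied before volume compositing is omitted here.
        return c_d + c_s
```

For example, `TileRadiance()(torch.rand(1024, 3), torch.rand(1024, 3))` returns one RGB value per sample; in a full pipeline these per-sample colors would be composited along each ray, and each tile would hold its own pair of MLPs trained in parallel.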