“Scalable Neural Indoor Scene Rendering” by Wu, Xu, Zhu, Bao, Huang, et al. …

  • ©Xiuchao Wu, Jiamin Xu, Zihan Zhu, Hujun Bao, Qixing Huang, James Tompkin, and Weiwei Xu



    Scalable Neural Indoor Scene Rendering

Program Title:

    Labs Demo



    We propose a scalable neural scene reconstruction and rendering method to support distributed training and interactive rendering of large indoor scenes. Our representation is based on tiles. Tile appearances are trained in parallel through a background sampling strategy that augments each tile with distant scene information via a proxy global mesh. Each tile has two low-capacity MLPs: one for view-independent appearance (diffuse color and shading) and one for view-dependent appearance (specular highlights, reflections). We leverage the phenomenon that complex view-dependent scene reflections can be attributed to virtual lights underneath surfaces at the total ray distance to the source. This lets us handle sparse samplings of the input scene where reflection highlights do not always appear consistently in input images. We show interactive free-viewpoint rendering results from five scenes, one of which covers an area of more than 100 m². Experimental results show that our method produces higher-quality renderings than a single large-capacity MLP and five recent neural proxy-geometry and voxel-based baseline methods. Our code and data are available at the project webpage https://xchaowu.github.io/papers/scalable-nisr.
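    The "virtual lights underneath surfaces" observation is, geometrically, the classic mirror-image construction. The sketch below assumes an ideal planar reflector and uses hypothetical helper names (`virtual_point`, `mirror_point`, `intersect_plane`, etc.); it only illustrates the geometry, not the authors' MLPs or training code. If the camera ray travels d1 to the surface and the reflected ray travels d2 to the source, the source appears on the unbent camera ray at the total distance d1 + d2, which coincides with the source mirrored below the surface.

```python
import math

# Assumption: ideal planar mirror; all helpers are hypothetical, not from the paper's code.
Vec = tuple[float, ...]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def add(a: Vec, b: Vec) -> Vec:
    return tuple(x + y for x, y in zip(a, b))


def scale(a: Vec, s: float) -> Vec:
    return tuple(x * s for x in a)


def normalize(a: Vec) -> Vec:
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def reflect(d: Vec, n: Vec) -> Vec:
    """Mirror direction d about unit normal n: r = d - 2 (d . n) n."""
    return add(d, scale(n, -2.0 * dot(d, n)))


def intersect_plane(o: Vec, d: Vec, p0: Vec, n: Vec) -> float:
    """Distance t > 0 such that o + t*d lies on the plane through p0 with normal n."""
    denom = dot(d, n)
    if abs(denom) < 1e-12:
        raise ValueError("ray is parallel to the plane")
    t = dot(add(p0, scale(o, -1.0)), n) / denom
    if t <= 0.0:
        raise ValueError("plane is behind the ray origin")
    return t


def virtual_point(o: Vec, d: Vec, p0: Vec, n: Vec, d2: float) -> Vec:
    """Virtual source seen along the camera ray o + t*d.

    d1 is the camera-to-surface distance and d2 the surface-to-source distance
    along the reflected ray. The virtual source lies on the unbent camera ray at
    the total path length d1 + d2, i.e. d2 beyond the hit point, below the surface.
    """
    d, n = normalize(d), normalize(n)
    d1 = intersect_plane(o, d, p0, n)
    return add(o, scale(d, d1 + d2))


def mirror_point(s: Vec, p0: Vec, n: Vec) -> Vec:
    """Mirror a point s across the plane through p0 with normal n."""
    n = normalize(n)
    return add(s, scale(n, -2.0 * dot(add(s, scale(p0, -1.0)), n)))


if __name__ == "__main__":
    o = (0.0, 0.0, 2.0)                        # camera 2 m above a glossy floor
    d = normalize((1.0, 0.0, -1.0))            # looking down at 45 degrees
    p0, n = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)   # floor plane z = 0
    x = add(o, scale(d, intersect_plane(o, d, p0, n)))  # hit point on the floor
    d2 = math.sqrt(2.0)
    s = add(x, scale(reflect(d, n), d2))       # real source along the reflected ray
    print(virtual_point(o, d, p0, n, d2))      # approximately (3, 0, -1)
    print(mirror_point(s, p0, n))              # approximately (3, 0, -1)
```

Because the virtual point sits on the straight camera ray, a reflection can be treated as if it were content seen "through" the surface at depth d1 + d2; how the paper encodes this inside its view-dependent MLPs is not shown here.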


    1. J. Amanatides, A. Woo, et al. 1987. A fast voxel traversal algorithm for ray tracing.. In Eurographics, Vol. 87. 3–10.
    2. B. Attal, E. Laidlaw, A. Gokaslan, C. Kim, C. Richardt, J. Tompkin, and M. O’Toole. 2021. TöRF: Time-of-Flight Radiance Fields for Dynamic Scene View Synthesis. arXiv:2109.15271 [cs.CV]
    3. J. T. Barron, B. Mildenhall, M. Tancik, P. Hedman, R. Martin-Brualla, and P. P. Srinivasan. 2021. Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields. arXiv:2103.13415 [cs.CV]
    4. D. Bau, H. Strobelt, W. Peebles, J. Wulff, B. Zhou, J. Zhu, and A. Torralba. 2020. Semantic photo manipulation with a generative image prior. arXiv preprint arXiv:2005.07727 (2020).
    5. S. Bi, Z. Xu, P. Srinivasan, B. Mildenhall, K. Sulkavalli, M. Hašan, Y. Hold-Geoffroy, D. Kriegman, and R. Ramamoorthi. 2020. Neural Reflectance Fields for Appearance Acquisition. https://arxiv.org/abs/2008.03824 (2020).
    6. M. Boss, R. Braun, V. Jampani, J. T. Barron, C. Liu, and H. Lensch. 2021. NeRD: Neural Reflectance Decomposition from Image Collections. In ICCV.
    7. C. Buehler, M. Bosse, L. McMillan, S. Gortler, and M. Cohen. 2001. Unstructured lumigraph rendering. In SIGGRAPH, ACM. 425–432.
    8. CapturingReality. 2016. Reality capture, http://capturingreality.com.
    9. D. Casas, C. Richardt, J. Collomosse, C. Theobalt, and A. Hilton. 2015. 4D Model Flow: Precomputed Appearance Alignment for Real-time 4D Video Interpolation. Computer Graphics Forum Journal of the European Association for Computer Graphics (2015).
    10. E. Chan, M. Monteiro, P. Kellnhofer, J. Wu, and G. Wetzstein. 2021. pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis. In CVPR.
    11. G. Chaurasia, S. Duchene, O. Sorkine-Hornung, and G. Drettakis. 2013. Depth synthesis and local warps for plausible image-based navigation. 32, 3 (2013), 1–12.
    12. S. E. Chen and L. Williams. 1993. View Interpolation for Image Synthesis. In SIGGRAPH, ACM. 279–288.
    13. D. Cohen-Steiner, P. Alliez, and M. Desbrun. 2004. Variational shape approximation. In SIGGRAPH, ACM. 905–914.
    14. P. Debevec, Y. Yu, and G. Borshukov. 1998. Efficient view-dependent image-based rendering with projective texture-mapping. In Eurographics. Springer, 105–116.
    15. R. Du, M. Chuang, W. Chang, H. Hoppe, and A. Varshney. 2019. Montage4D: Real-Time Seamless Fusion and Stylization of Multiview Video Textures. Journal of Computer Graphics Techniques 8, 1 (2019), 1–34. 
    16. M. Eisemann, B. De Decker, M. Magnor, P. Bekaert, E. De Aguiar, N. Ahmed, C. Theobalt, and A. Sellent. 2008. Floating Textures. (2008).
    17. J. Flynn, M. Broxton, P. Debevec, M. DuVall, G. Fyffe, R. Overbeck, N. Snavely, and R. Tucker. 2019. Deepview: View synthesis with learned gradient descent. In CVPR. 2367–2376.
    18. G. Gafni, J. Thies, M. Zollhöfer, and M. Nießner. 2021. Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction. In CVPR.
    19. S. J. Garbin, M. Kowalski, M. Johnson, J. Shotton, and J. Valentin. 2021. FastNeRF: High-Fidelity Neural Rendering at 200FPS. In ICCV.
    20. S. J. Gortler, R. Grzeszczuk, R. Szeliski, and M. F. Cohen. 1996. The lumigraph. In SIGGRAPH, ACM. 43–54.
    21. Y. Guo, D. Kang, L. Bao, Y. He, and S. Zhang. 2022. NeRFReN: Neural Radiance Fields with Reflections. In CVPR.
    22. P. Hedman, J. Philip, T. Price, J. M. Frahm, G. Drettakis, and G. Brostow. 2018. Deep blending for free-viewpoint image-based rendering. 37, 6 (2018), 1–15.
    23. P. Hedman, T. Ritschel, G. Drettakis, and G. Brostow. 2016. Scalable inside-out image-based rendering. 35, 6 (2016), 1–11.
    24. P. Hedman, P. P. Srinivasan, B. Mildenhall, J. T. Barron, and P. Debevec. 2021. Baking Neural Radiance Fields for Real-Time View Synthesis. In ICCV.
    25. M. Jancosek and T. Pajdla. 2011. Multi-view reconstruction preserving weakly-supported surfaces. In CVPR. IEEE, 3121–3128.
    26. J. Johnson, A. Alahi, and L. Fei-Fei. 2016. Perceptual Losses for Real-Time Style Transfer and Super-Resolution. In ECCV (Lecture Notes in Computer Science, Vol. 9906), Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling (Eds.). Springer, 694–711. 
    27. T. Karras, S. Laine, and T. Aila. 2019. A style-based generator architecture for generative adversarial networks. In CVPR. 4401–4410.
    28. D. Kingma and J. Ba. 2015. Adam: A Method for Stochastic Optimization. CoRR abs/1412.6980 (2015).
    29. J. Kopf, F. Langguth, D. Scharstein, R. Szeliski, and M. Goesele. 2013. Image-based rendering in the gradient domain. 32, 6 (2013), 1–9.
    30. P. Labatut, J. Pons, and R. Keriven. 2007. Efficient multi-view reconstruction of large-scale scenes using interest points, delaunay triangulation and graph cuts. In ICCV. IEEE, 1–8.
    31. M. Levoy and P. Hanrahan. 1996. Light field rendering. In SIGGRAPH, ACM. 31–42.
    32. C. Lin, W. Ma, A. Torralba, and S. Lucey. 2021. BARF: Bundle-Adjusting Neural Radiance Fields. In ICCV.
    33. D. B. Lindell, J. N. P. Martel, and G. Wetzstein. 2021. AutoInt: Automatic Integration for Fast Neural Volume Rendering. In CVPR.
    34. L. Liu, J. Gu, K. Lin, T. Chua, and C. Theobalt. 2020a. Neural sparse voxel fields. In NeurIPS, Vol. 33.
    35. L. Liu, J. Gu, K. Z. Lin, T. S. Chua, and C. Theobalt. 2020b. Neural Sparse Voxel Fields. NeurIPS (2020).
    36. L. Liu, M. Habermann, V. Rudnev, K. Sarkar, J. Gu, and C. Theobalt. 2021. Neural Actor: Neural Free-view Synthesis of Human Actors with Pose Control. ACM SIGGRAPH Asia (2021).
    37. L. Liu, W. Xu, M. Zollhoefer, H. Kim, F. Bernard, M. Habermann, W. Wang, and C. Theobalt. 2019. Neural rendering and reenactment of human actor videos. 38, 5 (2019), 1–14.
    38. S. Lombardi, T. Simon, J. Saragih, G. Schwartz, A. Lehrmann, and Y. Sheikh. 2019. Neural Volumes: Learning Dynamic Renderable Volumes from Images. (2019).
    39. S. Lombardi, T. Simon, G. Schwartz, M. Zollhoefer, Y. Sheikh, and J. Saragih. 2021. Mixture of Volumetric Primitives for Efficient Neural Rendering. arXiv:2103.01954 [cs.GR]
    40. R. Martin-Brualla, N. Radwan, M. Sajjadi, J. T. Barron, A. Dosovitskiy, and D. Duckworth. 2021. NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections. In CVPR.
    41. W. Matusik, C. Buehler, R. Raskar, S. J. Gortler, and L. McMillan. 2000. Image-Based Visual Hulls. In SIGGRAPH, ACM. 6 pages.
    42. W. Matusik, H. Pfister, M. Brand, and L. McMillan. 2003. A Data-Driven Reflectance Model. ACM Transactions on Graphics 22, 3 (2003), 759–769.
    43. W. Matusik, H. Pfister, A. Ngan, P. Beardsley, R. Ziegler, and L. Mcmillan. 2002. Image-Based 3D Photography Using Opacity Hulls. 21, 3 (2002), 427–437.
    44. N. Max. 1995. Optical models for direct volume rendering. IEEE Transactions on Visualization and Computer Graphics 1, 2 (1995), 99–108. 
    45. A. Meka, C. Haene, R. Pandey, M. Zollhöfer, S. Fanello, G. Fyffe, A. Kowdle, X. Yu, J. Busch, J. Dourgarian, et al. 2019. Deep reflectance fields: high-quality facial reflectance field inference from color gradient illumination. 38, 4 (2019), 1–12.
    46. B. Mildenhall, P. P. Srinivasan, R. Ortiz-Cayon, N. K. Kalantari, R. Ramamoorthi, R. Ng, and A. Kar. 2019. Local light field fusion: Practical view synthesis with prescriptive sampling guidelines. 38, 4 (2019), 1–14.
    47. B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and N. Ren. 2020. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. In ECCV.
    48. D. Miller and Rubin. 1998. Lazy decompression of surface light fields for precomputed global illumination. Springer Vienna (1998).
    49. C. Müller. 1966. Spherical harmonics / Claus Müller. Springer-Verlag Berlin ; New York. 45 p. : pages.
    50. T. Müller, A. Evans, C. Schied, and A. Keller. 2022. Instant Neural Graphics Primitives with a Multiresolution Hash Encoding. arXiv:2201.05989 (2022).
    51. M. Niemeyer, L. Mescheder, M. Oechsle, and A. Geiger. 2019. Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision. In CVPR.
    52. M. Nießner, M. Zollhöfer, S. Izadi, and M. Stamminger. 2013. Real-time 3D reconstruction at scale using voxel hashing. 32, 6 (2013), 1–11.
    53. Nvidia. 2017–2018. Nvidia Corporation. TensorRT. https://developer.nvidia.com/tensorrt.
    54. R. Pandey, A. Tkach, S. Yang, P. Pidlypenskyi, J. Taylor, R. Martin-Brualla, A. Tagliasacchi, G. Papandreou, P. Davidson, C. Keskin, et al. 2019. Volumetric capture of humans with a single rgbd camera via semi-parametric learning. In CVPR. 9709–9718.
    55. J. J. Park, P. Florence, J. Straub, R. Newcombe, and S. Lovegrove. 2019a. DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation. In CVPR. 165–174.
    56. K. Park, U. Sinha, J. T. Barron, S. Bouaziz, D. Goldman, S. Seitz, and R. Martin-Brualla. 2020. Deformable Neural Radiance Fields. In ICCV.
    57. K. Park, U. Sinha, P. Hedman, J. T. Barron, S. Bouaziz, D. B. Goldman, R. Martin-Brualla, and S. M. Seitz. 2021. HyperNeRF: A Higher-Dimensional Representation for Topologically Varying Neural Radiance Fields. 40, 6, Article 238 (2021).
    58. T. Park, M. Liu, T. Wang, and J. Zhu. 2019b. Semantic image synthesis with spatially-adaptive normalization. In CVPR. 2337–2346.
    59. A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In NeurIPS, H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, and R. Garnett (Eds.). 8024–8035.
    60. S. Peng, J. Dong, Q. Wang, S. Zhang, Q. Shuai, H. Bao, and X. Zhou. 2021a. Animatable Neural Radiance Fields for Human Body Modeling. In ICCV.
    61. S. Peng, Y. Zhang, Y. Xu, Q. Wang, Q. Shuai, H. Bao, and X. Zhou. 2021b. Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans. In CVPR.
    62. E. Penner and L. Zhang. 2017. Soft 3D reconstruction for view synthesis. 36, 6 (2017), 1–11.
    63. J. Philip, M. Gharbi, T. Zhou, A. A. Efros, and G. Drettakis. 2019. Multi-view relighting using a geometry-aware network. 38, 4 (2019), 1–14.
    64. J. Philip, S. Morgenthaler, M. Gharbi, and G. Drettakis. 2021. Free-viewpoint Indoor Neural Relighting from Multi-view Stereo. ACM Transactions on Graphics (2021).
    65. A. Pumarola, E. Corona, G. Pons-Moll, and F. Moreno-Noguer. 2021. D-NeRF: Neural Radiance Fields for Dynamic Scenes. In CVPR.
    66. C. Reiser, S. Peng, Y. Liao, and A. Geiger. 2021. KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs. arXiv:2103.13744 [cs.CV]
    67. K. Rematas, R. Martin-Brualla, and V. Ferrari. 2021. ShaRF: Shape-conditioned Radiance Fields from a Single View. arXiv preprint arXiv:2102.08860 (2021).
    68. G. Riegler and V. Koltun. 2020. Free View Synthesis. In ECCV.
    69. G. Riegler and V. Koltun. 2021. Stable View Synthesis. In CVPR.
    70. S. Rodriguez, S. Prakash, P. Hedman, and G. Drettakis. 2020. Image-Based Rendering of Cars using Semantic Labels and Approximate Reflection Flow. Proc. ACM Comput. Graph. Interact. 3 (2020).
    71. O. Ronneberger, P. Fischer, and T. Brox. 2015. U-Net: Convolutional Networks for Biomedical Image Segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention.
    72. J. L. Schonberger and J. M. Frahm. 2016. Structure-from-Motion Revisited. In CVPR. 4104–4113.
    73. K. Schwarz, Y. Liao, M. Niemeyer, and A. Geiger. 2020. Graf: Generative radiance fields for 3D-aware image synthesis. In NeurIPS, Vol. 33.
    74. H. Y. Shum and S. B. Kang. 2000. A Review of Image-based Rendering Techniques. Microsoft.
    75. K. Simonyan and A. Zisserman. 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. In ICLR.
    76. S. N. Sinha, J. Kopf, M. Goesele, D. Scharstein, and R. Szeliski. 2012. Image-based rendering for scenes with reflections. 31, 4 (2012), 1–10.
    77. V. Sitzmann, S. Rezchikov, B. Freeman, J. Tenenbaum, and F. Durand. 2021. Light field networks: Neural scene representations with single-evaluation rendering. Advances in Neural Information Processing Systems 34 (2021).
    78. V. Sitzmann, J. Thies, F. Heide, M. Nießner, G. Wetzstein, and M. Zollhofer. 2019a. Deepvoxels: Learning persistent 3d feature embeddings. In CVPR. 2437–2446.
    79. V. Sitzmann, M. Zollhöfer, and G. Wetzstein. 2019b. Scene representation networks: Continuous 3d-structure-aware neural scene representations. In NeurIPS. 1121–1132.
    80. P. Srinivasan, B. Deng, X. Zhang, M. Tancik, B. Mildenhall, and J. T. Barron. 2021. NeRV: Neural Reflectance and Visibility Fields for Relighting and View Synthesis. In CVPR.
    81. P. P. Srinivasan, R. Tucker, J. T. Barron, R. Ramamoorthi, R. Ng, and N. Snavely. 2019. Pushing the boundaries of view extrapolation with multiplane images. In CVPR. 175–184.
    82. E. Sucar, S. Liu, J. Ortiz, and A. Davison. 2021. iMAP: Implicit Mapping and Positioning in Real-Time. In ICCV.
    83. C. Sun, M. Sun, and H. Chen. 2021. Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction. arXiv preprint arXiv:2111.11215 (2021).
    84. R. Szeliski and P. Golland. 1999. Stereo matching with transparency and matting. International Journal of Computer Vision 32, 1 (1999), 45–61.
    85. A. Tewari, O. Fried, J. Thies, V. Sitzmann, S. Lombardi, K. Sunkavalli, R. Martin-Brualla, T. Simon, J. Saragih, M. Nießner, et al. 2020. State of the art on neural rendering. In Computer Graphics Forum, Vol. 39. Wiley Online Library, 701–727.
    86. A. Tewari, J. Thies, B. Mildenhall, P. Srinivasan, E. Tretschk, Y. Wang, C. Lassner, V. Sitzmann, R. Martin-Brualla, S. Lombardi, T. Simon, C. Theobalt, M. Niessner, J. T. Barron, G. Wetzstein, M. Zollhoefer, and V. Golyanik. 2021. Advances in Neural Rendering. arXiv:2111.05849 [cs.GR]
    87. J. Thies, M. Zollhöfer, and M. Nießner. 2019. Deferred neural rendering: Image synthesis using neural textures. 38, 4 (2019), 1–12.
    88. A. Trevithick and B. Yang. 2020. GRF: Learning a General Radiance Field for 3D Scene Representation and Rendering. In ICCV.
    89. D. Verbin, P. Hedman, B. Mildenhall, T. Zickler, J. T. Barron, and P. P. Srinivasan. 2022. Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields. In CVPR.
    90. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. 2004. Image Quality Assessment: From Error Visibility to Structural Similarity. Trans. Img. Proc. 13, 4 (2004), 600–612. 
    91. Z. Wang, S. Wu, W. Xie, M. Chen, and V. A. Prisacariu. 2021. NeRF-: Neural Radiance Fields Without Known Camera Parameters. arXiv preprint arXiv:2102.07064 (2021).
    92. S. Wizadwongsa, P. Phongthawee, J. Yenphraphai, and S. Suwajanakorn. 2021. Nex: Real-time view synthesis with neural basis expansion. In CVPR. 8534–8543.
    93. D. N. Wood, D. I. Azuma, K. Aldinger, B. Curless, T. Duchamp, D. H. Salesin, and W. Stuetzle. 2000. Surface light fields for 3D photography. In SIGGRAPH, ACM. 287–296.
    94. Y. Xie, T. Takikawa, S. Saito, Or Litany, S. Yan, N. Khan, F. Tombari, J. Tompkin, V. Sitzmann, and S. Sridhar. 2022. Neural Fields in Visual Computing and Beyond. Computer Graphics Forum (2022). 
    95. J. Xu, X. Wu, Z. Zhu, Q. Huang, Y. Yang, H. Bao, and W. Xu. 2021. Scalable Image-Based Indoor Scene Rendering with Reflections. 40, 4, Article 60 (2021), 14 pages. 
    96. L. Yen-Chen, P. Florence, J. T. Barron, A. Rodriguez, P. Isola, and T. Lin. 2021. iNeRF: Inverting Neural Radiance Fields for Pose Estimation. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
    97. A. Yu, S. Fridovich-Keil, M.Tancik, Q. Chen, B. Recht, and A. Kanazawa. 2022. Plenoxels: Radiance Fields without Neural Networks. In CVPR.
    98. A. Yu, R. Li, M. Tancik, H. Li, R. Ng, and A. Kanazawa. 2021a. Plenoctrees for real-time rendering of neural radiance fields. In ICCV. 5752–5761.
    99. A. Yu, V. Ye, M. Tancik, and A. Kanazawa. 2021b. pixelNeRF: Neural Radiance Fields from One or Few Images. In CVPR.
    100. C. Zhang and T. Chen. 2003. A survey on image-based rendering. Signal Processing Image Communication 19 (2003), 1–28.
    101. K. Zhang, F. Luan, Q. Wang, K. Bala, and N. Snavely. 2021a. PhySG: Inverse Rendering with Spherical Gaussians for Physics-based Material Editing and Relighting. In CVPR. 5453–5462.
    102. K. Zhang, G. Riegler, N. Snavely, and V. Koltun. 2020. Nerf++: Analyzing and improving neural radiance fields. arXiv preprint arXiv:2010.07492 (2020).
    103. X. Zhang, P. P. Srinivasan, B. Deng, P. Debevec, W. T. Freeman, and J. T. Barron. 2021b. Nerfactor: Neural factorization of shape and reflectance under an unknown illumination. 40, 6 (2021), 1–18.
    104. H. Zhou, S. Hadap, K. Sunkavalli, and D. W. Jacobs. 2019. Deep single-image portrait relighting. In ICCV. 7194–7202.
    105. T. Zhou, R. Tucker, J. Flynn, G. Fyffe, and N. Snavely. 2018. Stereo magnification: learning view synthesis using multiplane images. 37, 4 (2018), 1–12.
    106. Z. Zhu, S. Peng, V. Larsson, W. Xu, H. Bao, Z. Cui, M. R. Oswald, and M. Pollefeys. 2022. NICE-SLAM: Neural Implicit Scalable Encoding for SLAM. In CVPR.

ACM Digital Library Publication: