“Acorn: adaptive coordinate networks for neural scene representation” by Martel, Lindell, Lin, Chan, Monteiro, and Wetzstein

  • Julien N. P. Martel, David B. Lindell, Connor Z. Lin, Eric Chan, Marco Monteiro, and Gordon Wetzstein

Conference:

    SIGGRAPH 2021 (ACM Transactions on Graphics)

Title:

    Acorn: adaptive coordinate networks for neural scene representation

Presenter(s)/Author(s):

    Julien N. P. Martel, David B. Lindell, Connor Z. Lin, Eric Chan, Marco Monteiro, and Gordon Wetzstein

Abstract:


    Neural representations have emerged as a new paradigm for applications in rendering, imaging, geometric modeling, and simulation. Compared to traditional representations such as meshes, point clouds, or volumes, they can be flexibly incorporated into differentiable, learning-based pipelines. While recent improvements to neural representations now make it possible to represent signals with fine details at moderate resolutions (e.g., for images and 3D shapes), adequately representing large-scale or complex scenes has proven a challenge. Current neural representations fail to accurately represent images at resolutions greater than a megapixel or 3D scenes with more than a few hundred thousand polygons. Here, we introduce a new hybrid implicit-explicit network architecture and training strategy that adaptively allocates resources during training and inference based on the local complexity of a signal of interest. Our approach uses a multiscale block-coordinate decomposition, similar to a quadtree or octree, that is optimized during training. The network architecture operates in two stages: using the bulk of the network parameters, a coordinate encoder generates a feature grid in a single forward pass. Then, hundreds or thousands of samples within each block can be efficiently evaluated using a lightweight feature decoder. With this hybrid implicit-explicit network architecture, we demonstrate the first experiments that fit gigapixel images to nearly 40 dB peak signal-to-noise ratio. Notably, this represents an increase in scale of over 1000× compared to the resolution of previously demonstrated image-fitting experiments. Moreover, our approach is able to represent 3D shapes significantly faster and better than previous techniques; it reduces training times from days to hours or minutes and memory requirements by over an order of magnitude.
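
    The two-stage design described in the abstract can be made concrete with a short sketch. The PyTorch code below is a minimal illustration under assumed settings, not the authors' implementation: the module and constant names (CoordinateEncoder, FeatureDecoder, FEATURE_DIM, GRID_RES) and all layer sizes are hypothetical choices for a 2D (image) signal.

    # Minimal PyTorch sketch of the two-stage evaluation (hypothetical
    # names and sizes; the paper's exact architecture may differ).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    FEATURE_DIM = 16  # channels of each block's feature grid (assumed)
    GRID_RES = 8      # spatial resolution of the feature grid (assumed)

    class CoordinateEncoder(nn.Module):
        # Large MLP holding the bulk of the parameters: maps a block's
        # global coordinate to a full feature grid in one forward pass.
        def __init__(self, in_dim=2, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, FEATURE_DIM * GRID_RES * GRID_RES),
            )

        def forward(self, block_coords):  # (B, 2) block centers in [-1, 1]
            feats = self.net(block_coords)
            return feats.view(-1, FEATURE_DIM, GRID_RES, GRID_RES)

    class FeatureDecoder(nn.Module):
        # Lightweight MLP: decodes an interpolated feature vector into a
        # signal value (e.g., RGB); evaluated for many samples per block.
        def __init__(self, hidden=64, out_dim=3):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(FEATURE_DIM, hidden), nn.ReLU(),
                nn.Linear(hidden, out_dim),
            )

        def forward(self, grid, local_coords):
            # local_coords: (B, N, 2) sample positions inside each block,
            # in [-1, 1]. Bilinearly interpolate the block's feature grid.
            sampled = F.grid_sample(grid, local_coords.unsqueeze(2),
                                    align_corners=True)     # (B, C, N, 1)
            sampled = sampled.squeeze(-1).permute(0, 2, 1)   # (B, N, C)
            return self.net(sampled)

    # One encoder pass per active block, then cheap per-sample decoding.
    encoder, decoder = CoordinateEncoder(), FeatureDecoder()
    blocks = torch.rand(4, 2) * 2 - 1         # 4 active blocks (toy example)
    samples = torch.rand(4, 1024, 2) * 2 - 1  # 1024 query points per block
    rgb = decoder(encoder(blocks), samples)   # -> (4, 1024, 3)

    In the paper's scheme, the set of active blocks and their sizes come from the quadtree- or octree-like decomposition that is optimized during training; the toy random blocks above merely stand in for that partition.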


