“MVD^2: Efficient Multiview 3D Reconstruction for Multiview Diffusion”
Conference: SIGGRAPH 2024
Type(s):
Title:
- MVD^2: Efficient Multiview 3D Reconstruction for Multiview Diffusion
Presenter(s)/Author(s):
- Xin-Yang Zheng
- Hao Pan
- Yu-Xiao Guo
- Xin Tong
- Yang Liu
Abstract:
Multiview diffusion (MVD) has emerged as a prominent technique for 3D generation, but the images it produces are often mutually inconsistent and cover only a sparse set of views, which degrades the quality of multiview 3D reconstruction. Our learning-based method, MVD^2, tackles these challenges, efficiently and robustly reconstructing high-quality 3D shapes from the images generated by a variety of multiview diffusion models.
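Since the abstract describes only the high-level pipeline (a diffusion model generates a few views of an object; a feed-forward network then reconstructs a 3D shape from them), a minimal toy sketch may help make that flow concrete. Everything below is an illustrative assumption, not the architecture described in the MVD^2 paper: the class name `ToyMultiviewReconstructor`, the layer sizes, the occupancy-grid output, and the mean-pooling view aggregation are all hypothetical.

```python
# Hypothetical sketch of a feed-forward multiview-to-3D reconstructor.
# NOT the MVD^2 architecture; it only illustrates the general idea:
# encode N (possibly inconsistent) generated views, pool features
# across views, and decode a coarse 3D occupancy volume.
import torch
import torch.nn as nn

class ToyMultiviewReconstructor(nn.Module):
    def __init__(self, feat_dim: int = 64, grid_res: int = 16):
        super().__init__()
        self.grid_res = grid_res
        # Per-view image encoder (weights shared across views).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, 4, stride=4), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 4, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Decoder from pooled view features to an occupancy grid.
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, grid_res ** 3),
        )

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (B, N, 3, H, W) -- N generated views per object.
        b, n, c, h, w = views.shape
        feats = self.encoder(views.reshape(b * n, c, h, w)).reshape(b, n, -1)
        # Mean-pool over the view axis: tolerant to varying view counts
        # and, to a degree, to per-view inconsistency in diffusion outputs.
        pooled = feats.mean(dim=1)
        occ = self.decoder(pooled)
        occ = occ.reshape(b, self.grid_res, self.grid_res, self.grid_res)
        return torch.sigmoid(occ)  # occupancy probabilities in [0, 1]

# Usage: four 256x256 views of one object -> a 16^3 occupancy grid.
model = ToyMultiviewReconstructor()
views = torch.rand(1, 4, 3, 256, 256)
print(model(views).shape)  # torch.Size([1, 16, 16, 16])
```

Mean pooling over views is one simple way to stay robust to both view count and mild cross-view inconsistency; the actual aggregation, output representation, and training scheme of MVD^2 should be taken from the paper itself.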