IntrinsicDiffusion: Joint Intrinsic Layers From Latent Diffusion Models
Abstract:
Estimating intrinsic images, such as albedo, shading, and normals, is challenging. We propose leveraging the implicit priors learned by large-scale latent diffusion models. Our novel conditioning mechanism enables joint prediction of multiple intrinsic modalities and training on mixed datasets that provide only partial annotations. The method achieves state-of-the-art performance both qualitatively and quantitatively.
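As a rough illustration of how partially annotated samples can still supervise a joint multi-modality predictor, the sketch below masks out the loss for any intrinsic modality a training sample lacks. This is a generic masking technique, not the authors' conditioning mechanism. The modality names, the flattened-list image representation, and the functions `mse` and `partial_supervision_loss` are hypothetical.

```python
from typing import Dict, Optional, Sequence

# Hypothetical set of intrinsic modalities predicted jointly.
MODALITIES = ("albedo", "shading", "normal")


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equally sized flattened images."""
    if len(pred) != len(target):
        raise ValueError("prediction and target must have the same size")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def partial_supervision_loss(
    preds: Dict[str, Sequence[float]],
    targets: Dict[str, Optional[Sequence[float]]],
) -> float:
    """Average the loss over only the modalities that have ground truth.

    A modality whose target is missing (absent or None) contributes nothing,
    so a sample annotated with albedo alone still yields a training signal.
    """
    losses = [
        mse(preds[m], targets[m])
        for m in MODALITIES
        if targets.get(m) is not None
    ]
    if not losses:
        # No annotations at all: this sample provides no supervision.
        return 0.0
    return sum(losses) / len(losses)


if __name__ == "__main__":
    preds = {"albedo": [0.5, 0.5], "shading": [1.0, 1.0], "normal": [0.0, 0.0]}
    # A sample from a dataset that only annotates albedo.
    albedo_only = {"albedo": [0.5, 0.7], "shading": None}
    print(partial_supervision_loss(preds, albedo_only))
```

Averaging over the available modalities, rather than summing, keeps the loss on a comparable scale for fully and partially annotated samples; per-modality weights would be an equally reasonable assumption.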