“Materialistic: Selecting Similar Materials in Images” by Sharma, Philip, Gharbi, Freeman, Durand, et al. …

  • ©Prafull Sharma, Julien Philip, Michaël Gharbi, Bill Freeman, Frédo Durand, and Valentin Deschaintre

Conference:


Type(s):


Title:

    Materialistic: Selecting Similar Materials in Images

Session/Category Title:   Material Acquisition


Presenter(s)/Author(s):


Moderator(s):



Abstract:


    Separating an image into meaningful underlying components is a crucial first step for both editing and understanding images. We present a method capable of selecting the regions of a photograph exhibiting the same material as an artist-chosen area. Our proposed approach is robust to shading, specular highlights, and cast shadows, enabling selection in real images. As we do not rely on semantic segmentation (different woods or metal should not be selected together), we formulate the problem as a similarity-based grouping problem based on a user-provided image location. In particular, we propose to leverage the unsupervised DINO [Caron et al. 2021] features coupled with a proposed Cross-Similarity Feature Weighting module and an MLP head to extract material similarities in an image. We train our model on a new synthetic image dataset, that we release. We show that our method generalizes well to real-world images. We carefully analyze our model’s behavior on varying material properties and lighting. Additionally, we evaluate it against a hand-annotated benchmark of 50 real photographs. We further demonstrate our model on a set of applications, including material editing, in-video selection, and retrieval of object photographs with similar materials.

References:


    1. 2021. Evermotion Arch Interior. https://evermotion.org/shop/cat/397/archinteriors.
    2. Edward H Adelson. 2001. On seeing stuff: the perception of materials by humans and machines. In Human vision and electronic imaging VI, Vol. 4299. SPIE, 1–12.
    3. Miika Aittala, Timo Aila, and Jaakko Lehtinen. 2016. Reflectance Modeling by Neural Texture Synthesis. ACM Trans. Graph. 35, 4 (2016), 65:1–65:13.
    4. Miika Aittala, Tim Weyrich, and Jaakko Lehtinen. 2015. Two-shot SVBRDF Capture for Stationary Materials. ACM Trans. Graph. 34, 4 (2015), 110:1–110:13.
    5. Xiaobo An and Fabio Pellacini. 2008. AppProp: All-Pairs Appearance-Space Edit Propagation. In ACM SIGGRAPH 2008 Papers (Los Angeles, California) (SIGGRAPH ’08). Association for Computing Machinery, New York, NY, USA, Article 40, 9 pages.
    6. Dejan Azinović, Tzu-Mao Li, Anton Kaplanyan, and Matthias Nießner. 2019. Inverse Path Tracing for Joint Material and Lighting Estimation. In Proc. Computer Vision and Pattern Recognition (CVPR), IEEE.
    7. Sean Bell, Paul Upchurch, Noah Snavely, and Kavita Bala. 2013. OpenSurfaces: A richly annotated catalog of surface appearance. ACM Transactions on graphics (TOG) 32, 4 (2013), 1–17.
    8. Sean Bell, Paul Upchurch, Noah Snavely, and Kavita Bala. 2015. Material Recognition in the Wild with the Materials in Context Database. Computer Vision and Pattern Recognition (CVPR) (2015).
    9. Serge Belongie, Chad Carson, Hayit Greenspan, and Jitendra Malik. 1998. Color-and texture-based image segmentation using EM and its application to content-based image retrieval. In Sixth International Conference on Computer Vision (IEEE Cat. No. 98CH36271). IEEE, 675–682.
    10. Adrien Bousseau, Sylvain Paris, and Frédo Durand. 2009. User Assisted Intrinsic Images. ACM Transactions on Graphics (Proceedings of SIGGRAPH Asia 2009) 28, 5 (2009).
    11. Holger Caesar, Jasper Uijlings, and Vittorio Ferrari. 2018. Coco-stuff: Thing and stuff classes in context. In Proceedings of the IEEE conference on computer vision and pattern recognition. 1209–1218.
    12. Jiale Cao, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Yanwei Pang, and Ling Shao. 2020. D2det: Towards high quality object detection and instance segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 11485–11494.
    13. Jiezhang Cao, Yawei Li, Kai Zhang, and Luc Van Gool. 2021. Video super-resolution transformer. arXiv preprint arXiv:2106.06847 (2021).
    14. Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. 2021. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 9650–9660.
    15. Bingling Chen, Yan Huang, Qiaoqiao Xia, and Qinglin Zhang. 2020. Nonlocal spatial attention module for image classification. International Journal of Advanced Robotic Systems 17, 5 (2020), 1729881420938927.
    16. Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille. 2017. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence 40, 4 (2017), 834–848.
    17. Qifeng Chen, Dingzeyu Li, and Chi-Keung Tang. 2013. KNN matting. Pattern Analysis and Machine Intelligence, IEEE Transactions on 35, 9 (Sept 2013), 2175–2188.
    18. Xiangyu Chen, Xintao Wang, Jiantao Zhou, and Chao Dong. 2022. Activating More Pixels in Image Super-Resolution Transformer. arXiv preprint arXiv:2205.04437 (2022).
    19. Mircea Cimpoi, Subhransu Maji, and Andrea Vedaldi. 2014. Deep convolutional filter banks for texture recognition and segmentation. arXiv preprint arXiv:1411.6836 (2014).
    20. Jasmine Collins, Shubham Goel, Kenan Deng, Achleshwar Luthra, Leon Xu, Erhan Gundogdu, Xi Zhang, Tomas F Yago Vicente, Thomas Dideriksen, Himanshu Arora, Matthieu Guillaumin, and Jitendra Malik. 2022. ABO: Dataset and Benchmarks for Real-World 3D Object Understanding. CVPR (2022).
    21. Blender Online Community. 2018. Blender – a 3D modelling and rendering package. http://www.blender.org
    22. Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. 2016. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition. 3213–3223.
    23. Yining Deng and Bangalore S Manjunath. 2001. Unsupervised segmentation of color-texture regions in images and video. IEEE transactions on pattern analysis and machine intelligence 23, 8 (2001), 800–810.
    24. Valentin Deschaintre, Miika Aittala, Fredo Durand, George Drettakis, and Adrien Bousseau. 2018. Single-image SVBRDF Capture with a Rendering-aware Deep Network. ACM Trans. Graph. 37, 4 (2018), 128:1–128:15.
    25. Valentin Deschaintre, Miika Aittala, Frédo Durand, George Drettakis, and Adrien Bousseau. 2019. Flexible SVBRDF Capture with a Multi-Image Deep Network. Computer Graphics Forum 38, 4 (2019).
    26. Valentin Deschaintre, George Drettakis, and Adrien Bousseau. 2020. Guided Fine-Tuning for Large-Scale Material Transfer. Computer Graphics Forum (Proceedings of the Eurographics Symposium on Rendering) 39, 4 (2020). http://www-sop.inria.fr/reves/Basilic/2020/DDB20
    27. Valentin Deschaintre, Yiming Lin, and Abhijeet Ghosh. 2021. Deep polarization imaging for 3D shape and SVBRDF acquisition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
    28. Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. 2020. An image is worth 16×16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020).
    29. M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. 2015. The Pascal Visual Object Classes Challenge: A Retrospective. International Journal of Computer Vision 111, 1 (Jan. 2015), 98–136.
    30. Itzhak Fogel and Dov Sagi. 1989. Gabor filters as texture discriminator. Biological cybernetics 61, 2 (1989), 103–113.
    31. Duan Gao, Xiao Li, Yue Dong, Pieter Peers, Kun Xu, and Xin Tong. 2019. Deep inverse rendering for high-resolution SVBRDF estimation from an arbitrary number of images. ACM Trans. Graph. 38, 4 (2019).
    32. Elena Garces, Carlos Rodriguez-Pardo, Dan Casas, and Jorge Lopez-Moreno. 2022. A Survey on Intrinsic Images: Delving Deep into Lambert and Beyond. International Journal of Computer Vision 130, 3 (2022), 836–868.
    33. Andreas Geiger, Philip Lenz, and Raquel Urtasun. 2012. Are we ready for autonomous driving? the kitti vision benchmark suite. In 2012 IEEE conference on computer vision and pattern recognition. IEEE, 3354–3361.
    34. Thomas Germer, Tobias Uelwer, Stefan Conrad, and Stefan Harmeling. 2020. PyMatting: A Python Library for Alpha Matting. Journal of Open Source Software 5, 54 (2020), 2481.
    35. Golnaz Ghiasi, Xiuye Gu, Yin Cui, and Tsung-Yi Lin. 2021. Open-vocabulary image segmentation. arXiv preprint arXiv:2112.12143 (2021).
    36. David Griffiths, Tobias Ritschel, and Julien Philip. 2022. OutCast: Single Image Relighting with Cast Shadows. Computer Graphics Forum 43 (2022).
    37. Dar’ya Guarnera, Giuseppe Claudio Guarnera, Abhijeet Ghosh, Cornelia Denk, and Mashhuda Glencross. 2016. BRDF Representation and Acquisition. Computer Graphics Forum (2016).
    38. Jie Guo, Shuichang Lai, Chengzhi Tao, Yuelong Cai, Lei Wang, Yanwen Guo, and Ling-Qi Yan. 2021. Highlight-Aware Two-Stream Network for Single-Image SVBRDF Acquisition. ACM Trans. Graph. 40, 4, Article 123 (jul 2021), 14 pages.
    39. Yu Guo, Cameron Smith, Miloš Hašan, Kalyan Sunkavalli, and Shuang Zhao. 2020. MaterialGAN: Reflectance Capture using a Generative SVBRDF Model. ACM Trans. Graph. 39, 6 (2020), 254:1–254:13.
    40. Michal Haindl and Stanislav Mikes. 2008. Texture segmentation benchmark. In 2008 19th International Conference on Pattern Recognition. IEEE, 1–4.
    41. Mark Hamilton, Zhoutong Zhang, Bharath Hariharan, Noah Snavely, and William T Freeman. 2022. Unsupervised Semantic Segmentation by Distilling Feature Correspondences. arXiv preprint arXiv:2203.08414 (2022).
    42. Robert M Haralick, Karthikeyan Shanmugam, and Its’ Hak Dinstein. 1973. Textural features for image classification. IEEE Transactions on systems, man, and cybernetics 6 (1973), 610–621.
    43. Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. 2017. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision. 2961–2969.
    44. Philipp Henzler, Valentin Deschaintre, Niloy J. Mitra, and Tobias Ritschel. 2021. Generative Modelling of BRDF Textures from Flash Images. ACM Trans. Graph. 40, 6, Article 284 (dec 2021), 13 pages.
    45. Jie Hu, Li Shen, and Gang Sun. 2018. Squeeze-and-excitation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 7132–7141.
    46. Yiwei Hu, Miloš Hašan, Paul Guerrero, Holly Rushmeier, and Valentin Deschaintre. 2022a. Controlling Material Appearance by Examples. Computer Graphics Forum (2022).
    47. Yiwei Hu, Chengan He, Valentin Deschaintre, Julie Dorsey, and Holly Rushmeier. 2022b. An Inverse Procedural Modeling Pipeline for SVBRDF Maps. ACM Trans. Graph. 41, 2, Article 18 (jan 2022), 17 pages.
    48. Mohammad Reza Karimi Dastjerdi, Yannick Hold-Geoffroy, Jonathan Eisenmann, Siavash Khodadadeh, and Jean-François Lalonde. 2022. Guided Co-Modulated GAN for 360 degree Field of View Extrapolation. International Conference on 3D Vision (3DV) (2022).
    49. Erum Arif Khan, Erik Reinhard, Roland W. Fleming, and Heinrich H. Bülthoff. 2006. Image-Based Material Editing. ACM Trans. Graph. 25, 3 (jul 2006), 654–663.
    50. Jason Lawrence, Aner Ben-Artzi, Christopher DeCoro, Wojciech Matusik, Hanspeter Pfister, Ravi Ramamoorthi, and Szymon Rusinkiewicz. 2006. Inverse Shade Trees for Non-Parametric Material Representation and Editing. ACM Transactions on Graphics (Proc. SIGGRAPH) 25, 3 (July 2006).
    51. Daniel Lepage and Jason Lawrence. 2011. Material Matting. ACM Trans. Graph. 30, 6 (Dec. 2011), 1–10.
    52. Thomas Leung and Jitendra Malik. 2001. Representing and recognizing the visual appearance of materials using three-dimensional textons. International journal of computer vision 43, 1 (2001), 29–44.
    53. Zhengqin Li, Mohammad Shafiei, Ravi Ramamoorthi, Kalyan Sunkavalli, and Manmohan Chandraker. 2020. Inverse rendering for complex indoor scenes: Shape, spatially-varying lighting and svbrdf from a single image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2475–2484.
    54. Zhengqin Li, Kalyan Sunkavalli, and Manmohan Chandraker. 2018a. Materials for Masses: SVBRDF Acquisition with a Single Mobile Phone Image. In Computer Vision – ECCV 2018 – 15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part III (Lecture Notes in Computer Science, Vol. 11207). 74–90.
    55. Zhengqin Li, Zexiang Xu, Ravi Ramamoorthi, Kalyan Sunkavalli, and Manmohan Chandraker. 2018b. Learning to reconstruct shape and spatially-varying reflectance from a single image. In SIGGRAPH Asia 2018 Technical Papers. ACM, 269.
    56. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. Microsoft coco: Common objects in context. In European conference on computer vision. Springer, 740–755.
    57. Ce Liu, Lavanya Sharan, Edward H Adelson, and Ruth Rosenholtz. 2010. Exploring features in a bayesian framework for material recognition. In 2010 ieee computer society conference on computer vision and pattern recognition. IEEE, 239–246.
    58. Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2015. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition. 3431–3440.
    59. Zhisheng Lu, Hong Liu, Juncheng Li, and Linlin Zhang. 2021. Efficient transformer for single image super-resolution. arXiv preprint arXiv:2108.11084 (2021).
    60. Norberto Malpica, Juan E Ortuño, and Andres Santos. 2003. A multichannel watershed-based algorithm for supervised texture segmentation. Pattern Recognition Letters 24, 9–10 (2003), 1545–1554.
    61. Lukas Murmann, Michael Gharbi, Miika Aittala, and Fredo Durand. 2019. A dataset of multi-illumination images in the wild. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 4080–4089.
    62. Baptiste Nicolet, Julien Philip, and George Drettakis. 2020. Repurposing a Relighting Network for Realistic Compositions of Captured Scenes. In Symposium on Interactive 3D Graphics and Games (San Francisco, CA, USA) (I3D ’20). Association for Computing Machinery, New York, NY, USA, Article 4, 9 pages.
    63. Merlin Nimier-David, Zhao Dong, Wenzel Jakob, and Anton Kaplanyan. 2021. Material and Lighting Reconstruction for Complex Indoor Scenes with Texture-space Differentiable Rendering. In Eurographics Symposium on Rendering – DL-only Track, Adrien Bousseau and Morgan McGuire (Eds.). The Eurographics Association.
    64. William Peebles and Saining Xie. 2022. Scalable Diffusion Models with Transformers. arXiv preprint arXiv:2212.09748 (2022).
    65. Fabio Pellacini and Jason Lawrence. 2007. AppWand: Editing Measured Materials Using Appearance-Driven Optimization. ACM Trans. Graph. 26, 3 (jul 2007), 54–es.
    66. Julien Philip, Michaël Gharbi, Tinghui Zhou, Alexei A. Efros, and George Drettakis. 2019. Multi-View Relighting Using a Geometry-Aware Network. ACM Trans. Graph. 38, 4, Article 78 (jul 2019), 14 pages.
    67. Julien Philip, Sébastien Morgenthaler, Michaël Gharbi, and George Drettakis. 2021. Free-viewpoint Indoor Neural Relighting from Multi-view Stereo. ACM Transactions on Graphics (2021). http://www-sop.inria.fr/reves/Basilic/2021/PMGD21
    68. Trygve Randen and John Hakon Husoy. 1999. Filtering for texture classification: A comparative study. IEEE Transactions on pattern analysis and machine intelligence 21, 4 (1999), 291–310.
    69. René Ranftl, Alexey Bochkovskiy, and Vladlen Koltun. 2021. Vision Transformers for Dense Prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 12179–12188.
    70. René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, and Vladlen Koltun. 2022. Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-Shot Cross-Dataset Transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 3 (2022).
    71. Constantino Carlos Reyes-Aldasoro and Abhir Bhalerao. 2006. The Bhattacharyya space for feature selection and its application to texture segmentation. Pattern Recognition 39, 5 (2006), 812–826.
    72. Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2021. High-Resolution Image Synthesis with Latent Diffusion Models. arXiv:2112.10752 [cs.CV]
    73. Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention. Springer, 234–241.
    74. Gabriel Schwartz and Ko Nishino. 2013. Visual Material Traits: Recognizing Per-Pixel Material Context. In Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops.
    75. Gabriel Schwartz and Ko Nishino. 2016. Material Recognition from Local Appearance in Global Context. ArXiv abs/1611.09394 (2016).
    76. Gabriel Schwartz and Ko Nishino. 2019. Recognizing material properties from images. IEEE transactions on pattern analysis and machine intelligence 42, 8 (2019), 1981–1995.
    77. Gabriel Schwartz and Ko Nishino. 2020. Recognizing Material Properties from Images. IEEE Transactions on Pattern Analysis and Machine Intelligence 42, 8 (2020), 1981–1995.
    78. Mingxing Tan and Quoc Le. 2019. Efficientnet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning. PMLR, 6105–6114.
    79. Maxim Tkachenko, Mikhail Malyuk, Nikita Shevchenko, Andrey Holmanyuk, and Nikolai Liubimov. 2020–2021. Label Studio: Data labeling software. https://github.com/heartexlabs/label-studio Open source software available from https://github.com/heartexlabs/label-studio.
    80. Sinisa Todorovic and Narendra Ahuja. 2009. Texel-based texture segmentation. In 2009 IEEE 12th International Conference on Computer Vision. IEEE, 841–848.
    81. Paul Upchurch and Ransen Niu. 2022. A Dense Material Segmentation Dataset for Indoor and Outdoor Scene Parsing. In European Conference on Computer Vision. Springer, 450–466.
    82. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017).
    83. Huiyu Wang, Yukun Zhu, Hartwig Adam, Alan Yuille, and Liang-Chieh Chen. 2021. Maxdeeplab: End-to-end panoptic segmentation with mask transformers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 5463–5474.
    84. Huiyu Wang, Yukun Zhu, Bradley Green, Hartwig Adam, Alan Yuille, and Liang-Chieh Chen. 2020b. Axial-deeplab: Stand-alone axial-attention for panoptic segmentation. In European Conference on Computer Vision. Springer, 108–126.
    85. Xinlong Wang, Rufeng Zhang, Tao Kong, Lei Li, and Chunhua Shen. 2020a. Solov2: Dynamic and fast instance segmentation. Advances in Neural information processing systems 33 (2020), 17721–17732.
    86. Jia Xue, Hang Zhang, Ko Nishino, and Kristin J. Dana. 2022. Differential Viewpoints for Ground Terrain Material Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 3 (2022), 1205–1218.
    87. Fuzhi Yang, Huan Yang, Jianlong Fu, Hongtao Lu, and Baining Guo. 2020. Learning texture transformer network for image super-resolution. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 5791–5800.
    88. Yuhui Yuan, Lang Huang, Jianyuan Guo, Chao Zhang, Xilin Chen, and Jingdong Wang. 2018. Ocnet: Object context network for scene parsing. arXiv preprint arXiv:1809.00916 (2018).
    89. Bowen Zhang, Shuyang Gu, Bo Zhang, Jianmin Bao, Dong Chen, Fang Wen, Yong Wang, and Baining Guo. 2022. Styleswin: Transformer-based gan for high-resolution image generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11304–11314.
    90. Hongyi Zhang, Yann N Dauphin, and Tengyu Ma. 2019. Fixup initialization: Residual learning without normalization. arXiv preprint arXiv:1901.09321 (2019).
    91. Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, and Antonio Torralba. 2017. Scene parsing through ade20k dataset. In Proceedings of the IEEE conference on computer vision and pattern recognition. 633–641.
    92. Xilong Zhou, Milos Hasan, Valentin Deschaintre, Paul Guerrero, Kalyan Sunkavalli, and Nima Khademi Kalantari. 2022. TileGen: Tileable, Controllable Material Generation and Capture. In SIGGRAPH Asia 2022 Conference Papers. 1–9.


ACM Digital Library Publication:



Overview Page: