“Object-level Scene Deocclusion”
Conference:
Type(s):
Title:
- Object-level Scene Deocclusion
Presenter(s)/Author(s):
Abstract:
In this paper, we present a new self-supervised framework, named PACO, for object-level scene deocclusion to deocclude each of the objects of a real-world scene. Our approach allows multiple downstream applications, including scene-level, single-image 3D reconstruction and object rearrangement in images and 3D scenes.
References:
[1]
Jasmin Breitenstein and Tim Fingscheidt. 2022. Amodal cityscapes: a new dataset, its generation, and an amodal semantic segmentation challenge baseline. In IV.
[2]
Christopher P Burgess, Loic Matthey, Nicholas Watters, Rishabh Kabra, Irina Higgins, Matt Botvinick, and Alexander Lerchner. 2019. Monet: Unsupervised scene decomposition and representation. arXiv preprint arXiv:1901.11390 (2019).
[3]
Helisa Dhamo, Nassir Navab, and Federico Tombari. 2019. Object-driven multi-layer scene decomposition from a single image. In ICCV.
[4]
Prafulla Dhariwal and Alexander Nichol. 2021. Diffusion models beat GANs on image synthesis. NeurIPS (2021).
[5]
Kiana Ehsani, Roozbeh Mottaghi, and Ali Farhadi. 2018. SeGAN: Segmenting and generating the invisible. In CVPR.
[6]
Martin Engelcke, Adam R Kosiorek, Oiwi Parker Jones, and Ingmar Posner. 2020. Genesis: Generative scene inference and sampling with object-centric latent representations. ICLR (2020).
[7]
Patrick Follmann, Rebecca K?nig, Philipp H?rtinger, Michael Klostermann, and Tobias B?ttger. 2019. Learning to see the invisible: End-to-end trainable amodal instance segmentation. In WACV.
[8]
Locatello Francesco, Weissenborn Dirk, Unterthiner Thomas, Mahendran Aravindh, Heigold Georg, Uszkoreit Jakob, Dosovitskiy Alexey, and Kipf Thomas. 2020. Object-centric learning with slot attention. NeurIPS (2020).
[9]
Klaus Greff, Rapha?l Lopez Kaufman, Rishabh Kabra, Nick Watters, Christopher Burgess, Daniel Zoran, Loic Matthey, Matthew Botvinick, and Alexander Lerchner. 2019. Multi-object representation learning with iterative variational inference. In ICML.
[10]
Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. NIPS (2017).
[11]
Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. NeurIPS (2020).
[12]
Jonathan Ho and Tim Salimans. 2022. Classifier-free diffusion guidance. NeurIPS Workshop (2022).
[13]
Yuan-Ting Hu, Hong-Shuo Chen, Kexin Hui, Jia-Bin Huang, and Alexander G Schwing. 2019. Sail-vos: Semantic amodal instance level video object segmentation-a synthetic dataset and baselines. In CVPR.
[14]
Justin Johnson, Bharath Hariharan, Laurens Van Der Maaten, Li Fei-Fei, C Lawrence Zitnick, and Ross Girshick. 2017. CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning. In CVPR.
[15]
Abhishek Kar, Shubham Tulsiani, Joao Carreira, and Jitendra Malik. 2015. Amodal completion and size constancy in natural scenes. In ICCV.
[16]
Lei Ke, Yu-Wing Tai, and Chi-Keung Tang. 2021. Deep occlusion-aware instance segmentation with overlapping bilayers. In CVPR.
[17]
Diederik P Kingma and Max Welling. 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013).
[18]
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, 2023. Segment anything. arXiv preprint arXiv:2304.02643 (2023).
[19]
Ke Li and Jitendra Malik. 2016. Amodal instance segmentation. In ECCV.
[20]
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll?r, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. In ECCV.
[21]
Buyu Liu, Bingbing Zhuang, and Manmohan Chandraker. 2022. Weakly But Deeply Supervised Occlusion-Reasoned Parametric Road Layouts. In CVPR.
[22]
Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, and Luc Van Gool. 2022. Repaint: Inpainting using denoising diffusion probabilistic models. In CVPR.
[23]
Kaustubh Mani, Swapnil Daga, Shubhika Garg, Sai Shankar Narasimhan, Madhava Krishna, and Krishna Murthy Jatavallabhula. 2020. Monolayout: Amodal scene layout from a single image. In WACV.
[24]
Rohit Mohan and Abhinav Valada. 2022a. Amodal panoptic segmentation. In CVPR.
[25]
Rohit Mohan and Abhinav Valada. 2022b. Perceiving the invisible: Proposal-free amodal panoptic segmentation. RAL (2022).
[26]
Tom Monnier, Elliot Vincent, Jean Ponce, and Mathieu Aubry. 2021. Unsupervised layered image decomposition into object prototypes. In ICCV.
[27]
Medhini Narasimhan, Erik Wijmans, Xinlei Chen, Trevor Darrell, Dhruv Batra, Devi Parikh, and Amanpreet Singh. 2020. Seeing the un-scene: Learning amodal semantic maps for room navigation. In ECCV.
[28]
Khoi Nguyen and Sinisa Todorovic. 2021. A weakly supervised amodal segmenter with boundary uncertainty estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 7396?7405.
[29]
OpenAI. 2023. GPT-4V(ision) System Card. (2023).
[30]
Ege Ozguroglu, Ruoshi Liu, D?dac Sur?s, Dian Chen, Achal Dave, Pavel Tokmakov, and Carl Vondrick. 2024. pix2gestalt: Amodal Segmentation by Synthesizing Wholes. (2024).
[31]
Dim P Papadopoulos, Youssef Tamaazousti, Ferda Ofli, Ingmar Weber, and Antonio Torralba. 2019. How to make a pizza: Learning a compositional layer-based GAN model. In CVPR.
[32]
Pulak Purkait, Christopher Zach, and Ian Reid. 2019. Seeing behind things: Extending semantic segmentation to occluded regions. In IROS.
[33]
Lu Qi, Li Jiang, Shu Liu, Xiaoyong Shen, and Jiaya Jia. 2019. Amodal instance segmentation with kins dataset. In CVPR.
[34]
Ren? Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, and Vladlen Koltun. 2020. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. TPAMI (2020).
[35]
Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. 2014. Stochastic backpropagation and approximate inference in deep generative models. In ICML.
[36]
Kabra Rishabh, Burgess Chris, Matthey Loic, Lopez Kaufman Raphael, Greff Klaus, Reynolds Malcolm, and Lerchner. Alexander. 2019. Multi-object datasets.
[37]
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj?rn Ommer. 2022. High-resolution image synthesis with latent diffusion models. In CVPR.
[38]
Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. 2015. Deep unsupervised learning using nonequilibrium thermodynamics. In ICLM.
[39]
Jingxiang Sun, Bo Zhang, Ruizhi Shao, Lizhen Wang, Wen Liu, Zhenda Xie, and Yebin Liu. 2023. Dreamcraft3D: Hierarchical 3D generation with bootstrapped diffusion prior. arXiv preprint arXiv:2310.16818 (2023).
[40]
Yihong Sun, Adam Kortylewski, and Alan Yuille. 2022. Amodal segmentation through out-of-task and out-of-distribution generalization with a Bayesian model. In CVPR.
[41]
Roman Suvorov, Elizaveta Logacheva, Anton Mashikhin, Anastasia Remizova, Arsenii Ashukha, Aleksei Silvestrov, Naejin Kong, Harshith Goka, Kiwoong Park, and Victor Lempitsky. 2022. Resolution-robust large mask inpainting with fourier convolutions. In WACV.
[42]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, ?ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In NIPS.
[43]
Angtian Wang, Yihong Sun, Adam Kortylewski, and Alan L Yuille. 2020. Robust object detection under occlusion with context-aware compositionalnets. In CVPR.
[44]
Yuting Xiao, Yanyu Xu, Ziming Zhong, Weixin Luo, Jiawei Li, and Shenghua Gao. 2021. Amodal segmentation based on visible region segmentation and shape prior. In AAAI.
[45]
Chaohao Xie, Shaohui Liu, Chao Li, Ming-Ming Cheng, Wangmeng Zuo, Xiao Liu, Shilei Wen, and Errui Ding. 2019. Image inpainting with learnable bidirectional attention maps. In ICCV.
[46]
Xiaosheng Yan, Feigege Wang, Wenxi Liu, Yuanlong Yu, Shengfeng He, and Jia Pan. 2019. Visualizing the invisible: Occluded vehicle segmentation and recovery. In ICCV.
[47]
Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, and Thomas S Huang. 2019. Free-form image inpainting with gated convolution. In ICCV.
[48]
Xiaoding Yuan, Adam Kortylewski, Yihong Sun, and Alan Yuille. 2021. Robust instance segmentation through reasoning about multi-object occlusion. In CVPR.
[49]
Guanqi Zhan, Chuanxia Zheng, Weidi Xie, and Andrew Zisserman. 2023. Amodal Ground Truth and Completion in the Wild. arXiv preprint arXiv:2312.17247 (2023).
[50]
Xiaohang Zhan, Xingang Pan, Bo Dai, Ziwei Liu, Dahua Lin, and Chen Change Loy. 2020. Self-supervised scene de-occlusion. In CVPR.
[51]
Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. 2023. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 3836?3847.
[52]
Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. 2018. The unreasonable effectiveness of deep features as a perceptual metric. In CVPR.
[53]
Ziheng Zhang, Anpei Chen, Ling Xie, Jingyi Yu, and Shenghua Gao. 2019. Learning semantics-aware distance map with semantics layering network for amodal instance segmentation. In ACM MM.
[54]
Chuanxia Zheng, Duy-Son Dao, Guoxian Song, Tat-Jen Cham, and Jianfei Cai. 2021. Visiting the invisible: Layer-by-layer completed scene decomposition. IJCV (2021).
[55]
Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, and Antonio Torralba. 2017. Scene parsing through ADE20k dataset. In CVPR.
[56]
Qiang Zhou, Shiyin Wang, Yitong Wang, Zilong Huang, and Xinggang Wang. 2021. Human de-occlusion: Invisible perception and recovery for humans. In CVPR.
[57]
Yan Zhu, Yuandong Tian, Dimitris Metaxas, and Piotr Doll?r. 2017. Semantic amodal segmentation. In CVPR.