Globally and locally consistent image completion

We present a novel approach for image completion that results in images that are both locally and globally consistent. With a fully-convolutional neural network, we can complete images of arbitrary resolutions by filling in missing regions of any shape. To train this image completion network to be consistent, we use global and local context discriminators that are trained to distinguish real images from completed ones. The global discriminator looks at the entire image to assess whether it is coherent as a whole, while the local discriminator looks only at a small area centered on the completed region to ensure the local consistency of the generated patches. The image completion network is then trained to fool both context discriminator networks, which requires it to generate images that are indistinguishable from real ones with regard to overall consistency as well as in details. We show that our approach can be used to complete a wide variety of scenes. Furthermore, in contrast with patch-based approaches such as PatchMatch, our approach can generate fragments that do not appear elsewhere in the image, which allows us to naturally complete images of objects with familiar and highly specific structures, such as faces.
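The division of labor between the two discriminators can be illustrated with a minimal sketch of how their inputs might be prepared. The abstract does not give the crop size, the border handling, or the compositing step, so the helper names (`mask_center`, `local_window`, `composite`, `crop`), the clamping of the window to the image bounds, and the choice to keep known pixels from the original are all assumptions made for illustration, not the authors' implementation. Images are plain nested lists to keep the example dependency-free.

```python
from typing import List, Tuple

Image = List[List[int]]   # grayscale image as rows of pixel values
Mask = List[List[int]]    # 1 marks a missing pixel, 0 a known pixel
Window = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


def mask_center(mask: Mask) -> Tuple[int, int]:
    """Return the (row, col) center of the bounding box of the missing region."""
    rows = [r for r, line in enumerate(mask) if any(line)]
    cols = [c for c in range(len(mask[0])) if any(line[c] for line in mask)]
    if not rows:
        raise ValueError("mask has no missing pixels")
    return (rows[0] + rows[-1]) // 2, (cols[0] + cols[-1]) // 2


def local_window(mask: Mask, size: int) -> Window:
    """Return a size x size window centered on the missing region.

    Assumption: the window is shifted to stay inside the image when the
    missing region lies close to a border.
    """
    height, width = len(mask), len(mask[0])
    if size > height or size > width:
        raise ValueError("window is larger than the image")
    center_row, center_col = mask_center(mask)
    top = min(max(center_row - size // 2, 0), height - size)
    left = min(max(center_col - size // 2, 0), width - size)
    return top, left, top + size, left + size


def composite(original: Image, generated: Image, mask: Mask) -> Image:
    """Assumption: keep known pixels and take only missing ones from the network output."""
    return [
        [g if m else o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]


def crop(image: Image, window: Window) -> Image:
    """Cut the window out of the image; this is what a local discriminator would see."""
    top, left, bottom, right = window
    return [row[left:right] for row in image[top:bottom]]
```

Under these assumptions, a global discriminator would receive the full output of `composite`, while a local discriminator would receive `crop(completed, local_window(mask, size))` for the same image. The completion network would then be trained against both judgments.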

References:

1. Coloma Ballester, Marcelo Bertalmío, Vicent Caselles, Guillermo Sapiro, and Joan Verdera. 2001. Filling-in by joint interpolation of vector fields and gray levels. IEEE Transactions on Image Processing 10, 8 (2001), 1200–1211.
2. Connelly Barnes, Eli Shechtman, Adam Finkelstein, and Dan B Goldman. 2009. PatchMatch: A Randomized Correspondence Algorithm for Structural Image Editing. ACM Transactions on Graphics (Proceedings of SIGGRAPH) 28, 3 (2009), 24:1–24:11.
3. Connelly Barnes, Eli Shechtman, Dan B. Goldman, and Adam Finkelstein. 2010. The Generalized PatchMatch Correspondence Algorithm. In European Conference on Computer Vision. 29–43.
4. Marcelo Bertalmio, Guillermo Sapiro, Vicent Caselles, and Coloma Ballester. 2000. Image Inpainting. In ACM Transactions on Graphics (Proceedings of SIGGRAPH). 417–424.
5. M. Bertalmio, L. Vese, G. Sapiro, and S. Osher. 2003. Simultaneous structure and texture image inpainting. IEEE Transactions on Image Processing 12, 8 (2003), 882–889.
6. A. Criminisi, P. Perez, and K. Toyama. 2004. Region Filling and Object Removal by Exemplar-based Image Inpainting. IEEE Transactions on Image Processing 13, 9 (2004), 1200–1212.
7. Soheil Darabi, Eli Shechtman, Connelly Barnes, Dan B Goldman, and Pradeep Sen. 2012. Image Melding: Combining Inconsistent Images using Patch-based Synthesis. ACM Transactions on Graphics (Proceedings of SIGGRAPH) 31, 4, Article 82 (2012), 82:1–82:10.
8. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. 2009. ImageNet: A Large-Scale Hierarchical Image Database. In IEEE Conference on Computer Vision and Pattern Recognition.
9. Yue Deng, Qionghai Dai, and Zengke Zhang. 2011. Graph Laplace for occluded face completion and recognition. IEEE Transactions on Image Processing 20, 8 (2011), 2329–2338.
10. Iddo Drori, Daniel Cohen-Or, and Hezy Yeshurun. 2003. Fragment-based Image Completion. ACM Transactions on Graphics (Proceedings of SIGGRAPH) 22, 3 (2003), 303–312.
11. Alexei Efros and Thomas Leung. 1999. Texture Synthesis by Non-parametric Sampling. In International Conference on Computer Vision. 1033–1038.
12. Alexei A. Efros and William T. Freeman. 2001. Image Quilting for Texture Synthesis and Transfer. In ACM Transactions on Graphics (Proceedings of SIGGRAPH). 341–346.
13. Kunihiko Fukushima. 1988. Neocognitron: A hierarchical neural network capable of visual pattern recognition. Neural Networks 1, 2 (1988), 119–130.
14. Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, and Yoshua Bengio. 2014. Generative Adversarial Nets. In Conference on Neural Information Processing Systems. 2672–2680.
15. James Hays and Alexei A. Efros. 2007. Scene Completion Using Millions of Photographs. ACM Transactions on Graphics (Proceedings of SIGGRAPH) 26, 3, Article 4 (2007).
16. Kaiming He and Jian Sun. 2012. Statistics of Patch Offsets for Image Completion. In European Conference on Computer Vision. 16–29.
17. Jia-Bin Huang, Sing Bing Kang, Narendra Ahuja, and Johannes Kopf. 2014. Image Completion Using Planar Structure Guidance. ACM Transactions on Graphics (Proceedings of SIGGRAPH) 33, 4, Article 129 (2014), 10 pages.
18. Sergey Ioffe and Christian Szegedy. 2015. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In International Conference on Machine Learning.
19. Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. 2017. Image-to-Image Translation with Conditional Adversarial Networks. (2017).
20. Jiaya Jia and Chi-Keung Tang. 2003. Image repairing: robust image synthesis by adaptive ND tensor voting. In IEEE Conference on Computer Vision and Pattern Recognition, Vol. 1. 643–650.
21. Rolf Köhler, Christian Schuler, Bernhard Schölkopf, and Stefan Harmeling. 2014. Mask-specific inpainting with deep neural networks. In German Conference on Pattern Recognition.
22. Johannes Kopf, Wolf Kienzle, Steven Drucker, and Sing Bing Kang. 2012. Quality Prediction for Image Completion. ACM Transactions on Graphics (Proceedings of SIGGRAPH Asia) 31, 6, Article 131 (2012), 8 pages.
23. Vivek Kwatra, Irfan Essa, Aaron Bobick, and Nipun Kwatra. 2005. Texture Optimization for Example-based Synthesis. ACM Transactions on Graphics (Proceedings of SIGGRAPH) 24, 3 (July 2005), 795–802.
24. Vivek Kwatra, Arno Schödl, Irfan Essa, Greg Turk, and Aaron Bobick. 2003. Graphcut Textures: Image and Video Synthesis Using Graph Cuts. ACM Transactions on Graphics (Proceedings of SIGGRAPH) 22, 3 (July 2003), 277–286.
25. Yann LeCun, Bernhard Boser, John S Denker, Donnie Henderson, Richard E Howard, Wayne Hubbard, and Lawrence D Jackel. 1989. Backpropagation applied to handwritten zip code recognition. Neural Computation 1, 4 (1989), 541–551.
26. Anat Levin, Assaf Zomet, and Yair Weiss. 2003. Learning How to Inpaint from Global Image Statistics. In International Conference on Computer Vision. 305–312.
27. Rongjian Li, Wenlu Zhang, Heung-Il Suk, Li Wang, Jiang Li, Dinggang Shen, and Shuiwang Ji. 2014. Deep learning based imaging data completion for improved brain disease diagnosis. In International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 305–312.
28. Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. 2015. Deep Learning Face Attributes in the Wild. In International Conference on Computer Vision.
29. Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2015. Fully convolutional networks for semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition.
30. Umar Mohammed, Simon JD Prince, and Jan Kautz. 2009. Visio-lization: generating novel facial images. ACM Transactions on Graphics (Proceedings of SIGGRAPH) 28, 3 (2009), 57.
31. Vinod Nair and Geoffrey E Hinton. 2010. Rectified linear units improve restricted Boltzmann machines. In International Conference on Machine Learning. 807–814.
32. Deepak Pathak, Philipp Krähenbühl, Jeff Donahue, Trevor Darrell, and Alexei Efros. 2016. Context Encoders: Feature Learning by Inpainting. In IEEE Conference on Computer Vision and Pattern Recognition.
33. Darko Pavić, Volker Schönefeld, and Leif Kobbelt. 2006. Interactive image completion with perspective correction. The Visual Computer 22, 9 (2006), 671–681.
34. Patrick Pérez, Michel Gangnet, and Andrew Blake. 2003. Poisson Image Editing. ACM Transactions on Graphics (Proceedings of SIGGRAPH) 22, 3 (July 2003), 313–318.
35. Alec Radford, Luke Metz, and Soumith Chintala. 2016. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. In International Conference on Learning Representations.
36. Radim Tyleček and Radim Šára. 2013. Spatial Pattern Templates for Recognition of Objects with Regular Structure. In German Conference on Pattern Recognition. Saarbrucken, Germany.
37. Jimmy SJ Ren, Li Xu, Qiong Yan, and Wenxiu Sun. 2015. Shepard Convolutional Neural Networks. In Conference on Neural Information Processing Systems.
38. D.E. Rumelhart, G.E. Hinton, and R.J. Williams. 1986. Learning representations by back-propagating errors. In Nature.
39. Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. 2016. Improved techniques for training GANs. In Conference on Neural Information Processing Systems.
40. Denis Simakov, Yaron Caspi, Eli Shechtman, and Michal Irani. 2008. Summarizing visual data using bidirectional similarity. In IEEE Conference on Computer Vision and Pattern Recognition. 1–8.
41. Jian Sun, Lu Yuan, Jiaya Jia, and Heung-Yeung Shum. 2005. Image Completion with Structure Propagation. ACM Transactions on Graphics (Proceedings of SIGGRAPH) 24, 3 (July 2005), 861–868.
42. Alexandru Telea. 2004. An Image Inpainting Technique Based on the Fast Marching Method. Journal of Graphics Tools 9, 1 (2004), 23–34.
43. Yonatan Wexler, Eli Shechtman, and Michal Irani. 2007. Space-Time Completion of Video. IEEE Transactions on Pattern Analysis and Machine Intelligence 29, 3 (2007), 463–476.
44. Oliver Whyte, Josef Sivic, and Andrew Zisserman. 2009. Get Out of my Picture! Internet-based Inpainting. In British Machine Vision Conference.
45. Junyuan Xie, Linli Xu, and Enhong Chen. 2012. Image Denoising and Inpainting with Deep Neural Networks. In Conference on Neural Information Processing Systems. 341–349.
46. Chao Yang, Xin Lu, Zhe Lin, Eli Shechtman, Oliver Wang, and Hao Li. 2017. High-Resolution Image Inpainting using Multi-Scale Neural Patch Synthesis. In IEEE Conference on Computer Vision and Pattern Recognition.
47. Fisher Yu and Vladlen Koltun. 2016. Multi-Scale Context Aggregation by Dilated Convolutions. In International Conference on Learning Representations.
48. Matthew D. Zeiler. 2012. ADADELTA: An Adaptive Learning Rate Method. CoRR abs/1212.5701 (2012).
49. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Antonio Torralba, and Aude Oliva. 2016. Places: An Image Database for Deep Scene Understanding. CoRR abs/1610.02055 (2016).

SIGGRAPH 2017: Technical Papers

“Globally and locally consistent image completion” by Iizuka, Simo-Serra and Ishikawa

Session/Category Title: Image Texture & Completion