“Seamless manga inpainting with semantics awareness” by Xie, Xia, Li and Wong

  • ©Minshan Xie, Menghan Xia, Chengze Li, and Tien-Tsin Wong




    Seamless manga inpainting with semantics awareness



    Manga inpainting fills in the pixels disoccluded by the removal of dialogue balloons or “sound effect” text. The industry has long needed this process for language localization and for conversion to animated manga. It is still mostly done manually, because existing methods (mostly designed for natural image inpainting) cannot produce satisfying results. Manga inpainting is trickier than natural image inpainting because of its highly abstract illustration style, built from structural lines and screentone patterns, which confuses both semantic interpretation and visual content synthesis. In this paper, we present the first manga inpainting method, a deep learning model, that generates high-quality results. Instead of inpainting directly, we propose to separate the complicated inpainting task into two major phases: semantic inpainting and appearance synthesis. This separation eases feature understanding and hence the training of the learning model. A key idea is to disentangle the structural lines and the screentone, which helps the network better distinguish structural line features from screentone features for semantic interpretation. Both visual comparison and quantitative experiments demonstrate the effectiveness of our method and its superiority over existing state-of-the-art methods for manga inpainting.
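
    The two-phase idea in the abstract can be illustrated with a toy sketch. This is not the authors' model: the paper describes a deep learning network, while the code below swaps in simple hand-written rules only to show the data flow (disentangle lines and screentone, fill the layers inside the hole, then render the appearance). All function names, the darkness threshold, and the mean-tone fill are assumptions made for illustration.

```python
import numpy as np

def disentangle(page, line_threshold=0.2):
    """Toy split (assumption): pixels darker than the threshold are structural lines."""
    lines = page < line_threshold              # boolean structural-line map
    tone = np.where(lines, np.nan, page)       # screentone intensity, line pixels excluded
    return lines, tone

def semantic_inpaint(lines, tone, hole):
    """Phase 1 (toy): decide what the line and tone layers contain inside the hole."""
    lines_filled = lines & ~hole               # assume no structural lines cross the hole
    known = ~hole & ~np.isnan(tone)            # trusted screentone pixels
    fill_value = tone[known].mean() if known.any() else 1.0
    tone_filled = np.where(hole | np.isnan(tone), fill_value, tone)
    return lines_filled, tone_filled

def synthesize_appearance(page, lines_filled, tone_filled, hole):
    """Phase 2 (toy): render the layers and paste the result only into the hole."""
    rendered = np.where(lines_filled, 0.0, tone_filled)
    return np.where(hole, rendered, page)

def inpaint(page, hole):
    """Run both phases on a grayscale page in [0, 1] and a boolean hole mask."""
    lines, tone = disentangle(page)
    lines_filled, tone_filled = semantic_inpaint(lines, tone, hole)
    return synthesize_appearance(page, lines_filled, tone_filled, hole)

# Usage: a flat gray "screentone" page with one vertical line and a white balloon.
clean = np.full((4, 4), 0.5)
clean[:, 0] = 0.0                              # structural line in the first column
hole = np.zeros((4, 4), dtype=bool)
hole[1:3, 1:3] = True                          # region left by the removed balloon
damaged = clean.copy()
damaged[hole] = 1.0                            # balloon pixels are white
result = inpaint(damaged, hole)
```

    In this toy case the hole is refilled with the surrounding tone and every pixel outside the hole is left unchanged. A learned model would replace both the line prediction and the tone fill with network outputs.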



ACM Digital Library Publication: