“UVDoc: Neural Grid-based Document Unwarping” by Verhoeven, Magne and Sorkine-Hornung
Title:
- UVDoc: Neural Grid-based Document Unwarping
Session/Category Title:
- Computer Vision
Abstract:
Restoring the original, flat appearance of a printed document from casual photographs of bent and wrinkled pages is a common everyday problem. In this paper we propose a novel method for grid-based single-image document unwarping. Our method performs geometric distortion correction via a fully convolutional deep neural network that learns to predict the 3D grid mesh of the document and the corresponding 2D unwarping grid in a multi-task fashion, implicitly encoding the coupling between the shape of a 3D piece of paper and its 2D image. To allow unwarping models to train on data that is more realistic in appearance than the commonly used synthetic Doc3D dataset, we create and publish our own dataset, called UVDoc, which combines pseudo-photorealistic document images with physically accurate 3D shape and unwarping function annotations. Our dataset is labeled with all the information necessary to train our unwarping network, so there is no need to engineer separate loss functions to compensate for the lack of ground truth typically found in document-in-the-wild datasets. We perform an in-depth evaluation which demonstrates that, with the inclusion of our novel pseudo-photorealistic dataset, our relatively small network architecture achieves state-of-the-art results on the DocUNet benchmark. We show that the pseudo-photorealistic nature of our UVDoc dataset allows for new and better evaluation methods, such as lighting-corrected MS-SSIM. We provide a novel benchmark dataset that facilitates such evaluations, and propose a metric that quantifies line straightness after unwarping. Our code, results and UVDoc dataset will be made publicly available upon publication.
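To illustrate what a 2D unwarping grid does once it is available, the following is a minimal sketch, not the authors' implementation. It assumes the grid is a coarse array of normalized (x, y) source positions, one per grid node, that is bilinearly upsampled to the output resolution and then used to resample the input image. The names `sample_image`, `grid_lookup` and `unwarp`, the normalization to [0, 1], and the border clamping are all assumptions made for this example; in the paper the grid is predicted by the network, whereas here it is supplied by hand.

```python
from typing import List, Tuple

Image = List[List[float]]               # grayscale image as rows of pixel values
Grid = List[List[Tuple[float, float]]]  # coarse grid of normalized (x, y) source positions


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation between a and b."""
    return a + (b - a) * t


def sample_image(img: Image, x: float, y: float) -> float:
    """Bilinearly sample img at continuous pixel coordinates, clamped to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    tx, ty = x - x0, y - y0
    top = lerp(img[y0][x0], img[y0][x1], tx)
    bottom = lerp(img[y1][x0], img[y1][x1], tx)
    return lerp(top, bottom, ty)


def grid_lookup(grid: Grid, u: float, v: float) -> Tuple[float, float]:
    """Bilinearly interpolate the coarse grid at normalized output position (u, v)."""
    gh, gw = len(grid), len(grid[0])  # hypothetical layout: at least 2 x 2 nodes
    gx, gy = u * (gw - 1), v * (gh - 1)
    c0, r0 = min(int(gx), gw - 2), min(int(gy), gh - 2)
    tx, ty = gx - c0, gy - r0
    (ax, ay), (bx, by) = grid[r0][c0], grid[r0][c0 + 1]
    (cx, cy), (dx, dy) = grid[r0 + 1][c0], grid[r0 + 1][c0 + 1]
    x = lerp(lerp(ax, bx, tx), lerp(cx, dx, tx), ty)
    y = lerp(lerp(ay, by, tx), lerp(cy, dy, tx), ty)
    return x, y


def unwarp(img: Image, grid: Grid, out_h: int, out_w: int) -> Image:
    """Resample img through the grid; out_h and out_w must both be at least 2."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Normalized position of this output pixel in the flat document.
            u, v = j / (out_w - 1), i / (out_h - 1)
            # Where the grid says this pixel comes from in the warped photo.
            sx, sy = grid_lookup(grid, u, v)
            row.append(sample_image(img, sx * (w - 1), sy * (h - 1)))
        out.append(row)
    return out
```

With a 2 x 2 identity grid (corners mapped to themselves) the output reproduces the input, and swapping the left and right columns of the grid mirrors the image horizontally. A grid with more nodes can describe the smooth bends and creases of a page in the same way.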