“Document rectification and illumination correction using a patch-based CNN” by Li, Zhang, Liao and Sander – ACM SIGGRAPH HISTORY ARCHIVES

  • 2019 SA Technical Papers_Li_Document rectification and illumination correction using a patch-based CNN

Title:

    Document rectification and illumination correction using a patch-based CNN

Session/Category Title: Photography in the Field


Abstract:


    We propose a novel learning method to rectify document images with various distortion types from a single input image. As opposed to previous learning-based methods, our approach seeks to first learn the distortion flow on input image patches rather than the entire image. We then present a robust technique to stitch the patch results into the rectified document by processing in the gradient domain. Furthermore, we propose a second network to correct the uneven illumination, further improving the readability and OCR accuracy. Due to the less complex distortion present on the smaller image patches, our patch-based approach followed by stitching and illumination correction can significantly improve the overall accuracy in both the synthetic and real datasets.
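
    The stitching idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a simplified 1-D analogue written under the assumption that each patch yields a flow estimate that is correct only up to an unknown constant offset. The function name `stitch_gradient_domain` and its parameters are hypothetical. Differentiating each patch discards its offset, the gradients are averaged where patches overlap, and the averaged gradient is integrated back into a single consistent flow. In 2-D the integration step would typically become a least-squares (Poisson-style) solve rather than a cumulative sum.

```python
import numpy as np

def stitch_gradient_domain(patches, starts, length):
    """Stitch overlapping 1-D flow patches whose constant offsets disagree.

    patches: list of 1-D arrays, each a flow estimate on one patch
    starts:  start index of each patch on the full scanline
    length:  number of samples on the full scanline
    Returns the stitched flow, pinned so that its first sample is 0.
    """
    grad_sum = np.zeros(length - 1)
    grad_cnt = np.zeros(length - 1)
    for patch, s in zip(patches, starts):
        g = np.diff(patch)                 # differencing removes the patch offset
        grad_sum[s:s + g.size] += g
        grad_cnt[s:s + g.size] += 1
    if np.any(grad_cnt == 0):
        raise ValueError("patches must cover every sample of the scanline")
    grad = grad_sum / grad_cnt             # average gradients in overlap regions

    # Integrate the averaged gradient; the global offset is fixed by f[0] = 0.
    return np.concatenate([[0.0], np.cumsum(grad)])

# Hypothetical usage: three overlapping patches of one smooth flow,
# each shifted by a different, unknown constant.
true_flow = np.sin(np.linspace(0.0, 3.0, 12))
patches = [true_flow[0:6] + 5.0, true_flow[3:9] - 2.0, true_flow[6:12] + 0.7]
stitched = stitch_gradient_domain(patches, [0, 3, 6], 12)
```

    Working on gradients rather than on the flow values themselves means the disagreeing per-patch offsets never enter the result, so no seam appears at patch boundaries in this toy setting.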
