DeepFormableTag: end-to-end generation and recognition of deformable fiducial markers

Fiducial markers are widely used to identify objects or to embed messages that a camera can detect. Existing detection methods mostly assume that markers are printed on ideally planar surfaces. The size of a message or identification code is limited by the spatial resolution of the binary pattern in a marker, and markers often fail to be recognized because of imaging artifacts such as optical and perspective distortion and motion blur. To overcome these limitations, we propose a novel deformable fiducial marker system that consists of three main parts. First, a fiducial marker generator creates a set of free-form color patterns that encode large amounts of information in unique visual codes. Second, a differentiable image simulator creates a training dataset of photorealistic scene images containing the deformed markers, which are rendered differentiably during optimization. The rendered images include realistic shading with specular reflection, optical distortion, defocus and motion blur, color alteration, imaging noise, and shape deformation of markers. Lastly, a trained marker detector finds the regions of interest and recognizes multiple marker patterns simultaneously via an inverse deformation transformation. The marker generator and detector networks are jointly optimized through the differentiable photorealistic renderer in an end-to-end manner, allowing us to robustly recognize a wide range of deformable markers with high accuracy. Our deformable marker system successfully decodes 36-bit messages at ~29 fps under severe shape deformation. Results validate that our system significantly outperforms traditional and data-driven marker methods.
Our learning-based marker system opens up interesting new applications of fiducial markers, including cost-effective motion capture of the human body, active 3D scanning that uses an array of our fiducial markers as structured light patterns, and robust augmented reality rendering of virtual objects on dynamic surfaces.
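
To make the 36-bit message size concrete, the following minimal Python sketch packs an integer marker ID into a 36-bit vector and recovers it from per-bit confidences. It is an illustration only: the function names, the most-significant-bit-first ordering, and the 0.5 threshold are assumptions made here, not the encoding or decoding scheme used by the authors.

```python
MESSAGE_BITS = 36  # message length quoted in the abstract


def id_to_bits(marker_id, n_bits=MESSAGE_BITS):
    """Hypothetical helper: integer ID -> list of bits, most significant first."""
    if not 0 <= marker_id < 2 ** n_bits:
        raise ValueError("marker_id does not fit in the message length")
    return [(marker_id >> i) & 1 for i in reversed(range(n_bits))]


def decode_soft_bits(probs, threshold=0.5):
    """Hypothetical helper: threshold per-bit confidences into hard bits."""
    return [1 if p >= threshold else 0 for p in probs]


def bits_to_id(bits):
    """Hypothetical helper: list of bits (most significant first) -> integer ID."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


# Round trip: encode an ID, simulate a detector's soft output, decode it again.
marker_id = 123_456_789
bits = id_to_bits(marker_id)
soft = [0.9 if b else 0.1 for b in bits]  # stand-in for detector confidences
recovered = bits_to_id(decode_soft_bits(soft))
print(recovered == marker_id)

# Number of distinct messages a 36-bit code can represent.
print(2 ** MESSAGE_BITS)  # 68719476736
```

A 36-bit message distinguishes 2^36 (about 68.7 billion) codes, which gives a sense of the capacity a marker of this kind would carry.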

SIGGRAPH 2021: Technical Papers

“DeepFormableTag: end-to-end generation and recognition of deformable fiducial markers” by Yaldiz, Meuleman, Jang, Ha and Kim