“QT-Font: High-efficiency Font Synthesis via Quadtree-based Diffusion Models”
Abstract:
We propose a sparse glyph representation based on quadtrees, together with QT-Font, an efficient font synthesis method built on a dual quadtree and a discrete diffusion model. Compared with existing approaches, QT-Font generates high-resolution glyph images with higher quality and more visually pleasing details, while significantly reducing both parameter count and computational cost.
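To make the idea of a sparse quadtree glyph representation concrete, here is a minimal sketch of a plain region quadtree over a binary glyph bitmap. It is an illustration only and not the QT-Font representation itself: the function names (`build_quadtree`, `render`), the toy cross-shaped glyph, and the power-of-two square image size are all assumptions made for this example.

```python
def build_quadtree(img, x=0, y=0, size=None):
    """Split a square binary image into uniform leaves (x, y, size, value).

    Assumes img is a list of rows whose side length is a power of two.
    """
    if size is None:
        size = len(img)
    first = img[y][x]
    uniform = all(
        img[r][c] == first
        for r in range(y, y + size)
        for c in range(x, x + size)
    )
    # Stop at a uniform cell or at single-pixel resolution.
    if size == 1 or uniform:
        return [(x, y, size, first)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves.extend(build_quadtree(img, x + dx, y + dy, half))
    return leaves


def render(leaves, size):
    """Rasterize the leaves back into a size x size bitmap."""
    out = [[0] * size for _ in range(size)]
    for x, y, s, value in leaves:
        for r in range(y, y + s):
            for c in range(x, x + s):
                out[r][c] = value
    return out


# Hypothetical toy glyph: a cross made of one vertical and one horizontal bar.
SIZE = 64
glyph = [[0] * SIZE for _ in range(SIZE)]
for r in range(8, 56):
    for c in range(28, 36):
        glyph[r][c] = 1
for r in range(28, 36):
    for c in range(8, 56):
        glyph[r][c] = 1

leaves = build_quadtree(glyph)
print(f"{len(leaves)} leaves represent {SIZE * SIZE} pixels")
```

Large uniform regions collapse into single leaves, so fine cells appear only near stroke boundaries. That is the general reason a quadtree can be a sparser description of a glyph than a dense pixel grid.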