DMHomo: Learning Homography with Diffusion Models

DMHomo leverages diffusion-models to generate a realistic dataset for homography learning. We train the generative model using pseudo labels. Additionally, we introduce an iterative process that improves both the homography estimator and the diffusion models successively, producing a qualified dataset and a state-of-the-art network.

References:

[1]
Vassileios Balntas, Karel Lenc, Andrea Vedaldi, and Krystian Mikolajczyk. 2017. HPatches: A benchmark and evaluation of handcrafted and learned local descriptors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5173?5182.

[2]
Fan Bao, Chongxuan Li, Jun Zhu, and Bo Zhang. 2022. Analytic-DPM: An analytic estimate of the optimal reverse variance in diffusion probabilistic models. arXiv preprint arXiv:2201.06503 (2022).

[3]
Daniel Barath, Jiri Matas, and Jana Noskova. 2019. MAGSAC: Marginalizing sample consensus. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 10197?10205.

[4]
Daniel Barath, Jana Noskova, Maksym Ivashechkin, and Jiri Matas. 2020. MAGSAC++, a fast, reliable and accurate robust estimator. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1304?1312.

[5]
Daniel J. Butler, Jonas Wulff, Garrett B. Stanley, and Michael J. Black. 2012. A naturalistic open source movie for optical flow evaluation. In Proceedings of the European Conference on Computer Vision. 611?625.

[6]
Si-Yuan Cao, Jianxin Hu, Zehua Sheng, and Hui-Liang Shen. 2022. Iterative deep homography estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1879?1888.

[7]
Che-Han Chang, Chun-Nan Chou, and Edward Y. Chang. 2017. CLKN: Cascaded Lucas-Kanade networks for image alignment. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2213?2221.

[8]
Padraig Cunningham and Sarah Jane Delany. 2021. K-nearest neighbour classifiers?A tutorial. ACM Computing Surveys 54, 6 (2021), 1?25.

[9]
Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. 2016. Deep image homography estimation. arXiv preprint arXiv:1606.03798 (2016).

[10]
Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. 2018. SuperPoint: Self-supervised interest point detection and description. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 224?236.

[11]
Prafulla Dhariwal and Alexander Nichol. 2021. Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems 34 (2021), 8780?8794.

[12]
Tianjiao Ding, Yunchen Yang, Zhihui Zhu, Daniel P. Robinson, Ren? Vidal, Laurent Kneip, and Manolis C. Tsakiris. 2020. Robust homography estimation via dual principal component pursuit. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 6080?6089.

[13]
Alexey Dosovitskiy, Philipp Fischer, Eddy Ilg, Philip Hausser, Caner Hazirbas, Vladimir Golkov, Patrick Van Der Smagt, Daniel Cremers, and Thomas Brox. 2015. FlowNet: Learning optical flow with convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision. 2758?2766.

[14]
Martin A. Fischler and Robert C. Bolles. 1981. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM 24, 6 (1981), 381?395.

[15]
Jochen Gast and Stefan Roth. 2018. Lightweight probabilistic deep networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3369?3378.

[16]
Andreas Geiger, Philip Lenz, and Raquel Urtasun. 2012. Are we ready for autonomous driving? The KITTI Vision Benchmark Suite. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3354?3361.

[17]
Klaus Greff, Francois Belletti, Lucas Beyer, Carl Doersch, Yilun Du, Daniel Duckworth, David J. Fleet, Dan Gnanapragasam, Florian Golemo, Charles Herrmann, Thomas Kipf, Abhijit Kundu, Dmitry Lagun, Issam Laradji, Hsueh-Ti (Derek) Liu, Henning Meyer, Yishu Miao, Derek Nowrouzezahrai, Cengiz Oztireli, Etienne Pot, Noha Radwan, Daniel Rebain, Sara Sabour, Mehdi S. M. Sajjadi, Matan Sela, Vincent Sitzmann, Austin Stone, Deqing Sun, Suhani Vora, Ziyu Wang, Tianhao Wu, Kwang Moo Yi, Fangcheng Zhong, and Andrea Tagliasacchi. 2022. Kubric: A scalable dataset generator. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 3749?3761.

[18]
Yunhui Han, Kunming Luo, Ao Luo, Jiangyu Liu, Haoqiang Fan, Guiming Luo, and Shuaicheng Liu. 2022. RealFlow: EM-based realistic optical flow dataset generation from videos. In Proceedings of the European Conference on Computer Vision. 288?305.

[19]
Richard Hartley and Andrew Zisserman. 2003. Multiple View Geometry in Computer Vision. Cambridge University Press.

[20]
Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems 33 (2020), 6840?6851.

[21]
Jonathan Ho and Tim Salimans. 2022. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598 (2022).

[22]
Mingbo Hong, Yuhang Lu, Nianjin Ye, Chunyu Lin, Qijun Zhao, and Shuaicheng Liu. 2022. Unsupervised homography estimation with coplanarity-aware GAN. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 17663?17672.

[23]
Emiel Hoogeboom, Jonathan Heek, and Tim Salimans. 2023. Simple diffusion: End-to-end diffusion for high resolution images. arXiv preprint arXiv:2301.11093 (2023).

[24]
Eddy Ilg, Ozgun Cicek, Silvio Galesso, Aaron Klein, Osama Makansi, Frank Hutter, and Thomas Brox. 2018. Uncertainty estimates and multi-hypotheses networks for optical flow. In Proceedings of the European Conference on Computer Vision. 652?667.

[25]
Hai Jiang, Haipeng Li, Yuhang Lu, Songchen Han, and Shuaicheng Liu. 2022. Semi-supervised deep large-baseline homography estimation with progressive equivalence constraint. arXiv preprint arXiv:2212.02763 (2022).

[26]
Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. 2022. Elucidating the design space of diffusion-based generative models. arXiv preprint arXiv:2206.00364 (2022).

[27]
Dewi Endah Kharismawati, Hadi Ali Akbarpour, Rumana Aktar, Filiz Bunyak, Kannappan Palaniappan, and Toni Kazic. 2020. CorNet: Unsupervised deep homography estimation for agricultural aerial imagery. In Proceedings of the European Conference on Computer Vision. 400?417.

[28]
Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).

[29]
Durk P. Kingma, Tim Salimans, and Max Welling. 2015. Variational dropout and the local reparameterization trick. Advances in Neural Information Processing Systems 28 (2015), 1?9.

[30]
Hoang Le, Feng Liu, Shu Zhang, and Aseem Agarwala. 2020. Deep homography estimation for dynamic scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7652?7661.

[31]
Haipeng Li, Kunming Luo, and Shuaicheng Liu. 2021. GyroFlow: Gyroscope-guided unsupervised optical flow learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 12869?12878.

[32]
Haipeng Li, Kunming Luo, Bing Zeng, and Shuaicheng Liu. 2023. GyroFlow+: Gyroscope-guided unsupervised deep homography and optical flow learning. arXiv preprint arXiv:2301.10018 (2023).

[33]
Zhengqi Li and Noah Snavely. 2018. MegaDepth: Learning single-view depth prediction from Internet photos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2041?2050.

[34]
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll?r, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision. 740?755.

[35]
Shuaicheng Liu, Haipeng Li, Zhengning Wang, Jue Wang, Shuyuan Zhu, and Bing Zeng. 2021a. DeepOIS: Gyroscope-guided deep optical image stabilizer compensation. IEEE Transactions on Circuits and Systems for Video Technology. Published Online, August 9, 2021. DOI: 10.1109/TCSVT.2021.3103281

[36]
Shuaicheng Liu, Yuhang Lu, Hai Jiang, Nianjin Ye, Chuan Wang, and Bing Zeng. 2022. Unsupervised global and local homography estimation with motion basis learning. IEEE Transactions on Pattern Analysis and Machine Intelligence. Published Online, November 21, 2022.

[37]
Shuaicheng Liu, Nianjin Ye, Chuan Wang, Kunming Luo, Jue Wang, and Jian Sun. 2023b. Content-aware unsupervised deep homography estimation and its extensions. IEEE Transactions on Pattern Analysis and Machine Intelligence 45, 3 (2023), 2849?2863.

[38]
Shuaicheng Liu, Lu Yuan, Ping Tan, and Jian Sun. 2013. Bundled camera paths for video stabilization. ACM Transactions on Graphics 32, 4 (2013), 1?10.

[39]
Xihui Liu, Dong Huk Park, Samaneh Azadi, Gong Zhang, Arman Chopikyan, Yuxiao Hu, Humphrey Shi, Anna Rohrbach, and Trevor Darrell. 2023a. More control for free! Image synthesis with semantic diffusion guidance. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 289?299.

[40]
Zhen Liu, Wenjie Lin, Xinpeng Li, Qing Rao, Ting Jiang, Mingyan Han, Haoqiang Fan, Jian Sun, and Shuaicheng Liu. 2021b. ADNet: Attention-guided deformable convolutional network for high dynamic range imaging. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 463?470.

[41]
David G. Lowe. 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60, 2 (2004), 91?110.

[42]
Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. 2022. DPM-Solver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps. arXiv preprint arXiv:2206.00927 (2022).

[43]
Ziwei Luo, Fredrik K. Gustafsson, Zheng Zhao, Jens Sj?lund, and Thomas B. Sch?n. 2023. Image restoration with mean-reverting stochastic differential equations. arXiv preprint arXiv:2301.11699 (2023).

[44]
Moritz Menze and Andreas Geiger. 2015. Object scene flow for autonomous vehicles. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3061?3070.

[45]
Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. 2021. NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM 65, 1 (2021), 99?106.

[46]
Raul Mur-Artal, Jose Maria Martinez Montiel, and Juan D. Tardos. 2015. ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Transactions on Robotics 31, 5 (2015), 1147?1163.

[47]
Ty Nguyen, Steven W. Chen, Shreyas S. Shivakumar, Camillo Jose Taylor, and Vijay Kumar. 2018. Unsupervised deep homography: A fast and robust homography estimation model. IEEE Robotics and Automation Letters 3, 3 (2018), 2346?2353.

[48]
Keunhong Park, Utkarsh Sinha, Jonathan T. Barron, Sofien Bouaziz, Dan B. Goldman, Steven M. Seitz, and Ricardo Martin-Brualla. 2021. Nerfies: Deformable neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 5865?5874.

[49]
Albert Pumarola, Enric Corona, Gerard Pons-Moll, and Francesc Moreno-Noguer. 2021. D-NeRF: Neural radiance fields for dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10318? 10327.

[50]
Alexander Raistrick, Lahav Lipson, Zeyu Ma, Lingjie Mei, Mingzhe Wang, Yiming Zuo, Karhan Kayan, Hongyu Wen, Beining Han, Yihan Wang, Alejandro Newell, Hei Law, Ankit Goyal, Kaiyu Yang, and Jia Deng. 2023. Infinite photorealistic worlds using procedural generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 12630?12641.

[51]
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj?rn Ommer. 2022. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10684?10695.

[52]
Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary Bradski. 2011. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the IEEE International Conference on Computer Vision. 2564?2571.

[53]
Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J. Fleet, and Mohammad Norouzi. 2022. Image super-resolution via iterative refinement. IEEE Transactions on Pattern Analysis and Machine Intelligence. Published Online, September 12, 2022.

[54]
Tim Salimans and Jonathan Ho. 2022. Progressive distillation for fast sampling of diffusion models. arXiv preprint arXiv:2202.00512 (2022).

[55]
Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. 2020. SuperGlue: Learning feature matching with graph neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4938?4947.

[56]
Olivier Saurer, Friedrich Fraundorfer, and Marc Pollefeys. 2012. Homography based visual odometry with known vertical direction and weak Manhattan world assumption. In Proceedings of the Vicomor Workshop at IROS, Vol. 2012.

[57]
Ruizhi Shao, Gaochang Wu, Yuemei Zhou, Ying Fu, Lu Fang, and Yebin Liu. 2021. LocalTrans: A multiscale local transformer network for cross-resolution homography estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 14890?14899.

[58]
Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. 2015. Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the International Conference on Machine Learning. 2256?2265.

[59]
Jiaming Song, Chenlin Meng, and Stefano Ermon. 2020a. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502 (2020).

[60]
Yang Song and Stefano Ermon. 2019. Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems 32 (2019), 1?13.

[61]
Yang Song, Liyue Shen, Lei Xing, and Stefano Ermon. 2021. Solving inverse problems in medical imaging with score-based generative models. arXiv preprint arXiv:2111.08005 (2021).

[62]
Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. 2020b. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456 (2020).

[63]
Jiaming Sun, Zehong Shen, Yuang Wang, Hujun Bao, and Xiaowei Zhou. 2021. LoFTR: Detector-free local feature matching with transformers. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8922?8931.

[64]
Guy Tevet, Sigal Raab, Brian Gordon, Yonatan Shafir, Daniel Cohen-Or, and Amit H. Bermano. 2022. Human motion diffusion model. arXiv preprint arXiv:2209.14916 (2022).

[65]
Yurun Tian, Xin Yu, Bin Fan, Fuchao Wu, Huub Heijnen, and Vassileios Balntas. 2019. SOSNet: Second order similarity regularization for local descriptor learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 11016?11025.

[66]
Prune Truong, Martin Danelljan, and Radu Timofte. 2020. GLU-Net: Global-local universal network for dense flow and correspondences. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 6258?6268.

[67]
Prune Truong, Martin Danelljan, Luc Van Gool, and Radu Timofte. 2021. Learning accurate dense correspondences and when to trust them. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5714? 5724.

[68]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, ?ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017), 1?11.

[69]
Yinhuai Wang, Jiwen Yu, and Jian Zhang. 2022. Zero-shot image restoration using denoising diffusion null-space model. arXiv preprint arXiv:2212.00490 (2022).

[70]
Shangzhe Wu, Jiarui Xu, Yu-Wing Tai, and Chi-Keung Tang. 2018. Deep high dynamic range imaging with large foreground motions. In Proceedings of the European Conference on Computer Vision. 117?132.

[71]
Nianjin Ye, Chuan Wang, Haoqiang Fan, and Shuaicheng Liu. 2021. Motion basis learning for unsupervised deep homography estimation with subspace projection. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 13117?13125.

[72]
Kwang Moo Yi, Eduard Trulls, Vincent Lepetit, and Pascal Fua. 2016. LIFT: Learned Invariant Feature Transform. In Proceedings of the European Conference on Computer Vision. 467?483.

[73]
Jason J. Yu, Adam W. Harley, and Konstantinos G. Derpanis. 2016. Back to basics: Unsupervised learning of optical flow via brightness constancy and motion smoothness. In Computer Vision?ECCV 2016 Workshops. Lecture Notes in Computer Science, Vol. 9915. Springer, 3?10.

[74]
Jirong Zhang, Chuan Wang, Shuaicheng Liu, Lanpeng Jia, Nianjin Ye, Jue Wang, Ji Zhou, and Jian Sun. 2020. Content-aware unsupervised deep homography estimation. In Proceedings of the European Conference on Computer Vision. 653?669.

ACM Digital Library Publication:

DMHomo: Learning Homography with Diffusion Models

Overview Page:

SIGGRAPH 2024: Technical Papers

Submit a story:

If you would like to submit a story about this presentation, please contact us: historyarchives@siggraph.org

ACM SIGGRAPH HISTORY ARCHIVES

“DMHomo: Learning Homography with Diffusion Models” by Li, Jiang, Luo, Tan, Fan, et al. …

Conference:

Type(s):

Title:

Presenter(s)/Author(s):

Abstract:

References:

ACM Digital Library Publication:

Overview Page:

Submit a story:

Sponsored by: