“FovVideoVDP: a visible difference predictor for wide field-of-view video” by Mantiuk, Denes, Chapiro, Kaplanyan, Rufo, et al. …

  • ©Rafal K. Mantiuk, Gyorgy Denes, Alexandre (Alex) Chapiro, Anton Kaplanyan, Gizem Rufo, Romain Bachy, Trisha Lian, and Anjul Patney




    FovVideoVDP: a visible difference predictor for wide field-of-view video



    FovVideoVDP is a video difference metric that models the spatial, temporal, and peripheral aspects of perception. While many other metrics are available, our work provides the first practical treatment of these three central aspects of vision simultaneously. The complex interplay between spatial and temporal sensitivity across retinal locations is especially important for displays that cover a large field-of-view, such as Virtual and Augmented Reality displays, and associated methods, such as foveated rendering. Our metric is derived from psychophysical studies of the early visual system, which model spatio-temporal contrast sensitivity, cortical magnification and contrast masking. It accounts for physical specification of the display (luminance, size, resolution) and viewing distance. To validate the metric, we collected a novel foveated rendering dataset which captures quality degradation due to sampling and reconstruction. To demonstrate our algorithm’s generality, we test it on 3 independent foveated video datasets, and on a large image quality dataset, achieving the best performance across all datasets when compared to the state-of-the-art.
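The abstract notes that the metric accounts for display size, resolution, and viewing distance. A common way to combine these three quantities is to express the display in pixels per visual degree. The sketch below is only a minimal illustration of that standard geometry, assuming a flat display viewed head-on at its centre; the function names and example values are hypothetical and are not taken from the authors' implementation.

```python
import math


def field_of_view_deg(width_m, distance_m):
    """Horizontal visual angle (degrees) subtended by a flat display
    of physical width `width_m`, viewed head-on from `distance_m`."""
    return math.degrees(2.0 * math.atan(width_m / (2.0 * distance_m)))


def pixels_per_degree(width_px, width_m, distance_m):
    """Approximate pixels per visual degree at the display centre.

    Assumption (illustrative only): square pixels, flat panel,
    observer on the axis through the display centre.
    """
    # Physical size of a single pixel in metres.
    pixel_size_m = width_m / width_px
    # Visual angle subtended by one central pixel, in degrees.
    deg_per_pixel = math.degrees(2.0 * math.atan(pixel_size_m / (2.0 * distance_m)))
    return 1.0 / deg_per_pixel


if __name__ == "__main__":
    # Hypothetical example: a 3840-pixel-wide, 0.6 m wide panel seen from 0.6 m.
    print(field_of_view_deg(0.6, 0.6))
    print(pixels_per_degree(3840, 0.6, 0.6))
```

Moving the observer further away raises the pixels-per-degree value, so the same pixel-level distortion maps to higher spatial frequencies, where contrast sensitivity is generally lower.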


