“DeepFovea: neural reconstruction for foveated rendering and video compression using learned statistics of natural videos” by Kaplanyan, Sochenov, Leimkühler, Okunev, Goodall, et al.

Conference:

    SIGGRAPH 2019

Type(s):

    Talk

Title:

    DeepFovea: neural reconstruction for foveated rendering and video compression using learned statistics of natural videos

Session/Category Title:

    Thoughts on Display

Presenter(s)/Author(s):

    Kaplanyan, Sochenov, Leimkühler, Okunev, Goodall, et al.

Abstract:


    In order to provide an immersive visual experience, modern displays require head mounting, high image resolution, low latency, as well as high refresh rate. This poses a challenging computational problem. On the other hand, the human visual system can consume only a tiny fraction of this video stream due to the drastic acuity loss in the peripheral vision. Foveated rendering and compression can save computations by reducing the image quality in the peripheral vision. However, this can cause noticeable artifacts in the periphery, or, if done conservatively, would provide only modest savings. In this work, we explore a novel foveated reconstruction method that employs the recent advances in generative adversarial neural networks. We reconstruct a plausible peripheral video from a small fraction of pixels provided every frame. The reconstruction is done by finding the closest matching video to this sparse input stream of pixels on the learned manifold of natural videos. Our method is more efficient than the state-of-the-art foveated rendering, while providing the visual experience with no noticeable quality degradation. We conducted a user study to validate our reconstruction method and compare it against existing foveated rendering and video compression techniques. Our method is fast enough to drive gaze-contingent head-mounted displays in real time on modern hardware. We plan to publish the trained network to establish a new quality bar for foveated rendering and compression as well as encourage follow-up research.
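
    The reconstruction described above operates on a sparse, gaze-contingent sample of each frame. The sketch below (Python/NumPy) illustrates only that input stage and is not the paper's actual sampling pattern: it builds a per-frame stochastic mask whose density is full inside a small foveal region around the gaze point and falls off with eccentricity, after which a generator network would in-paint the unsampled pixels. The falloff profile and the fovea_radius and min_density parameters are illustrative assumptions.

    import numpy as np

    def foveated_sampling_mask(height, width, gaze_xy, fovea_radius=0.1,
                               min_density=0.02, rng=None):
        """Per-frame stochastic sampling mask for foveated reconstruction.

        Density is 1.0 inside a small foveal region around the gaze point
        and falls off with eccentricity toward min_density in the far
        periphery. The inverse-quadratic falloff is an illustrative
        stand-in, not the profile used by DeepFovea.
        """
        rng = np.random.default_rng() if rng is None else rng
        ys, xs = np.mgrid[0:height, 0:width]
        # Normalized eccentricity: distance from the gaze point in units
        # of the image diagonal.
        diag = np.hypot(height, width)
        ecc = np.hypot(ys - gaze_xy[1], xs - gaze_xy[0]) / diag
        # Full sampling rate in the fovea, decaying quadratically outside it,
        # clamped to a minimum rate in the far periphery.
        density = np.minimum(1.0, (fovea_radius / np.maximum(ecc, fovea_radius)) ** 2)
        density = np.maximum(density, min_density)
        # Bernoulli sampling: keep each pixel with probability equal to its density.
        return rng.random((height, width)) < density

    # Example: a 1080p frame with gaze at the center keeps only a small
    # fraction of pixels; a reconstruction network would fill in the rest.
    mask = foveated_sampling_mask(1080, 1920, gaze_xy=(960, 540))
    print(f"fraction of pixels kept: {mask.mean():.3f}")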
