“DeepFovea: neural reconstruction for foveated rendering and video compression using learned statistics of natural videos” by Kaplanyan, Sochenov, Leimkühler, Okunev, Goodall, et al. …
Session/Category Title: Thoughts on Display
Abstract:
In order to provide an immersive visual experience, modern displays require head mounting, high image resolution, low latency, as well as high refresh rate. This poses a challenging computational problem. On the other hand, the human visual system can consume only a tiny fraction of this video stream due to the drastic acuity loss in the peripheral vision. Foveated rendering and compression can save computations by reducing the image quality in the peripheral vision. However, this can cause noticeable artifacts in the periphery, or, if done conservatively, would provide only modest savings. In this work, we explore a novel foveated reconstruction method that employs the recent advances in generative adversarial neural networks. We reconstruct a plausible peripheral video from a small fraction of pixels provided every frame. The reconstruction is done by finding the closest matching video to this sparse input stream of pixels on the learned manifold of natural videos. Our method is more efficient than the state-of-the-art foveated rendering, while providing the visual experience with no noticeable quality degradation. We conducted a user study to validate our reconstruction method and compare it against existing foveated rendering and video compression techniques. Our method is fast enough to drive gaze-contingent head-mounted displays in real time on modern hardware. We plan to publish the trained network to establish a new quality bar for foveated rendering and compression as well as encourage follow-up research.
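The abstract describes the pipeline at a high level: each frame, only a small gaze-contingent subset of pixels is rendered (dense at the fovea, increasingly sparse toward the periphery), and a generative network reconstructs the missing peripheral content from that sparse stream. The sketch below illustrates only the sparse-sampling half of that idea; the exponential density falloff, the parameter names, and the NumPy implementation are assumptions for illustration, not the paper's actual sampling pattern or reconstruction network.

```python
# Illustrative sketch only -- not the authors' released code. It shows the
# general idea from the abstract: keep a small, gaze-contingent fraction of
# pixels per frame and hand the sparse stream to a learned reconstructor.
import numpy as np

def sampling_mask(height, width, gaze_xy, fovea_radius=40.0,
                  peripheral_rate=0.05, rng=None):
    """Bernoulli mask whose keep-probability falls off with eccentricity.

    Pixels near the gaze point are almost always kept; peripheral pixels
    survive with probability `peripheral_rate`. The exponential falloff is
    an assumption, not the density schedule used in the paper.
    """
    if rng is None:
        rng = np.random.default_rng()
    ys, xs = np.mgrid[0:height, 0:width]
    ecc = np.hypot(xs - gaze_xy[0], ys - gaze_xy[1])   # distance from gaze
    density = np.maximum(peripheral_rate,
                         np.exp(-ecc / fovea_radius))  # 1.0 at the fovea
    return rng.random((height, width)) < density

# Usage: mask a frame, then pass (sparse_frame, mask) to a reconstruction
# network -- e.g. a recurrent U-Net-style generator trained adversarially.
frame = np.random.rand(1080, 1920, 3).astype(np.float32)  # stand-in frame
mask = sampling_mask(1080, 1920, gaze_xy=(960, 540))
sparse_frame = frame * mask[..., None]  # only a few percent of pixels survive
print(f"kept {mask.mean():.1%} of pixels")
```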