“Motion parallax for 360° RGBD video” by Serrano, Incheol, Chen, DiVerdi, Gutierrez, Hertzmann, et al.

Interest Area:


    Research / Education and Gaming & Interactive

Title:

    Motion parallax for 360° RGBD video

Session/Category Title:

    IEEE TVCG Session on Virtual and Augmented Reality


Abstract:


    We present a method for adding parallax to 360° videos and enabling their real-time playback in Virtual Reality headsets. In current video players, playback does not respond to translational head movement, which reduces the feeling of immersion and causes motion sickness in some viewers. Given a 360° video and its corresponding depth (provided by current stereo 360° stitching algorithms), a naive image-based rendering approach would use the depth to generate a 3D mesh around the viewer, then translate it appropriately as the viewer moves their head. However, this approach breaks at depth discontinuities, showing visible distortions, whereas cutting the mesh at such discontinuities leads to ragged silhouettes and holes at disocclusions. We address these issues by improving the given initial depth map to yield cleaner, more natural silhouettes. We rely on a three-layer scene representation, made up of a foreground layer and two static background layers, to handle disocclusions: information propagated from multiple frames fills the first background layer, and inpainting fills the second. Our system works with input from many of today's most popular 360° stereo capture devices (e.g., Yi Halo or GoPro Odyssey), and works well even if the original video does not provide depth information. Our user studies confirm that our method provides a more compelling viewing experience than playback without parallax, increasing immersion while reducing discomfort and nausea.
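
    The naive baseline described in the abstract can be sketched in a few lines: back-project each pixel of the equirectangular depth map along its viewing ray to obtain a world-space point, then re-project those points from the translated head position. The NumPy sketch below is a minimal illustration under assumed equirectangular conventions, not the authors' implementation; the function names and axis conventions are hypothetical.

        # Minimal sketch of the naive image-based rendering baseline:
        # lift an equirectangular depth map to 3D, then re-project it from
        # a translated head position. The longitude/latitude mapping and
        # y-up axes are illustrative assumptions, not the paper's code.
        import numpy as np

        def equirect_to_points(depth):
            """Back-project an H x W equirectangular depth map (meters)
            into 3D points centered on the capture position."""
            h, w = depth.shape
            lon = (np.arange(w) / w - 0.5) * 2.0 * np.pi  # [-pi, pi)
            lat = (0.5 - np.arange(h) / h) * np.pi        # (-pi/2, pi/2]
            lon, lat = np.meshgrid(lon, lat)
            # Unit viewing rays on the sphere, scaled by per-pixel depth.
            dirs = np.stack([np.cos(lat) * np.sin(lon),
                             np.sin(lat),
                             np.cos(lat) * np.cos(lon)], axis=-1)
            return dirs * depth[..., None]

        def reproject(points, head_offset):
            """Re-project world points to equirectangular (lon, lat, depth)
            as seen from a translated head position; this translation is
            what produces motion parallax."""
            p = points - np.asarray(head_offset, dtype=float)
            r = np.linalg.norm(p, axis=-1)
            lon = np.arctan2(p[..., 0], p[..., 2])
            lat = np.arcsin(np.clip(p[..., 1] / np.maximum(r, 1e-9),
                                    -1.0, 1.0))
            return lon, lat, r

    In a real renderer these points would form the vertices of a sphere-topology triangle mesh. At a depth discontinuity, neighboring pixels map to distant 3D points, so the triangles connecting them stretch into the visible distortions the abstract describes; the paper's improved depth map and three-layer (foreground plus two background) representation target exactly these regions.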

