“3DTV at home: Eulerian-Lagrangian stereo-to-multiview conversion” by Kellnhofer, Didyk, Wang, Sitthi-amorn, Freeman, et al. …

  • ©Petr Kellnhofer, Piotr Didyk, Szu-Po Wang, Pitchaya Sitthi-amorn, William T. Freeman, Frédo Durand, and Wojciech Matusik




    3DTV at home: Eulerian-Lagrangian stereo-to-multiview conversion


Session Title: Computational Cameras & Displays



    Stereoscopic 3D (S3D) movies have become widely popular in movie theaters, but the adoption of S3D at home is low even though most TV sets support S3D. It is widely believed that S3D with glasses is not the right approach for the home. A much more appealing approach is to use automultiscopic displays that provide a glasses-free 3D experience to multiple viewers. A technical challenge is the lack of native multiview content that is required to deliver a proper view of the scene for every viewpoint. Our approach takes advantage of the abundance of stereoscopic 3D movies. We propose a real-time system that can convert stereoscopic video to a high-quality multiview video that can be directly fed to automultiscopic displays. Our algorithm uses a wavelet-based decomposition of stereoscopic images with per-wavelet disparity estimation. A key to our solution lies in combining Lagrangian and Eulerian approaches for both the disparity estimation and novel view synthesis, which leverages the complementary advantages of both techniques. The solution preserves all the features of Eulerian methods, e.g., subpixel accuracy, high performance, robustness to ambiguous depth cases, and easy integration of inter-view aliasing, while maintaining the advantages of Lagrangian approaches, e.g., robustness to large disparities and the possibility of performing non-trivial disparity manipulations through both view extrapolation and interpolation. The method achieves real-time performance on current GPUs. Its design also enables an easy hardware implementation, which is demonstrated using a field-programmable gate array. We analyze the visual quality and robustness of our technique on a number of synthetic and real-world examples. We also perform a user experiment that demonstrates the benefits of the technique when compared to existing solutions.
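
    The Eulerian side of the idea described above can be illustrated with a small 1D sketch. This is not the authors' implementation: it assumes a single complex Gabor filter as a stand-in for one band of a wavelet decomposition, and the function names (gabor_response, phase_difference, phase_disparity, synthesize_view) and parameters (freq, sigma, alpha) are hypothetical. Disparity is read from the phase difference between the left and right filter responses, and an intermediate view is formed by shifting the left response's phase by a fraction of that difference.

```python
import numpy as np

def gabor_response(signal, freq, sigma):
    """Complex response of a 1D Gabor filter tuned to `freq` (radians/sample).

    The kernel is scaled so that a unit-amplitude cosine at `freq`
    produces a response of (approximately) unit magnitude.
    """
    half = int(3 * sigma)
    x = np.arange(-half, half + 1)
    gauss = np.exp(-x**2 / (2 * sigma**2))
    kernel = 2 * gauss / gauss.sum() * np.exp(1j * freq * x)
    return np.convolve(signal, kernel, mode="same")

def phase_difference(left, right, freq, sigma):
    """Per-sample phase difference (left minus right), wrapped to (-pi, pi]."""
    rl = gabor_response(left, freq, sigma)
    rr = gabor_response(right, freq, sigma)
    return rl, np.angle(rl * np.conj(rr))

def phase_disparity(left, right, freq, sigma=8.0):
    """Subpixel disparity estimate: phase difference divided by frequency."""
    _, dphi = phase_difference(left, right, freq, sigma)
    return dphi / freq

def synthesize_view(left, right, freq, alpha, sigma=8.0):
    """Band content of a view at fraction `alpha` between left (0) and right (1).

    Only the filtered band is reconstructed; a full system would repeat
    this over all bands of the decomposition and sum the results.
    """
    rl, dphi = phase_difference(left, right, freq, sigma)
    return np.real(rl * np.exp(-1j * alpha * dphi))

# Toy stereo pair: the right signal is the left one shifted by 1.3 samples.
x = np.arange(256)
freq = 0.5
left = np.cos(freq * x)
right = np.cos(freq * (x - 1.3))

disparity = phase_disparity(left, right, freq)
middle = synthesize_view(left, right, freq, alpha=0.5)
print("median disparity (interior):", np.median(disparity[32:-32]))
```

    Because the phase difference wraps at plus or minus pi, this estimate is only unambiguous while |freq * disparity| < pi. That limit is one way to see why a purely phase-based scheme struggles with large disparities, and why the abstract pairs it with a Lagrangian component.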


