“High-quality video view interpolation using a layered representation” by Zitnick, Kang, Uyttendaele, Winder and Szeliski

  • © Lawrence C. Zitnick, Sing Bing Kang, Matt Uyttendaele, Simon Winder, and Richard Szeliski







    The ability to interactively control viewpoint while watching a video is an exciting application of image-based rendering. The goal of our work is to render dynamic scenes with interactive viewpoint control using a relatively small number of video cameras. In this paper, we show how high-quality video-based rendering of dynamic scenes can be accomplished using multiple synchronized video streams combined with novel image-based modeling and rendering algorithms. Once these video streams have been processed, we can synthesize any intermediate view between cameras at any time, with the potential for space-time manipulation.

    In our approach, we first use a novel color segmentation-based stereo algorithm to generate high-quality photoconsistent correspondences across all camera views. Mattes for areas near depth discontinuities are then automatically extracted to reduce artifacts during view synthesis. Finally, a novel temporal two-layer compressed representation that handles matting is developed for rendering at interactive rates.
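    The core idea of synthesizing an intermediate view between two cameras can be illustrated with a much simpler sketch than the paper's layered, matted renderer. The code below is a hypothetical minimal version, assuming rectified grayscale views and the convention that left pixel x matches right pixel x − d: each source image is forward-warped along its per-pixel disparity and the splats are blended by proximity to the virtual viewpoint. The function name `interpolate_view` and all details are illustrative, not the authors' implementation.

    ```python
    import numpy as np

    def interpolate_view(left, right, disp_left, disp_right, t):
        """Synthesize a virtual view at fraction t in [0, 1] between two
        rectified grayscale views.

        Each source is forward-warped along its per-pixel horizontal
        disparity and the splats are blended with weights (1 - t) and t,
        so the nearer camera dominates. Bare-bones sketch: nearest-pixel
        splatting, no occlusion layers, no boundary mattes.
        """
        h, w = left.shape
        acc = np.zeros((h, w))   # weighted intensity accumulator
        wgt = np.zeros((h, w))   # accumulated blend weights
        for src, disp, offset, w_src in (
            (left,  disp_left,  -t,      1.0 - t),  # left pixels slide toward the right view as t grows
            (right, disp_right, 1.0 - t, t),        # right pixels slide toward the left view as t shrinks
        ):
            for y in range(h):
                for x in range(w):
                    xt = int(round(x + offset * disp[y, x]))
                    if 0 <= xt < w:
                        acc[y, xt] += w_src * src[y, x]
                        wgt[y, xt] += w_src
        out = np.zeros((h, w))
        hit = wgt > 0
        out[hit] = acc[hit] / wgt[hit]   # normalize; unhit pixels stay black
        return out
    ```

    With a constant disparity of 2, a feature at x = 4 in the left view and x = 2 in the right view lands at x = 3 in the halfway (t = 0.5) view. The paper's actual renderer goes further: it warps a separate boundary layer with alpha mattes and composites it over the main layer to suppress artifacts at depth discontinuities.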


