“Defocus video matting” by McGuire, Matusik, Pfister, Hughes and Durand

  • © Morgan McGuire, Wojciech Matusik, Hanspeter Pfister, John F. Hughes, and Frédo Durand




    Defocus video matting



    Video matting is the process of pulling a high-quality alpha matte and foreground from a video sequence. Current techniques require either a known background (e.g., a blue screen) or extensive user interaction (e.g., to specify known foreground and background elements). The matting problem is generally under-constrained, since not enough information has been collected at capture time. We propose a novel, fully autonomous method for pulling a matte using multiple synchronized video streams that share a point of view but differ in their plane of focus. The solution is obtained by directly minimizing the error in filter-based image formation equations, which are over-constrained by our rich data stream. Our system solves the fully dynamic video matting problem without user assistance: both the foreground and background may be high frequency and have dynamic content, the foreground may resemble the background, and the scene is lit by natural (as opposed to polarized or collimated) illumination.
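To see why a single video stream leaves the matting problem under-constrained, here is a minimal sketch of the standard per-pixel compositing equation I = alpha * F + (1 - alpha) * B, together with a naive count of unknowns against color measurements. It is not the authors' solver. The function names, the sample pixel values, and the choice of three streams are assumptions made for illustration, and the count ignores the way defocus blur couples neighboring pixels.

```python
# Illustrative sketch only: standard compositing model, not the paper's optimizer.
# All names and sample values below are hypothetical.

def composite(alpha, fg, bg):
    """Blend one RGB foreground pixel over one RGB background pixel."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


def unknowns_vs_equations(num_observations, channels=3):
    """Count per-pixel unknowns (alpha, F, B) against observed color equations.

    num_observations is the assumed number of co-located streams; each one
    contributes one measurement per color channel.
    """
    unknowns = 1 + 2 * channels          # alpha + foreground RGB + background RGB
    equations = num_observations * channels
    return unknowns, equations


# A quarter-opaque red foreground over a blue background.
pixel = composite(0.25, (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))

# One stream versus an assumed three differently focused streams.
single = unknowns_vs_equations(1)
triple = unknowns_vs_equations(3)

print(pixel, single, triple)
```

Under this simplified count, one stream gives 7 unknowns against 3 equations per pixel, while three streams give 7 against 9, which is the sense in which additional differently focused streams can over-constrain the problem.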


