“Interactive video cutout”

  • ©Jue Wang, Pravin Bhat, R. Alex Colburn, Maneesh Agrawala, and Michael F. Cohen







    We present an interactive system for efficiently extracting foreground objects from a video. We extend previous min-cut based image segmentation techniques to the domain of video with four new contributions. We provide a novel painting-based user interface that allows users to easily indicate the foreground object across space and time. We introduce a hierarchical mean-shift preprocess in order to minimize the number of nodes that min-cut must operate on. Within the min-cut we also define new local cost functions to augment the global costs defined in earlier work. Finally, we extend 2D alpha matting methods designed for images to work with 3D video volumes. We demonstrate that our matting approach preserves smoothness across both space and time. Our interactive video cutout system allows users to quickly extract foreground objects from video sequences for use in a variety of applications including compositing onto new backgrounds and NPR cartoon style rendering.


