“Video matching” by Sand and Teller

  • ©Peter Sand and Seth Teller




    Video matching



    This paper describes a method for bringing two videos (recorded at different times) into spatiotemporal alignment, then comparing and combining corresponding pixels for applications such as background subtraction, compositing, and increasing dynamic range. We align a pair of videos by searching for frames that best match according to a robust image registration process. This process uses locally weighted regression to interpolate and extrapolate high-likelihood image correspondences, allowing new correspondences to be discovered and refined. Image regions that cannot be matched are detected and ignored, providing robustness to changes in scene content and lighting, which allows a variety of new applications.
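The regression step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Gaussian distance kernel, a local affine model, and a per-correspondence confidence weight, and it predicts a 2D pixel offset at an arbitrary query location from a sparse set of trusted correspondences. The names `lwr_offset`, `solve3`, `bandwidth`, and `ridge` are hypothetical and chosen only for this illustration.

```python
import math


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def lwr_offset(query, points, offsets, confidences, bandwidth=20.0, ridge=1e-6):
    """Predict the (dx, dy) offset at `query` by locally weighted regression.

    Assumptions (not taken from the paper): Gaussian kernel with the given
    bandwidth in pixels, a local affine model, and a small ridge term that
    keeps the normal equations solvable when few points carry weight.
    """
    qx, qy = query
    A = [[0.0] * 3 for _ in range(3)]
    bx = [0.0] * 3
    by = [0.0] * 3
    for (px, py), (dx, dy), conf in zip(points, offsets, confidences):
        ux, uy = px - qx, py - qy  # coordinates centered on the query
        w = conf * math.exp(-(ux * ux + uy * uy) / (2.0 * bandwidth ** 2))
        row = (1.0, ux, uy)
        # Accumulate the weighted normal equations (X^T W X) beta = X^T W d.
        for i in range(3):
            for j in range(3):
                A[i][j] += w * row[i] * row[j]
            bx[i] += w * row[i] * dx
            by[i] += w * row[i] * dy
    for i in range(3):
        A[i][i] += ridge
    # Because inputs are centered on the query, the intercept is the prediction.
    return solve3(A, bx)[0], solve3(A, by)[0]


if __name__ == "__main__":
    pts = [(0, 0), (10, 0), (0, 10), (10, 10)]
    offs = [(2 + 0.1 * x, -1 + 0.05 * y) for x, y in pts]
    conf = [1.0, 1.0, 1.0, 1.0]
    print(lwr_offset((5, 5), pts, offs, conf))    # interpolation inside the samples
    print(lwr_offset((20, 20), pts, offs, conf))  # extrapolation outside the samples
```

Because the local model is affine rather than constant, the same fit serves for both interpolation between known correspondences and extrapolation beyond them. Setting a correspondence's confidence to zero removes it from the fit, which is one simple way to ignore regions that cannot be matched.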


