“Tools for placing cuts and transitions in interview video” by Berthouzoz, Li and Agrawala

  • ©Floraine Berthouzoz, Wilmot Li, and Maneesh Agrawala







    We present a set of tools designed to help editors place cuts and create transitions in interview video. To help place cuts, our interface links a text transcript of the video to the corresponding locations in the raw footage. It also visualizes the suitability of cut locations by analyzing the audio/visual features of the raw footage to find frames where the speaker is relatively quiet and still. With these tools, editors can directly highlight segments of text, check whether the endpoints are suitable cut locations, and, if so, simply delete the text to make the edit. For each cut, our system generates both visible transitions (e.g., jump-cuts and fades) and seamless, hidden transitions. We present a hierarchical, graph-based algorithm for efficiently generating hidden transitions that considers visual features specific to interview footage. We also describe a new data-driven technique for setting the timing of the hidden transition. Finally, our tools offer a one-click method for seamlessly removing ‘ums’ and repeated words, as well as for inserting natural-looking pauses to emphasize semantic content. We apply our tools to edit a variety of interviews and also show how they can be used to quickly compose multiple takes of an actor narrating a story.
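    The text-driven editing loop described above (highlight words, check whether both endpoints fall on quiet, still frames, then delete the text) can be illustrated with a small sketch. The abstract does not specify the actual features, weights, or thresholds, so this is not the authors' implementation. It assumes word-level timestamps from some forced aligner, a per-frame audio RMS energy track, and a per-frame motion magnitude (for example, mean absolute frame difference) as stand-in features, combined with equal weights. All names (`Word`, `cut_suitability`, `endpoints_ok`, `delete_words`) and the 0.7 threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    """One transcript word with (assumed) forced-alignment timestamps in seconds."""
    text: str
    start: float
    end: float


def _normalize(xs: List[float]) -> List[float]:
    """Min-max normalize a feature track to [0, 1]; a constant track maps to zeros."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0] * len(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def cut_suitability(audio_rms: List[float], motion: List[float],
                    w_audio: float = 0.5, w_motion: float = 0.5) -> List[float]:
    """Hypothetical per-frame score: higher means the speaker is quieter and stiller.

    With weights summing to 1, every score lies in [0, 1].
    """
    a = _normalize(audio_rms)
    m = _normalize(motion)
    return [1.0 - (w_audio * ai + w_motion * mi) for ai, mi in zip(a, m)]


def endpoints_ok(words: List[Word], first: int, last: int,
                 scores: List[float], fps: float, threshold: float = 0.7) -> bool:
    """Check that the frames at both ends of the highlighted words score above threshold."""
    def frame(t: float) -> int:
        # Map a timestamp to the nearest frame index, clamped to the last frame.
        return min(round(t * fps), len(scores) - 1)
    return (scores[frame(words[first].start)] >= threshold
            and scores[frame(words[last].end)] >= threshold)


def delete_words(words: List[Word], first: int, last: int,
                 duration: float) -> List[Tuple[float, float]]:
    """Delete words[first..last] (inclusive) from one clip; return the time ranges to keep."""
    cut_in, cut_out = words[first].start, words[last].end
    keep = []
    if cut_in > 0.0:
        keep.append((0.0, cut_in))
    if cut_out < duration:
        keep.append((cut_out, duration))
    return keep


if __name__ == "__main__":
    words = [Word("so", 0.0, 0.4), Word("um", 0.5, 0.9),
             Word("we", 1.0, 1.3), Word("tried", 1.3, 1.8)]
    # Removing the "um" leaves two ranges joined by a cut (and, in the paper, a transition).
    print(delete_words(words, 1, 1, duration=2.0))  # [(0.0, 0.5), (0.9, 2.0)]
```

    The sketch handles a single deletion on a single clip; a real editor would keep a running list of kept ranges across many edits and would place a visible or hidden transition at each resulting cut.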


