Tools for placing cuts and transitions in interview video

Floraine Berthouzoz; Wilmot Li; Maneesh Agrawala

Conference:

SIGGRAPH 2012

Type(s):

Technical Papers

Abstract:

We present a set of tools designed to help editors place cuts and create transitions in interview video. To help place cuts, our interface links a text transcript of the video to the corresponding locations in the raw footage. It also visualizes the suitability of cut locations by analyzing the audio/visual features of the raw footage to find frames where the speaker is relatively quiet and still. With these tools, editors can directly highlight segments of text, check whether the endpoints are suitable cut locations and, if so, simply delete the text to make the edit. For each cut, our system generates visible transitions (e.g., jump-cut, fade) as well as seamless, hidden transitions. We present a hierarchical, graph-based algorithm for efficiently generating hidden transitions that considers visual features specific to interview footage. We also describe a new data-driven technique for setting the timing of the hidden transition. Finally, our tools offer a one-click method for seamlessly removing ‘ums’ and repeated words, as well as for inserting natural-looking pauses to emphasize semantic content. We apply our tools to edit a variety of interviews and also show how they can be used to quickly compose multiple takes of an actor narrating a story.
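
The idea of rating cut locations by how quiet and still the speaker is can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which features or weights the paper uses, so the per-frame inputs (`audio_energy`, `motion`), the equal weighting, and the 0.8 threshold are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def normalize(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant sequence maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def cut_suitability(
    audio_energy: Sequence[float],
    motion: Sequence[float],
    audio_weight: float = 0.5,  # assumed weighting, not taken from the paper
) -> List[float]:
    """Return one score per frame in [0, 1]; higher means quieter and stiller."""
    if len(audio_energy) != len(motion):
        raise ValueError("audio_energy and motion must have one value per frame")
    if not audio_energy:
        return []
    audio = normalize(audio_energy)
    move = normalize(motion)
    # A frame is a good cut candidate when both loudness and motion are low.
    return [
        1.0 - (audio_weight * a + (1.0 - audio_weight) * m)
        for a, m in zip(audio, move)
    ]


def suitable_frames(scores: Sequence[float], threshold: float = 0.8) -> List[int]:
    """Indices of frames whose score meets the (assumed) threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]
```

For example, with hypothetical measurements `audio_energy = [0.9, 0.1, 0.05, 0.8]` and `motion = [0.7, 0.2, 0.0, 0.9]`, the two middle frames score highest and are the ones `suitable_frames` returns. An interface could then mark the matching transcript positions as good places to cut.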

