“Dynamic video narratives” by Correa and Ma

  • ©Carlos D. Correa and Kwan-Liu Ma




    Dynamic video narratives



    This paper presents a system for generating dynamic narratives from videos. Inspired by principles of sequential art, these narratives are compact, coherent and interactive, and they depict the motion of one or several actors over time. Creating compact narratives is challenging because the video frames must be combined in a way that reuses redundant backgrounds while still depicting the stages of a motion. In addition, previous approaches focus on generating static summaries and can therefore afford expensive image composition techniques. A dynamic narrative, on the other hand, must be played and skimmed in real time, which limits the cost of the video processing. In this paper, we define a novel process that composes foreground and background regions of video frames into a single interactive image using a series of spatio-temporal masks. These masks are created to improve the output of automatic video processing techniques such as image stitching and foreground segmentation. Unlike hand-drawn narratives, which are often limited to static representations, the proposed system allows users to explore the narrative dynamically and produce different representations of motion. We have built an authoring system that incorporates these methods and demonstrated successful results on a number of video clips. The authoring system can be used to create interactive posters of video clips, browse video in a compact manner or highlight a motion sequence in a movie.
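
    The abstract does not spell out how the spatio-temporal masks are applied, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes grayscale images stored as nested lists, one spatial mask per frame with values in [0, 1], and a hypothetical playhead `t` (in frame units) that fades frame `k` in over the interval [k, k + 1]. The name `composite_narrative` and the linear fade are illustrative assumptions.

```python
from typing import List, Sequence

Image = List[List[float]]  # grayscale image as rows of pixel values


def temporal_weight(t: float, k: int) -> float:
    """Assumed temporal part of the mask: frame k fades in linearly over [k, k + 1]."""
    return min(1.0, max(0.0, t - k))


def composite_narrative(
    background: Image,
    frames: Sequence[Image],
    masks: Sequence[Image],
    t: float,
) -> Image:
    """Blend masked foreground regions of each frame over a shared background.

    masks[k][y][x] in [0, 1] is the spatial mask of frame k; it is multiplied
    by a temporal weight derived from the playhead t, and frames are
    composited in temporal order so later frames appear on top.
    """
    # Start from a floating-point copy of the background.
    out = [[float(v) for v in row] for row in background]
    for k, (frame, mask) in enumerate(zip(frames, masks)):
        w = temporal_weight(t, k)
        if w == 0.0:
            continue  # this frame is not yet visible at playhead t
        for y, row in enumerate(out):
            for x in range(len(row)):
                a = mask[y][x] * w  # combined spatio-temporal opacity
                row[x] = (1.0 - a) * row[x] + a * frame[y][x]
    return out
```

    Under these assumptions, moving `t` from 0 to `len(frames)` reveals successive stages of the motion over the same background, which is one simple way to make a single composite image playable and skimmable.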


