audeosynth: music-driven video montage

  Zicheng Liao, Yizhou Yu, Bingchen Liu, and Lechao Cheng


Session/Category Title: Video Processing




    We introduce music-driven video montage, a media format that offers a pleasant way to browse or summarize video clips collected from various occasions, such as gatherings and adventures. In a music-driven video montage, the music drives the composition of the video content: according to musical movement and beats, video clips are organized into a montage that visually reflects the experiential properties of the music. Creating such a montage by hand, however, takes enormous manual work and artistic expertise. In this paper, we develop a framework for automatically generating music-driven video montages. The input is a set of video clips and a piece of background music. By analyzing the music and video content, our system extracts carefully designed temporal features from the input, casts the synthesis problem as an optimization, and solves for the parameters through Markov chain Monte Carlo sampling. The output is a video montage whose visual activities are cut and synchronized with the rhythm of the music, rendering a symphony of audio-visual resonance.
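    The abstract does not specify the optimization details, but the general recipe it names, casting synthesis as an optimization solved by Markov chain Monte Carlo sampling, can be illustrated with a toy Metropolis-Hastings sampler. The sketch below is purely illustrative: the cost function, the clip "energy" and segment "intensity" values, and the swap proposal are all hypothetical stand-ins, not the paper's actual features or objective.

    ```python
    import math
    import random

    def metropolis_hastings(cost, propose, init, iters=5000, temperature=0.1, seed=0):
        """Minimize `cost` over discrete states via Metropolis-Hastings.

        Moves that lower the cost are always accepted; worse moves are
        accepted with probability exp(-delta / temperature), which lets the
        chain escape local minima. Returns the best state encountered.
        """
        rng = random.Random(seed)
        state = init
        cur_cost = cost(state)
        best, best_cost = state, cur_cost
        for _ in range(iters):
            cand = propose(state, rng)
            cand_cost = cost(cand)
            delta = cand_cost - cur_cost
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                state, cur_cost = cand, cand_cost
                if cur_cost < best_cost:
                    best, best_cost = state, cur_cost
        return best, best_cost

    # Hypothetical stand-in for the real problem: assign 4 clips to 4 music
    # segments so that per-clip visual "energy" matches per-segment musical
    # "intensity" (all numbers invented for illustration).
    clip_energy = [0.9, 0.2, 0.6, 0.4]
    seg_intensity = [0.8, 0.5, 0.3, 0.1]

    def cost(perm):
        # Squared mismatch between each segment's intensity and its clip's energy.
        return sum((clip_energy[c] - seg_intensity[i]) ** 2
                   for i, c in enumerate(perm))

    def propose(perm, rng):
        # Proposal move: swap the clips assigned to two random segments.
        i, j = rng.sample(range(len(perm)), 2)
        cand = list(perm)
        cand[i], cand[j] = cand[j], cand[i]
        return cand

    best, best_cost = metropolis_hastings(cost, propose, list(range(4)))
    ```

    The same skeleton scales to the paper's setting by replacing the toy cost with a real objective over cut points and clip-to-segment assignments derived from the extracted temporal features.
    
    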
