audeosynth: music-driven video montage

  Zicheng Liao, Yizhou Yu, Bingchen Liu, and Lechao Cheng


Session/Category Title: Video Processing




    We introduce music-driven video montage, a media format that offers a pleasant way to browse or summarize video clips collected from various occasions, such as gatherings and adventures. In a music-driven video montage, the music drives the composition of the video content: according to musical movement and beats, video clips are organized into a montage that visually reflects the experiential properties of the music. Creating such a montage by hand, however, takes enormous manual work and artistic expertise. In this paper, we develop a framework for automatically generating music-driven video montages. The input is a set of video clips and a piece of background music. By analyzing the music and video content, our system extracts carefully designed temporal features from the input, casts the synthesis problem as an optimization, and solves for the parameters through Markov chain Monte Carlo sampling. The output is a video montage whose visual activities are cut and synchronized with the rhythm of the music, rendering a symphony of audio-visual resonance.
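    The abstract does not specify the optimization details, but the general recipe it names, casting synthesis as an optimization solved by Markov chain Monte Carlo sampling, can be illustrated with a toy Metropolis-Hastings sampler. The sketch below is purely illustrative: the cost function, the clip "energy" and segment "intensity" values, and the swap proposal are all hypothetical stand-ins, not the paper's actual features or objective.

    ```python
    import math
    import random

    def metropolis_hastings(cost, propose, init, iters=5000, temperature=0.1, seed=0):
        """Minimize `cost` over discrete states via Metropolis-Hastings.

        Moves that lower the cost are always accepted; worse moves are
        accepted with probability exp(-delta / temperature), which lets the
        chain escape local minima. Returns the best state encountered.
        """
        rng = random.Random(seed)
        state = init
        cur_cost = cost(state)
        best, best_cost = state, cur_cost
        for _ in range(iters):
            cand = propose(state, rng)
            cand_cost = cost(cand)
            delta = cand_cost - cur_cost
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                state, cur_cost = cand, cand_cost
                if cur_cost < best_cost:
                    best, best_cost = state, cur_cost
        return best, best_cost

    # Hypothetical stand-in for the real problem: assign 4 clips to 4 music
    # segments so that per-clip visual "energy" matches per-segment musical
    # "intensity" (all numbers invented for illustration).
    clip_energy = [0.9, 0.2, 0.6, 0.4]
    seg_intensity = [0.8, 0.5, 0.3, 0.1]

    def cost(perm):
        # Squared mismatch between each segment's intensity and its clip's energy.
        return sum((clip_energy[c] - seg_intensity[i]) ** 2
                   for i, c in enumerate(perm))

    def propose(perm, rng):
        # Proposal move: swap the clips assigned to two random segments.
        i, j = rng.sample(range(len(perm)), 2)
        cand = list(perm)
        cand[i], cand[j] = cand[j], cand[i]
        return cand

    best, best_cost = metropolis_hastings(cost, propose, list(range(4)))
    ```

    The same skeleton scales to the paper's setting by replacing the toy cost with a real objective over cut points and clip-to-segment assignments derived from the extracted temporal features.
    
    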
