“Visual rhythm and beat” by Davis and Agrawala

© Abe Davis and Maneesh Agrawala



Entry Number: 122


    Visual rhythm and beat

Session/Category Title: Perception & Haptics




    We present a visual analogue for musical rhythm derived from an analysis of motion in video, and show that aligning visual rhythm with its musical counterpart results in the appearance of dance. Central to our work is the concept of visual beats: patterns of motion that can be shifted in time to control visual rhythm. By warping visual beats into alignment with musical beats, we can create or manipulate the appearance of dance in video. Using this approach, we demonstrate a variety of retargeting applications that control the musical synchronization of audio and video: we can change what song performers are dancing to, warp irregular motion into alignment with music so that it appears to be dancing, or search collections of video for moments of accidentally dance-like motion that can be used to synthesize musical performances.
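The core operation described above, shifting visual beats in time so that they land on musical beats, can be illustrated with a simple piecewise-linear time warp. This is a hedged sketch, not the authors' method: it assumes visual and musical beat times (in seconds) have already been detected and are paired one-to-one in order, and `warp_to_music` is a hypothetical helper. For each output frame it returns the source-video time to sample.

```python
import numpy as np

def warp_to_music(visual_beats, music_beats, duration, fps=30.0):
    """Hypothetical helper: map each output frame time to a source-video time.

    Sends the k-th visual beat to the k-th musical beat and interpolates
    linearly between consecutive beat pairs (a piecewise-linear time warp).
    """
    v = np.asarray(visual_beats, dtype=float)
    m = np.asarray(music_beats, dtype=float)
    n = min(len(v), len(m))  # pair beats one-to-one, dropping any extras
    if n < 2:
        raise ValueError("need at least two beat pairs")
    v, m = v[:n], m[:n]
    if np.any(np.diff(v) <= 0) or np.any(np.diff(m) <= 0):
        raise ValueError("beat times must be strictly increasing")
    # Output (music) timeline sampled at the target frame rate.
    t_out = np.arange(0.0, duration, 1.0 / fps)
    # np.interp maps music time to source time; times outside the beat
    # range are clamped to the first or last visual beat.
    return np.interp(t_out, m, v)
```

For example, irregular visual beats at 0.0, 0.4, 1.1 and 1.5 s warped onto a steady 0.5 s musical pulse, sampled at 2 fps over 2 s, yield source times 0.0, 0.4, 1.1 and 1.5 s at output times 0, 0.5, 1.0 and 1.5 s; between beats, playback speeds up or slows down linearly. A practical system would still need to resample frames (for instance by nearest-frame lookup or interpolation) and would likely limit abrupt speed changes, which this sketch ignores.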

