Scene-aware audio for 360° videos

Although 360° cameras ease the capture of panoramic footage, it remains challenging to add realistic 360° audio that blends into the captured scene and is synchronized with the camera motion. We present a method for adding scene-aware spatial audio to 360° videos in typical indoor scenes, using only a conventional mono-channel microphone and a speaker. We observe that the late reverberation of a room’s impulse response is usually diffuse spatially and directionally. Exploiting this fact, we propose a method that synthesizes the directional impulse response between any source and listening location by combining a synthesized early reverberation part with a measured late reverberation tail. The early reverberation is simulated using a geometric acoustic simulation and then enhanced using a frequency modulation method to capture room resonances. The late reverberation is extracted from a recorded impulse response, with a carefully chosen time duration that separates the late reverberation from the early reverberation. In our validations, we show that our synthesized spatial audio closely matches recordings made with ambisonic microphones. Lastly, we demonstrate the strength of our method in several applications.
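
The core idea of joining a simulated early part to a measured late tail can be illustrated with a small sketch. This is not the authors' implementation: it handles a single mono channel only, it assumes the split point between early and late reverberation is already known, and it blends the two parts with a plain linear crossfade, which the abstract does not specify. The names `splice_impulse_response`, `apply_reverb`, `split_sample`, and `fade_len` are hypothetical.

```python
import numpy as np


def splice_impulse_response(early_ir, measured_ir, split_sample, fade_len=64):
    """Join a simulated early response to a measured late tail (illustrative only).

    early_ir:     simulated impulse response; samples before the split are used
    measured_ir:  recorded mono impulse response that supplies the late tail
    split_sample: assumed-known index separating early from late reverberation
    fade_len:     length in samples of the linear crossfade around the split
    """
    early_ir = np.asarray(early_ir, dtype=float)
    measured_ir = np.asarray(measured_ir, dtype=float)
    n = len(measured_ir)

    # Zero-pad or truncate the simulated part to the measured length.
    early = np.zeros(n)
    m = min(n, len(early_ir))
    early[:m] = early_ir[:m]

    # Weight is 1 before the crossfade, 0 after it, and linear in between.
    start = max(split_sample - fade_len // 2, 0)
    stop = min(start + fade_len, n)
    weight = np.ones(n)
    weight[start:stop] = np.linspace(1.0, 0.0, stop - start)
    weight[stop:] = 0.0

    return weight * early + (1.0 - weight) * measured_ir


def apply_reverb(dry_signal, impulse_response):
    """Convolve a dry mono signal with an impulse response."""
    return np.convolve(dry_signal, impulse_response)
```

A spliced response produced this way could then be passed to `apply_reverb` together with a dry recording. Producing directional (for example ambisonic) output, as the paper describes, would need one response per output channel, which this sketch does not attempt.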

SIGGRAPH 2018: Technical Papers

“Scene-aware audio for 360° videos” by Li, Langlois and Zheng

Entry Number: 111

Session/Category Title: Sounds Good!
