“Real-time rendering of decorative sound textures for soundscapes” by Zheng, Hung, Hiebel and Zhang – ACM SIGGRAPH HISTORY ARCHIVES

  • 2020 SA Technical Papers_Zheng_Real-time rendering of decorative sound textures for soundscapes

Title:

    Real-time rendering of decorative sound textures for soundscapes

Session/Category Title:   VR and Real-time Techniques


Abstract:


    Audio recordings contain rich information about sound sources and their properties, such as the location, loudness, and frequency of events. One prevalent component of sound recordings is the sound texture, which contains a massive number of events. Such a texture can include distinct, repeated sounds that we term foreground sounds. Birds chirping in the wind is one such decorative sound texture, with the chirping as the foreground sound and the wind as the background texture. To render these decorative sound textures in real time and with high quality, we create two-layer Markov models that enable smooth transitions from sound grain to sound grain, and we propose a hierarchical scheme to generate Head-Related Transfer Function filters that provide localization cues for sounds represented as area/volume sources. Moreover, during the synthesis stage, we provide control over the frequency and intensity of sounds for customization. Lastly, foreground sounds are often blended into background textures, as when the sound of rain splats on car surfaces becomes submerged in the background rain. We develop an extraction component that outperforms existing learning-based methods, which facilitates our synthesis with perceptible foreground sounds and well-defined textures.
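
    The idea of Markov-driven transitions between sound grains can be illustrated with a minimal sketch. This is not the authors' two-layer model: it is a single-layer, first-order Markov chain over grain labels, followed by a linear crossfade between consecutive grains. The function names (`build_transitions`, `sample_sequence`, `crossfade`), the label sequence, and the toy sample values are all hypothetical and chosen only for illustration.

```python
import random


def build_transitions(labels):
    """Estimate first-order transition probabilities from a label sequence.

    Assumption: each grain of the source recording has already been
    assigned a discrete label (for example, by clustering).
    """
    counts = {}
    for current, following in zip(labels, labels[1:]):
        row = counts.setdefault(current, {})
        row[following] = row.get(following, 0) + 1
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


def sample_sequence(transitions, start, length, rng):
    """Walk the chain from `start`, stopping early at a state with no successors."""
    sequence = [start]
    while len(sequence) < length:
        row = transitions.get(sequence[-1])
        if not row:
            break  # dead end: this label was never followed by another grain
        states = list(row)
        weights = [row[s] for s in states]
        sequence.append(rng.choices(states, weights=weights)[0])
    return sequence


def crossfade(a, b, overlap):
    """Join two grains (lists of samples) with a linear crossfade.

    `overlap` must not exceed the length of either grain.
    """
    if overlap == 0:
        return a + b
    head, tail = a[:-overlap], a[-overlap:]
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of the incoming grain
        mixed.append(tail[i] * (1.0 - w) + b[i] * w)
    return head + mixed + b[overlap:]


if __name__ == "__main__":
    # Hypothetical grain labels observed in a source recording.
    observed = ["a", "b", "a", "b", "a", "c"]
    chain = build_transitions(observed)
    print(sample_sequence(chain, "a", 8, random.Random(0)))
    print(crossfade([1.0] * 4, [0.0] * 4, 2))
```

    In this sketch the chain only decides which grain comes next, and the crossfade smooths the boundary between grains. A two-layer scheme like the one the abstract mentions would add a second level of states on top of this, the details of which are not given here.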
