“Fast modal sounds with scalable frequency-domain synthesis” by Bonneel, Drettakis, Tsingos, Viaud-Delmon and James

  • ©Nicolas Bonneel, George Drettakis, Nicolas Tsingos, Isabelle Viaud-Delmon, and Doug L. James




    Fast modal sounds with scalable frequency-domain synthesis



    Audio rendering of impact sounds, such as those caused by falling objects or explosion debris, adds realism to interactive 3D audiovisual applications, and can be convincingly achieved using modal sound synthesis. Unfortunately, mode-based computations can become prohibitively expensive when many objects, each with many modes, are impacted simultaneously. We introduce a fast sound synthesis approach, based on short-time Fourier transforms, that exploits the inherent sparsity of modal sounds in the frequency domain. For our test scenes, this “fast mode summation” can give speedups of 5–8 times compared to a time-domain solution, with slight degradation in quality. We discuss different reconstruction windows, affecting the quality of impact sound “attacks”. Our Fourier-domain processing method allows us to introduce a scalable, real-time, audio processing pipeline for both recorded and modal sounds, with auditory masking and sound source clustering. To avoid abrupt computation peaks, such as during the simultaneous impacts of an explosion, we use crossmodal perception results on audiovisual synchrony to effect temporal scheduling. We also conducted a pilot perceptual user evaluation of our method. Our implementation results show that we can treat complex audiovisual scenes in real time with high quality.
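    The sparsity that the abstract refers to can be illustrated with a small sketch. It is not the authors' implementation: it only sums a few damped sinusoids in the time domain (the usual modal model) and then counts how many DFT bins of one Hann-windowed frame carry 99% of the energy. The sample rate, frame length, mode parameters, and all function names below are assumptions chosen for illustration.

```python
import cmath
import math

SAMPLE_RATE = 44100  # Hz (assumed for this sketch)
FRAME = 512          # samples per analysis frame (assumed)


def modal_frame(modes, n_samples=FRAME, sample_rate=SAMPLE_RATE):
    """Time-domain mode summation: one exponentially damped sinusoid per mode."""
    out = [0.0] * n_samples
    for freq_hz, decay_per_s, amplitude in modes:
        for n in range(n_samples):
            t = n / sample_rate
            out[n] += (amplitude * math.exp(-decay_per_s * t)
                       * math.sin(2.0 * math.pi * freq_hz * t))
    return out


def hann(n_samples):
    """Periodic Hann window."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_samples)
            for n in range(n_samples)]


def bin_energies(frame):
    """Energy in each non-negative-frequency bin of a Hann-windowed frame.

    Uses a naive O(N^2) DFT to stay dependency-free; a real system would use an FFT.
    """
    n_samples = len(frame)
    windowed = [x * w for x, w in zip(frame, hann(n_samples))]
    energies = []
    for k in range(n_samples // 2 + 1):
        acc = 0j
        for n, x in enumerate(windowed):
            acc += x * cmath.exp(-2j * math.pi * k * n / n_samples)
        energies.append(abs(acc) ** 2)
    return energies


def bins_for_energy(energies, fraction=0.99):
    """Smallest number of bins whose energies sum to `fraction` of the total."""
    total = sum(energies)
    running = 0.0
    for count, energy in enumerate(sorted(energies, reverse=True), start=1):
        running += energy
        if running >= fraction * total:
            return count
    return len(energies)


# Hypothetical modes: (frequency in Hz, decay rate in 1/s, amplitude).
MODES = [(440.0, 6.0, 1.0), (1210.0, 9.0, 0.6), (2875.0, 14.0, 0.3)]

frame = modal_frame(MODES)
energies = bin_energies(frame)
needed = bins_for_energy(energies)
print(f"{needed} of {len(energies)} bins hold 99% of the frame energy")
```

    The time-domain loop costs one multiply-add per mode per sample, whereas the spectrum of each windowed mode is concentrated in a few bins around its frequency. Writing only those bins and inverting once per frame is the kind of saving a frequency-domain approach can exploit.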


