Example‐Guided Physically Based Modal Sound Synthesis

Linear modal synthesis methods are often used to generate sounds for rigid bodies. A key obstacle to adopting such techniques widely is the lack of an automatic way to determine material parameters that recreate the realistic audio quality of sounding materials. We introduce a novel method that uses prerecorded audio clips to estimate material parameters capturing the inherent quality of the recorded sounding materials. Our method extracts perceptually salient features from audio examples. Based on psychoacoustic principles, we design a parameter estimation algorithm that uses an optimization framework and these salient features to guide the search for the best material parameters for modal synthesis. We also present a method that compensates for the differences between the real-world recording and the sound synthesized using solely linear modal synthesis models to create the final synthesized audio. The audio generated by this sound synthesis pipeline preserves the sense of material of the recorded audio example. Moreover, both the estimated material parameters and the residual compensation transfer naturally to virtual objects of different sizes and shapes, while the synthesized sounds vary accordingly. A perceptual study shows that the results of this system compare well with real-world recordings in terms of material perception.
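As a rough illustration of what linear modal synthesis computes, the sketch below sums exponentially decaying sinusoids, one per vibration mode. It is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes a Rayleigh damping model (damping matrix C = alpha * M + beta * K), which the abstract does not name, and the function names, the parameters `alpha` and `beta`, and the example eigenvalues and gains are all hypothetical. It covers neither the feature extraction and parameter estimation nor the residual compensation.

```python
import math


def mode_from_eigenvalue(eigenvalue, alpha, beta):
    """Return (decay_rate, damped_angular_freq) for one mode, or None.

    Assumption: Rayleigh damping, so each decoupled mode obeys
    q'' + (alpha + beta * eigenvalue) * q' + eigenvalue * q = 0,
    where eigenvalue is the squared undamped angular frequency.
    """
    decay = 0.5 * (alpha + beta * eigenvalue)
    discriminant = eigenvalue - decay * decay
    if discriminant <= 0.0:
        return None  # critically damped or overdamped: no oscillation
    return decay, math.sqrt(discriminant)


def synthesize(eigenvalues, gains, alpha, beta, duration=0.5, sample_rate=44100):
    """Sum one exponentially decaying sinusoid per underdamped mode."""
    n = int(duration * sample_rate)
    out = [0.0] * n
    for eigenvalue, gain in zip(eigenvalues, gains):
        mode = mode_from_eigenvalue(eigenvalue, alpha, beta)
        if mode is None:
            continue  # skip modes that do not oscillate
        decay, omega = mode
        for i in range(n):
            t = i / sample_rate
            out[i] += gain * math.exp(-decay * t) * math.sin(omega * t)
    return out


if __name__ == "__main__":
    # Hypothetical object with two modes near 440 Hz and 1320 Hz.
    eigenvalues = [(2.0 * math.pi * f) ** 2 for f in (440.0, 1320.0)]
    samples = synthesize(eigenvalues, [1.0, 0.5], alpha=5.0, beta=1.0e-7)
    print(len(samples), "samples synthesized")
```

In this sketch the material enters only through `alpha` and `beta`, while the eigenvalues and gains would come from the object's geometry and the impact. That separation is one way to picture how estimated material parameters could be reused on objects of different sizes and shapes.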

SIGGRAPH 2013: Technical Papers

“Example‐Guided Physically Based Modal Sound Synthesis” by Ren, Yeh and Lin

Session/Category Title: Sounds & Solids
