“Fully perceptual-based 3D spatial sound individualization with an adaptive variational autoencoder” by Yamamoto and Igarashi
Title:
- Fully perceptual-based 3D spatial sound individualization with an adaptive variational autoencoder
Session/Category Title: AR/VR
Abstract:
To realize 3D spatial sound rendering with two-channel headphones, one needs head-related transfer functions (HRTFs) tailored to a specific user. However, measuring HRTFs requires a tedious and expensive procedure. To address this, we propose a fully perceptual-based HRTF fitting method for individual users using machine learning techniques. The user only needs to answer pairwise comparisons of test signals presented by the system during calibration. This reduces the effort required for the user to obtain individualized HRTFs. Technically, we present a novel adaptive variational autoencoder with a convolutional neural network. In training, this autoencoder analyzes a publicly available HRTF dataset and identifies factors that depend on the individuality of users in a nonlinear space. In calibration, the autoencoder generates high-quality HRTFs fitted to a specific user by blending the factors. We validate the feasibility of our method through several quantitative experiments and a user study.
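The calibration idea in the abstract (the user answers pairwise comparisons, and the system blends learned factors accordingly) can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes a hypothetical linear `decode` function in place of the trained autoencoder decoder, a hypothetical simulated listener that prefers whichever candidate is closer to a hidden target, and a simple accept-if-preferred random search over the blending weights. All names and parameters below are illustrative assumptions.

```python
import numpy as np


def decode(weights, factors):
    """Hypothetical stand-in for a trained decoder: linearly blend factor rows."""
    return weights @ factors


def simulated_user_prefers_b(hrtf_a, hrtf_b, target_hrtf):
    """Hypothetical listener: prefers the candidate closer to a hidden target.

    A real calibration would play test signals rendered with each candidate
    and ask the user which one sounds better.
    """
    dist_a = np.linalg.norm(hrtf_a - target_hrtf)
    dist_b = np.linalg.norm(hrtf_b - target_hrtf)
    return dist_b < dist_a


def calibrate(factors, target_hrtf, n_comparisons=200, step=0.3, seed=0):
    """Random search over blending weights driven only by pairwise answers."""
    rng = np.random.default_rng(seed)
    weights = np.zeros(factors.shape[0])  # start from a neutral blend
    for _ in range(n_comparisons):
        # Propose a perturbed blend and compare it with the current one.
        candidate = weights + step * rng.standard_normal(weights.shape)
        current_hrtf = decode(weights, factors)
        candidate_hrtf = decode(candidate, factors)
        if simulated_user_prefers_b(current_hrtf, candidate_hrtf, target_hrtf):
            weights = candidate  # keep the blend the listener preferred
    return weights
```

Because a candidate is only accepted when the simulated listener prefers it, the distance to the hidden target never increases in this sketch. A practical system would need a more sample-efficient optimizer, since every comparison costs the user listening time.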


