“EDAVS: Emotion-Driven Audiovisual Synthesis Experience” by Chen, Chen and Gu
Conference:
Type(s):
Title:
- EDAVS: Emotion-Driven Audiovisual Synthesis Experience
Session/Category Title: Art & Design
Presenter(s)/Author(s):
Abstract:
In this paper, we propose a novel approach to the creation of immersive multimedia content. At its core, EDAVS harnesses the subtle inflections of human speech, translating emotive cues into dynamic visual narratives and corresponding soundscapes, using machine-learning algorithms capable of deep semantic and emotional analysis.
References:
[1]
Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, and Michael Auli. 2020. wav2vec 2.0: a framework for self-supervised learning of speech representations. In Proceedings of the 34th International Conference on Neural Information Processing Systems (Vancouver, BC, Canada) (NIPS '20). Curran Associates Inc., Red Hook, NY, USA, Article 1044, 12 pages.
[2]
Enrique Hernández Calabrés. 2024. wav2vec2-lg-xlsr-en-speech-emotion-recognition (Revision 17cf17c). https://doi.org/10.57967/hf/2045
[3]
Shashidhar G. Koolagudi and K. Sreenivasa Rao. 2012. Emotion recognition from speech: a review. International Journal of Speech Technology 15, 2 (2012), 99–117. https://doi.org/10.1007/s10772-011-9125-1
[4]
Haohe Liu, Yi Yuan, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Qiao Tian, Yuping Wang, Wenwu Wang, Yuxuan Wang, and Mark D. Plumbley. 2024. AudioLDM 2: Learning Holistic Audio Generation With Self-Supervised Pretraining. IEEE/ACM Transactions on Audio, Speech, and Language Processing 32 (2024), 2871–2883. https://doi.org/10.1109/TASLP.2024.3399607
[5]
Rosalind W. Picard. 2000. Affective computing. MIT Press, Cambridge, Mass.
[6]
Soujanya Poria, Erik Cambria, Rajiv Bajpai, and Amir Hussain. 2017. A review of affective computing: From unimodal analysis to multimodal fusion. Information Fusion 37 (2017), 98–125. https://doi.org/10.1016/j.inffus.2017.02.003
[7]
Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, and Ilya Sutskever. 2023. Robust speech recognition via large-scale weak supervision. In Proceedings of the 40th International Conference on Machine Learning (Honolulu, Hawaii, USA) (ICML'23). JMLR.org, Article 1182, 27 pages.