Simplifying Facial Animation using Deep Learning based Phoneme Recognition

Vladimir Ivanov; Parag Havaldar


Conference:

SIGGRAPH 2023

Title:

Simplifying Facial Animation using Deep Learning based Phoneme Recognition

Session/Category Title:

Big Rigs: Advances in Rigging

Presenter(s)/Author(s):

Vladimir Ivanov

Parag Havaldar

Interest Area:

Production & Animation

Abstract:

We present a machine learning framework for facial animation that is simple, easy to implement, and integrates well into an artist-friendly workflow. The framework uses a pre-trained deep learning model to extract phonemes. Each phoneme is mapped to artist-defined face shapes, producing sparse keyframes that quickly yield synchronized-looking dialogue animation, which artists can optionally refine using familiar workflows.
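
The mapping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the phoneme-to-shape table, the shape names, the 24 fps frame rate, and the rule of placing one key at each phoneme onset are all assumptions made for illustration. The input is assumed to be a list of (phoneme, start time in seconds) pairs already produced by some phoneme recognizer.

```python
from dataclasses import dataclass

# Hypothetical phoneme-to-shape table. In practice the shapes would be
# authored by artists; these names are placeholders, not from the talk.
PHONEME_TO_SHAPE = {
    "p": "MBP", "b": "MBP", "m": "MBP",
    "f": "FV", "v": "FV",
    "a": "AH", "o": "OH", "u": "OO", "i": "EE",
    "sil": "REST",
}


@dataclass(frozen=True)
class Key:
    frame: int    # frame number on the animation timeline
    shape: str    # name of the face shape to key
    weight: float = 1.0


def phonemes_to_keys(timed_phonemes, fps=24.0, default_shape="REST"):
    """Turn (phoneme, start_seconds) pairs into sparse keyframes.

    Assumed rule: one key per phoneme onset. A phoneme is skipped when it
    maps to the same shape as the previous key or lands on the same frame,
    which keeps the result sparse.
    """
    keys = []
    for phoneme, start in timed_phonemes:
        # Unknown phonemes fall back to the default shape.
        shape = PHONEME_TO_SHAPE.get(phoneme, default_shape)
        frame = round(start * fps)
        if keys and (keys[-1].shape == shape or keys[-1].frame == frame):
            continue
        keys.append(Key(frame, shape))
    return keys


if __name__ == "__main__":
    timed = [("sil", 0.0), ("m", 0.25), ("a", 0.5), ("a", 0.75), ("p", 1.0)]
    for key in phonemes_to_keys(timed):
        print(key.frame, key.shape, key.weight)
```

Because only shape changes produce keys, the resulting curve stays editable by hand, which matches the optional artist refinement the abstract mentions.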

