“Constraint-based Synthesis of Visual Speech”







    This sketch concerns the animation of facial movement during speech production. We treat speech gestures as trajectories through a space containing all visible vocal tract postures. Within this visible speech space, visual phonemes (or visemes) are defined as collections of vocal tract postures that produce similar speech sounds (i.e. an individual phoneme in audible speech). This definition differs from many techniques in which the terms viseme and morph target are used interchangeably (e.g. [Cohen and Massaro 1993]). A speech trajectory always interpolates the visemes corresponding to its phonetic structure (i.e. there is a direct mapping from audio → visual speech). However, because a viseme is a collection of postures rather than a single target, we must determine how the trajectory passes through each viseme according to both physical constraints and context; this is the notion of coarticulation [Lofqvist 1990].
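The idea of a trajectory that must pass through viseme regions, rather than hit fixed targets, can be illustrated with a small sketch. It is not the method of this work. It rests on assumptions made purely for illustration: each viseme is an axis-aligned box in a hypothetical two-parameter posture space (lip opening, lip rounding), and the summed squared distance between consecutive postures stands in for the physical constraints. The names `clamp` and `fit_trajectory` are invented here.

```python
def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def fit_trajectory(visemes, sweeps=100):
    """Pick one posture per viseme so that consecutive postures stay close.

    visemes: list of (lows, highs) pairs, each an axis-aligned box (assumed).
    Returns a list of postures, one per viseme, each inside its box.
    """
    # Start every posture at the centre of its viseme box.
    points = [[(lo + hi) / 2.0 for lo, hi in zip(lows, highs)]
              for lows, highs in visemes]
    count = len(points)
    for _ in range(sweeps):
        for i, (lows, highs) in enumerate(visemes):
            neighbours = [points[j] for j in (i - 1, i + 1) if 0 <= j < count]
            if not neighbours:
                continue  # a single viseme has no context to adapt to
            for d, (lo, hi) in enumerate(zip(lows, highs)):
                # With the neighbours fixed, the summed squared distance is
                # smallest at their mean; clamping keeps the posture in the box.
                mean = sum(p[d] for p in neighbours) / len(neighbours)
                points[i][d] = clamp(mean, lo, hi)
    return points


# Hypothetical posture space: (lip opening, lip rounding), both in [0, 1].
closed = ((0.0, 0.0), (0.2, 1.0))
open_ = ((0.8, 0.0), (1.0, 1.0))
sealed = ((0.0, 0.0), (0.1, 1.0))

near_open = fit_trajectory([closed, open_, closed])
near_sealed = fit_trajectory([closed, sealed])
```

In this toy setting the `closed` viseme is realised with a lip opening of 0.2 when it neighbours `open_`, but 0.05 when it neighbours `sealed`. The same viseme yields different postures depending on context, which is the kind of behaviour the term coarticulation describes.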


    Cohen, M., and Massaro, D. 1993. Modeling coarticulation in synthetic visual speech. Computer Animation ’93, 131–156.
    Fung, Y. C. 1993. Biomechanics – Mechanical Properties of Living Tissues, second ed. Springer-Verlag.
    Lee, Y., Terzopoulos, D., and Waters, K. 1995. Realistic modeling for facial animation. Computer Graphics 29, Annual Conference Series, 55–62.
