Speech driven head motion synthesis based on a trajectory model

Gregor Hofer; Hiroshi Shimodaira; Junichi Yamagishi

Conference:

SIGGRAPH 2007

Type(s):

Posters

Title:

Speech driven head motion synthesis based on a trajectory model

Presenter(s)/Author(s):

Gregor Hofer

Hiroshi Shimodaira

Junichi Yamagishi

Abstract:

Making human-like characters more natural and life-like requires more inventive approaches than current standard techniques such as synthesis using text features or triggers. In this poster we present a novel approach to automatically synthesise head motion based on speech features. Previous work has focused on frame-wise modelling of motion [Busso et al. 2007] or has treated the speech and motion data streams separately [Brand 1999], although the trajectories of head motion and speech features are highly correlated and change dynamically over several frames. To model longer units of motion and speech and to reproduce their trajectories during synthesis, we utilise a promising time series stochastic model called the “Trajectory Hidden Markov Model” [Zen et al. 2007]. Its parameter generation algorithm can produce motion trajectories from sequences of units of motion and speech. The two kinds of data are modelled simultaneously using a multi-stream variant of the trajectory HMM. The models can be viewed as a Kalman-smoother-like approach and are therefore capable of producing smooth trajectories.
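The parameter generation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' multi-stream system: it assumes a single one-dimensional stream (for example one head rotation angle), per-frame Gaussian means and variances for the static and delta features that are already known, and a central-difference delta window with clamped edges. Under those assumptions the smooth trajectory c solves the standard normal equations W^T P W c = W^T P mu, where W stacks the static and delta windows and P is the diagonal precision matrix. All function names are hypothetical.

```python
def build_window_matrix(T):
    """Return the 2T x T matrix W mapping a trajectory c to [static, delta] features."""
    W = [[0.0] * T for _ in range(2 * T)]
    for t in range(T):
        W[2 * t][t] = 1.0  # static row: c_t
        # delta row: 0.5 * (c_{t+1} - c_{t-1}), with indices clamped at the edges
        W[2 * t + 1][max(t - 1, 0)] -= 0.5
        W[2 * t + 1][min(t + 1, T - 1)] += 0.5
    return W


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def generate_trajectory(means, variances):
    """means, variances: one (static, delta) pair per frame; returns T static values."""
    T = len(means)
    W = build_window_matrix(T)
    mu = [m for pair in means for m in pair]
    prec = [1.0 / v for pair in variances for v in pair]
    rows = range(2 * T)
    # Normal equations: (W^T P W) c = W^T P mu
    A = [[sum(W[r][i] * prec[r] * W[r][j] for r in rows) for j in range(T)]
         for i in range(T)]
    b = [sum(W[r][i] * prec[r] * mu[r] for r in rows) for i in range(T)]
    return solve_linear(A, b)


# Usage: a step in the static means is smoothed because the delta means are zero.
step_means = [(0.0, 0.0)] * 3 + [(1.0, 0.0)] * 3
step_vars = [(1.0, 0.1)] * 6
print(generate_trajectory(step_means, step_vars))
```

Because the delta features constrain frame-to-frame change, the generated values move gradually between the two static levels instead of jumping, which is the kind of smoothness the abstract attributes to the trajectory model. A full system would obtain the means and variances from the HMM state sequence rather than supplying them by hand.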

References:

1. Brand, M. 1999. Voice puppetry. In Proc. of SIGGRAPH ’99.
2. Busso, C., Deng, Z., Grimm, M., Neumann, U., and Narayanan, S. 2007. Rigid Head Motion in Expressive Speech Animation: Analysis and Synthesis. IEEE Transactions on Audio, Speech, and Language Processing.
3. Zen, H., Tokuda, K., and Kitamura, T. 2007. Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences. Computer Speech and Language 21, 1.
