“Interactive speech conversion system cloning speaker intonation automatically” by Adachi and Morishima

    In this paper, we present an interactive speech conversion system that can modify emotional factors, add emphasis to utterances, and apply a dialect. The system converts the prosody of a stored sample voice into a new prosody obtained by real-time analysis of a reference voice captured through a microphone. Only the intonation is controlled: utterance speed, pitch, and power are converted, while the speaker's characteristics and the linguistic content of the original are preserved. In particular, we propose an original interpolation method for utterance speed, and subjective evaluation shows that the quality of the synthesized speech is as good as that of the original.
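The prosody transfer described above can be illustrated with a minimal sketch. It assumes that the stored sample and the reference have already been split into the same number of aligned segments (e.g. phonemes), each holding per-frame pitch (F0) and power values. Each output segment adopts the reference's duration (utterance speed), pitch, and power. It also records a fractional frame position in the sample, from which a vocoder (not shown) would read the sample's spectral envelope to keep the speaker's voice. The names `Segment`, `OutputFrame`, `warp_positions`, and `transfer_prosody` are hypothetical. The plain linear time warp is a stand-in and is not the paper's proposed interpolation method for utterance speed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical aligned unit (e.g. a phoneme) of per-frame prosodic features."""
    f0: List[float]     # pitch contour in Hz, one value per analysis frame
    power: List[float]  # frame power (e.g. dB), same length as f0


@dataclass
class OutputFrame:
    source_pos: float  # fractional frame index into the stored sample voice
    f0: float          # pitch copied from the reference voice
    power: float       # power copied from the reference voice


def warp_positions(n_src: int, n_dst: int) -> List[float]:
    """Spread n_dst output frames evenly over n_src source frames (linear time warp)."""
    if n_src < 1 or n_dst < 1:
        raise ValueError("segments must contain at least one frame")
    if n_dst == 1:
        return [0.0]
    step = (n_src - 1) / (n_dst - 1)
    return [i * step for i in range(n_dst)]


def transfer_prosody(sample: List[Segment], reference: List[Segment]) -> List[OutputFrame]:
    """Re-time the sample to the reference and adopt the reference's pitch and power."""
    if len(sample) != len(reference):
        raise ValueError("sample and reference must have the same number of segments")
    frames: List[OutputFrame] = []
    offset = 0  # index of the current segment's first frame in the whole sample
    for s, r in zip(sample, reference):
        # The output segment takes the reference's frame count, i.e. its utterance speed.
        for j, pos in enumerate(warp_positions(len(s.f0), len(r.f0))):
            frames.append(OutputFrame(offset + pos, r.f0[j], r.power[j]))
        offset += len(s.f0)
    return frames
```

For example, a 5-frame sample segment matched to a 3-frame reference segment yields source positions 0, 2, and 4: the sample is sped up, while its pitch and power are replaced by the reference's values.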

