Synthesis of fundamental frequency contours for Standard Chinese based on superpositional and tone nucleus models


Title: Synthesis of fundamental frequency contours for Standard Chinese based on superpositional and tone nucleus models
Authors: Hirose, K.; Sun, Q.; Minematsu, N.
Publication date: 2007
Keywords: speech synthesis; F0 contour generation; Standard Chinese; superpositional model; tone nucleus model
Language: English
Content provider: BazTech
Type: Article


A method for generating sentence F0 contours of Standard Chinese speech is developed. It is based on superposing tone components on phrase components on a logarithmic frequency scale. While tone components are language-specific, phrase components are assumed to be more language-universal. Taking this into account, the method treats the two kinds of components differently. The tone components are generated by concatenating F0 patterns of tone nuclei, which are predicted by a corpus-based scheme, while the phrase components are generated by rules. Experiments on F0 contour generation were conducted using 100 news utterances by a female speaker. The first experiments addressed the generation of tone components, with the phrase components of the original utterances used unchanged. The results showed that the method could generate F0 contours close to those of the target speech. Speech was synthesized by replacing the original F0 contours with the generated ones using TD-PSOLA. Listening experiments on the quality of the synthetic speech yielded a high average score of 4.5 on a 5-point scale. The second experiments addressed the generation of phrase components, with the tone components extracted from the original utterances. Although the synthetic speech with generated F0 contours sounded mostly natural, there were occasional "degraded sounds" caused by a mismatch between the phrase and tone components. To cope with the mismatch, a two-step method was developed, in which information on the phrase contours is used for the prediction of tone components. The validity of the method was shown through perceptual experiments on synthesized speech.
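
The superposition described in the abstract can be illustrated with a minimal sketch. The abstract does not give the functional forms, so the code below makes several assumptions that are not taken from the paper: the phrase component uses a Fujisaki-style impulse response, the tone component is a piecewise-linear curve through hypothetical tone-nucleus targets (standing in for the corpus-predicted patterns), and all numeric values, including `alpha=3.0` and the base frequency, are illustrative only. The function names are invented for this example.

```python
import math


def phrase_component(t, commands, alpha=3.0):
    """Log-domain phrase component at time t (seconds).

    Assumption: each command (onset, magnitude) contributes a Fujisaki-style
    response magnitude * alpha^2 * dt * exp(-alpha * dt) for dt = t - onset >= 0.
    """
    total = 0.0
    for onset, magnitude in commands:
        dt = t - onset
        if dt >= 0.0:
            total += magnitude * alpha ** 2 * dt * math.exp(-alpha * dt)
    return total


def tone_component(t, nuclei):
    """Log-domain tone component at time t (seconds).

    Assumption: each nucleus is (t_start, t_end, v_start, v_end), sorted by
    time and non-overlapping; nucleus patterns are concatenated and the gaps
    between them are bridged by linear interpolation.
    """
    points = [p for t0, t1, v0, v1 in nuclei for p in ((t0, v0), (t1, v1))]
    if not points:
        return 0.0
    if t <= points[0][0]:
        return points[0][1]
    for (ta, va), (tb, vb) in zip(points, points[1:]):
        if t <= tb:
            return va + (vb - va) * (t - ta) / (tb - ta)
    return points[-1][1]


def synthesize_f0(times, base_hz, commands, nuclei):
    """Superpose both components in log frequency and convert back to Hz."""
    return [
        base_hz * math.exp(phrase_component(t, commands) + tone_component(t, nuclei))
        for t in times
    ]


# Illustrative values only (not from the paper): one phrase command and two
# nuclei, roughly a level pattern followed by a falling pattern.
times = [i * 0.01 for i in range(100)]
commands = [(0.0, 0.5)]
nuclei = [(0.20, 0.35, 0.3, 0.3), (0.50, 0.65, 0.4, -0.2)]
f0_contour = synthesize_f0(times, 180.0, commands, nuclei)
```

Because the two components are added in the logarithmic domain, either one can be replaced independently, which mirrors the experimental design in the abstract: original phrase components with generated tone components in the first experiments, and the reverse in the second.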
