(简体中文|English)
Style FastSpeech2
Introduction
FastSpeech2 is a classical acoustic model for Text-to-Speech synthesis. It introduces controllable speech inputs: phoneme duration, energy, and pitch.
In the prediction phase, you can change these controllable variables to get some interesting results.
For example:
- The duration control in FastSpeech2 can change the speed of the audio while keeping the pitch. (In some speech tools, increasing the speed also raises the pitch, and vice versa.)
- When we set the pitch of one sentence to a mean value and set the tones of phones to 1, we will get a robot-style timbre.
- When we raise the pitch of an adult female (with a fixed scale ratio), we will get a child-style timbre.
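The three effects above can be sketched with plain NumPy. This is only an illustration, assuming durations are per-phoneme frame counts and pitch is one value per phoneme in Hz. The function names are hypothetical and are not taken from style_syn.py, and the sketch does not cover setting the tones of phones.

```python
import numpy as np


def scale_durations(durations, ratio):
    """Scale per-phoneme frame counts; ratio < 1 speeds speech up."""
    scaled = np.round(np.asarray(durations, dtype=float) * ratio)
    # Keep at least one frame per phoneme so no phoneme disappears.
    return np.maximum(scaled, 1).astype(int)


def flatten_pitch(pitch):
    """Set every pitch value to the sentence mean (robot-style)."""
    pitch = np.asarray(pitch, dtype=float)
    return np.full_like(pitch, pitch.mean())


def raise_pitch(pitch, ratio):
    """Multiply every pitch value by a fixed ratio (child-style when ratio > 1)."""
    return np.asarray(pitch, dtype=float) * ratio


# Hypothetical values for a three-phoneme sentence.
durations = [4, 6, 10]
pitch = [100.0, 200.0, 300.0]

faster = scale_durations(durations, 0.5)  # half as many frames, pitch untouched
robot = flatten_pitch(pitch)              # constant pitch across the sentence
child = raise_pitch(pitch, 1.3)           # 1.3 is an illustrative ratio only
```

Changing only the durations leaves the pitch array untouched, which is why the speed can change without shifting the pitch.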
The duration and pitch of different phonemes in a sentence can have different scale ratios. You can set different scale ratios to emphasize or weaken the pronunciation of some phonemes.
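A per-phoneme scale ratio can be pictured as an element-wise multiplication. The sketch below is an assumption about how such ratios could be applied, not the code in style_syn.py. The helper name scale_per_phoneme and all numbers are hypothetical.

```python
import numpy as np


def scale_per_phoneme(values, ratios):
    """Multiply each phoneme's value by its own scale ratio."""
    values = np.asarray(values, dtype=float)
    ratios = np.asarray(ratios, dtype=float)
    if values.shape != ratios.shape:
        raise ValueError("values and ratios must have one entry per phoneme")
    return values * ratios


# Hypothetical four-phoneme sentence: emphasize the second phoneme only.
durations = np.array([5, 8, 6, 7])               # frames per phoneme
pitch = np.array([180.0, 200.0, 190.0, 170.0])   # Hz per phoneme

duration_ratios = [1.0, 1.5, 1.0, 1.0]  # hold the second phoneme longer
pitch_ratios = [1.0, 1.1, 1.0, 1.0]     # and raise its pitch slightly

# Durations must stay whole frame counts of at least one frame.
new_durations = np.maximum(
    np.round(scale_per_phoneme(durations, duration_ratios)), 1
).astype(int)
new_pitch = scale_per_phoneme(pitch, pitch_ratios)
```

A ratio above 1 emphasizes a phoneme, a ratio below 1 weakens it, and a ratio of 1 leaves it unchanged.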
Usage
Run the following command line to get started:
./run.sh
run.sh first executes source path.sh, which sets the environment variables.
If you would like to try your own sentence, please replace the sentence in sentences.txt.
For more details, please see style_syn.py.
The audio samples are in style-control-in-fastspeech2.