Style FastSpeech2
Introduction
FastSpeech2 is a classic acoustic model for text-to-speech (TTS) synthesis that introduces controllable speech inputs: phoneme duration, energy and pitch.
At inference time, you can change these controllable variables to get some interesting results.
For example:

- The `duration` control in `FastSpeech2` changes the speed of the audio while keeping the `pitch`. (In some speech tools, increasing the speed also raises the pitch, and vice versa.)
- When we set the `pitch` of a sentence to its mean value and set the `tones` of all phones to `1`, we get a `robot-style` timbre.
- When we raise the `pitch` of an adult female voice by a fixed scale ratio, we get a `child-style` timbre.
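The three effects above can be sketched in plain Python on phoneme-level predictions. This is a minimal illustration, not the code in `style_syn.py`: it assumes the model has already predicted one duration (in frames), one pitch value (in Hz, not normalized) and one tone id per phoneme, and the helper names `scale_durations`, `length_regulate`, `robot_style` and `child_style` are hypothetical. `length_regulate` mimics the FastSpeech idea of repeating each phoneme's value for as many frames as the phoneme lasts, which is why changing durations changes speed but leaves pitch values alone.

```python
from typing import List, Sequence, Tuple


def scale_durations(durations: Sequence[int], speed: float) -> List[int]:
    """Change speaking speed by rescaling per-phoneme durations (in frames).

    speed > 1 shortens every phoneme (faster), speed < 1 lengthens it
    (slower). Pitch values are not touched, so the voice keeps its pitch.
    """
    # Keep at least one frame per phoneme so no phoneme is dropped.
    return [max(1, round(d / speed)) for d in durations]


def length_regulate(values: Sequence[float], durations: Sequence[int]) -> List[float]:
    """Expand phoneme-level values to frame level by repeating each value
    for as many frames as its phoneme lasts."""
    frames: List[float] = []
    for value, dur in zip(values, durations):
        frames.extend([value] * dur)
    return frames


def robot_style(pitch: Sequence[float], tones: Sequence[int]) -> Tuple[List[float], List[int]]:
    """Flatten the pitch contour to its sentence mean and set every tone id to 1."""
    mean = sum(pitch) / len(pitch)
    return [mean] * len(pitch), [1] * len(tones)


def child_style(pitch: Sequence[float], ratio: float) -> List[float]:
    """Raise every pitch value by the same fixed ratio (ratio > 1)."""
    return [p * ratio for p in pitch]


if __name__ == "__main__":
    durations = [4, 6, 2]          # made-up frames per phoneme
    pitch = [220.0, 240.0, 200.0]  # made-up pitch per phoneme, in Hz
    tones = [3, 4, 2]              # made-up tone ids

    fast = scale_durations(durations, speed=2.0)
    print(fast)                          # [2, 3, 1]
    print(length_regulate(pitch, fast))  # same pitch values, half the frames
    print(robot_style(pitch, tones))     # ([220.0, 220.0, 220.0], [1, 1, 1])
    print(child_style(pitch, 1.25))      # [275.0, 300.0, 250.0]
```

The child-style ratio of 1.25 is only an example value; in a real model the pitch may be normalized, in which case the scaling would have to be applied before normalization.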
The duration and pitch of different phonemes in a sentence can have different scale ratios. You can set different scale ratios to emphasize or weaken the pronunciation of some phonemes.
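Per-phoneme control generalizes the global ratios: each phoneme gets its own duration ratio and pitch ratio. The sketch below is hypothetical (the names `scale_per_phoneme` and `stress_phoneme` and the default ratios 1.5 and 1.25 are assumptions, not taken from `style_syn.py`) and uses the same made-up frame counts and Hz pitch values as an illustration.

```python
from typing import List, Sequence, Tuple


def scale_per_phoneme(values: Sequence[float], ratios: Sequence[float]) -> List[float]:
    """Multiply each phoneme-level value by its own scale ratio."""
    if len(values) != len(ratios):
        raise ValueError("expected exactly one ratio per phoneme")
    return [v * r for v, r in zip(values, ratios)]


def stress_phoneme(durations: Sequence[int], pitch: Sequence[float], index: int,
                   dur_ratio: float = 1.5, pitch_ratio: float = 1.25
                   ) -> Tuple[List[int], List[float]]:
    """Rescale the duration and pitch of one phoneme, leaving the rest at 1.0.

    Ratios > 1 emphasize the phoneme (longer, higher); ratios < 1 weaken it.
    """
    dur_ratios = [1.0] * len(durations)
    pitch_ratios = [1.0] * len(pitch)
    dur_ratios[index] = dur_ratio
    pitch_ratios[index] = pitch_ratio
    # Durations are frame counts, so round back to integers (minimum 1 frame).
    new_durations = [max(1, round(d)) for d in scale_per_phoneme(durations, dur_ratios)]
    return new_durations, scale_per_phoneme(pitch, pitch_ratios)


if __name__ == "__main__":
    durations = [4, 6, 2]          # made-up frames per phoneme
    pitch = [220.0, 240.0, 200.0]  # made-up pitch per phoneme, in Hz
    # Emphasize the second phoneme: longer and higher.
    print(stress_phoneme(durations, pitch, index=1))
    # ([4, 9, 2], [220.0, 300.0, 200.0])
    # Weaken the first phoneme: shorter and slightly lower.
    print(stress_phoneme(durations, pitch, index=0, dur_ratio=0.5, pitch_ratio=0.9))
```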
Usage
Run the following command to get started:
./run.sh
For more details, see `style_syn.py`.
The audio samples are in style-control-in-fastspeech2.