Style FastSpeech2
Introduction
FastSpeech2 is a classic acoustic model for Text-to-Speech synthesis. It introduces controllable speech inputs, including phoneme duration, energy and pitch.
In the prediction phase, you can change these controllable variables to get some interesting results.
For example:
- The duration control in FastSpeech2 changes the speed of the audio while keeping the pitch. (In some speech tools, increasing the speed also raises the pitch, and vice versa.)
- When we set the pitch of a sentence to its mean value and set the tones of the phones to 1, we get a robot-style timbre.
- When we raise the pitch of an adult female voice (with a fixed scale ratio), we get a child-style timbre.
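The three controls above can be illustrated with a minimal NumPy sketch. It is not the code in style_syn.py: the function name apply_style, the style labels, and the ratios 1.2 and 1.3 are hypothetical choices made for illustration. It also assumes that durations are per-phoneme frame counts, that pitch holds one raw (unnormalized) value per phoneme, and that tones are integers.

```python
import numpy as np


def apply_style(durations, pitch, tones, style):
    """Return modified copies of (durations, pitch, tones) for a named style.

    Hypothetical helper: the style names and ratios are illustrative only.
    """
    durations = np.asarray(durations, dtype=float).copy()
    pitch = np.asarray(pitch, dtype=float).copy()
    tones = np.asarray(tones, dtype=int).copy()

    if style == "fast":
        # Shorter durations give faster speech; pitch is left untouched.
        durations = durations / 1.2
    elif style == "robot":
        # Flatten the pitch contour to the sentence mean and use tone 1 everywhere.
        pitch[:] = pitch.mean()
        tones[:] = 1
    elif style == "child":
        # Raise every pitch value by one fixed ratio (assumes raw, unnormalized pitch).
        pitch = pitch * 1.3
    else:
        raise ValueError(f"unknown style: {style}")

    # Durations are frame counts: round them and keep at least one frame per phoneme.
    durations = np.maximum(np.rint(durations), 1).astype(int)
    return durations, pitch, tones


durations, pitch, tones = apply_style(
    [4, 6, 12], [200.0, 220.0, 240.0], [3, 4, 2], "robot"
)
```

If the model works with normalized pitch, the values would presumably need to be denormalized before scaling and normalized again afterwards; that step is left out of this sketch.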
The duration and pitch of different phonemes in a sentence can have different scale ratios. You can set different scale ratios to emphasize or weaken the pronunciation of some phonemes.
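Per-phoneme scaling can be sketched as an element-wise product between the predicted values and a vector of ratios. The helper scale_per_phoneme and the example numbers below are hypothetical, and the sketch assumes one duration (in frames) and one raw pitch value per phoneme.

```python
import numpy as np


def scale_per_phoneme(values, ratios):
    """Multiply each phoneme's value by its own ratio (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    ratios = np.asarray(ratios, dtype=float)
    if values.shape != ratios.shape:
        raise ValueError("values and ratios must have one entry per phoneme")
    return values * ratios


# Made-up values for a four-phoneme sentence.
durations = [5, 8, 6, 7]
pitch = [180.0, 200.0, 210.0, 190.0]

# Emphasize the second phoneme: make it 1.5x longer and raise its pitch by 1.2x.
duration_ratios = [1.0, 1.5, 1.0, 1.0]
pitch_ratios = [1.0, 1.2, 1.0, 1.0]

# Durations are frame counts, so round and keep at least one frame per phoneme.
new_durations = np.maximum(
    np.rint(scale_per_phoneme(durations, duration_ratios)), 1
).astype(int)
new_pitch = scale_per_phoneme(pitch, pitch_ratios)
```

Ratios below 1.0 would weaken a phoneme in the same way, by shortening it or lowering its pitch.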
Usage
Run the following command line to get started:
./run.sh
run.sh first executes source path.sh, which sets the environment variables.
If you would like to try your own sentence, please replace the sentence in sentences.txt.
For more details, please see style_syn.py.
The audio samples are in style-control-in-fastspeech2.