Style FastSpeech2

Introduction

FastSpeech2 is a classic acoustic model for text-to-speech synthesis. It introduces controllable speech inputs, including phoneme duration, energy, and pitch.

In the prediction phase, you can change these controllable variables to get some interesting results.

For example:

Duration control in FastSpeech2 changes the speed of the audio while keeping the pitch. (In some speech tools, increasing the speed also raises the pitch, and vice versa.)
When we set the pitch of a sentence to a mean value and set the tones of all phones to 1, we get a robot-style timbre.
When we raise the pitch of an adult female voice (with a fixed scale ratio), we get a child-style timbre.
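
The three effects above can be sketched in plain Python. This is only an illustration, not the code in style_syn.py: the function names, the sample values, and the ratios 0.5 and 1.5 are all hypothetical. It also assumes pitch is given in Hz and that the "mean value" is the mean over the sentence.

```python
# Illustrative only: plain lists stand in for the model's predicted values.
# All names and numbers here are hypothetical, not taken from style_syn.py.

def scale_durations(durations, ratio):
    """Scale per-phoneme frame counts; ratio > 1 slows speech, ratio < 1 speeds it up."""
    return [round(d * ratio) for d in durations]

def flatten_pitch(pitch):
    """Replace every pitch value with the sentence mean (assumed robot-style contour)."""
    mean = sum(pitch) / len(pitch)
    return [mean] * len(pitch)

def scale_pitch(pitch, ratio):
    """Multiply every pitch value by one fixed ratio."""
    return [p * ratio for p in pitch]

durations = [4, 6, 10]          # frames per phoneme (made-up values)
pitch = [100.0, 200.0, 300.0]   # per-phoneme pitch, assumed to be in Hz
tones = [3, 1, 4]               # per-phone tone IDs (made-up values)

# Speed change: only the durations are touched, the pitch list is left as-is.
faster = scale_durations(durations, 0.5)

# Robot style: flat pitch and every tone set to 1.
robot_pitch = flatten_pitch(pitch)
robot_tones = [1] * len(tones)

# Child style: raise the whole pitch contour by a fixed ratio.
child_pitch = scale_pitch(pitch, 1.5)
```

In the real model these values are predicted arrays rather than hand-written lists, and the pitch may be normalized, so suitable ratios will differ from the ones used here.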

The duration and pitch of different phonemes in a sentence can have different scale ratios. You can set different scale ratios to emphasize or weaken the pronunciation of some phonemes.
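
A minimal sketch of per-phoneme scaling follows, assuming one scale ratio per phoneme. The helper name `apply_ratios` and all values are hypothetical and are not part of style_syn.py.

```python
# Illustrative only: one ratio per phoneme, applied element by element.
# The helper name and the sample values are hypothetical.

def apply_ratios(values, ratios):
    """Multiply each per-phoneme value by its own scale ratio."""
    if len(values) != len(ratios):
        raise ValueError("values and ratios must have the same length")
    return [v * r for v, r in zip(values, ratios)]

durations = [4, 6, 10]          # frames per phoneme (made-up values)
pitch = [100.0, 200.0, 300.0]   # per-phoneme pitch, assumed to be in Hz

# Emphasize the second phoneme: lengthen it and raise its pitch.
duration_ratios = [1.0, 1.5, 1.0]
pitch_ratios = [1.0, 1.25, 1.0]

# Durations are frame counts, so round back to whole frames after scaling.
emphasized_durations = [round(d) for d in apply_ratios(durations, duration_ratios)]
emphasized_pitch = apply_ratios(pitch, pitch_ratios)
```

Ratios below 1.0 would weaken a phoneme in the same way, by shortening it or lowering its pitch.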

Usage

Run the following command line to get started:

./run.sh

run.sh first executes source path.sh, which sets the environment variables.

If you would like to try your own sentence, please replace the sentence in sentences.txt.

For more details, please see style_syn.py.

The audio samples are in style-control-in-fastspeech2.