Style FastSpeech2

Introduction

FastSpeech2 is a classic acoustic model for text-to-speech synthesis. It introduces controllable speech attributes as inputs, including phoneme duration, energy, and pitch.

In the prediction phase, you can change these controllable variables to get some interesting results.

For example:

Duration control in FastSpeech2 changes the speed of the audio while keeping the pitch unchanged. (In some speech tools, increasing the speed also raises the pitch, and vice versa.)
When we set the pitch of a whole sentence to its mean value and set the tones of all phones to 1, we get a robot-style timbre.
When we raise the pitch of an adult female voice by a fixed scale ratio, we get a child-style timbre.
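
The three effects above can be illustrated with a minimal sketch in plain Python. It is not the code in style_syn.py: the function names, the list-based representation (one duration in frames and one pitch value in Hz per phoneme), and the example ratios are all assumptions made for illustration, and the tone setting for the robot style is not modeled.

```python
from statistics import mean


def scale_durations(durations, ratio):
    """Scale every phoneme duration (in frames) by one ratio.

    ratio < 1 speeds speech up, ratio > 1 slows it down. Pitch values are
    left alone, which is why the speed change does not shift the pitch.
    """
    return [max(1, round(d * ratio)) for d in durations]


def flatten_pitch(pitch):
    """Robot style: replace every pitch value with the sentence mean."""
    m = mean(pitch)
    return [m] * len(pitch)


def scale_pitch(pitch, ratio):
    """Child style: raise every pitch value by a fixed ratio (> 1)."""
    return [p * ratio for p in pitch]


# Hypothetical per-phoneme predictions for a three-phoneme utterance.
durations = [4, 6, 8]
pitch = [200.0, 220.0, 210.0]

faster = scale_durations(durations, 0.5)  # half as many frames per phoneme
robot = flatten_pitch(pitch)              # constant pitch contour
child = scale_pitch(pitch, 1.5)           # uniformly higher pitch
```

In a real model the modified durations and pitch would be fed back into the decoder in place of the predicted values; how that is wired up depends on the implementation.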

The duration and pitch of different phonemes in a sentence can have different scale ratios. You can set different scale ratios to emphasize or weaken the pronunciation of some phonemes.
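
As a rough sketch of per-phoneme control, the snippet below applies a separate scale ratio to each phoneme. The helper name `apply_ratios`, the example values, and the assumption that pitch is expressed in Hz are hypothetical and are not taken from style_syn.py.

```python
def apply_ratios(values, ratios):
    """Multiply each per-phoneme value by its own scale ratio."""
    if len(values) != len(ratios):
        raise ValueError("need exactly one ratio per phoneme")
    return [v * r for v, r in zip(values, ratios)]


# Hypothetical predictions for a four-phoneme utterance.
durations = [6, 6, 6, 6]              # frames per phoneme
pitch = [200.0, 200.0, 200.0, 200.0]  # assumed to be in Hz

# Emphasize the second phoneme (longer and higher) and weaken the last one.
duration_ratios = [1.0, 1.5, 1.0, 0.5]
pitch_ratios = [1.0, 1.5, 1.0, 1.0]

# Durations must stay whole frame counts of at least one frame.
new_durations = [max(1, round(d)) for d in apply_ratios(durations, duration_ratios)]
new_pitch = apply_ratios(pitch, pitch_ratios)
```

A ratio of 1.0 leaves a phoneme untouched, so only the phonemes to be emphasized or weakened need a different value.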

Usage

Run the following command line to get started:

./run.sh

For more details, please see style_syn.py

The audio samples are in style-control-in-fastspeech2
