(简体中文|English)

TTS (Text To Speech)

Introduction

Text-to-speech (TTS) is a natural language modeling process that converts units of text into units of speech for audio presentation.

This demo generates audio from a given text. It can be run with a single command or with a few lines of Python using PaddleSpeech.

Usage

1. Installation

See installation.

You can choose one of three ways to install paddlespeech: easy, medium, or hard.

2. Prepare Input

The input of this demo should be text in the language you select, passed as an argument.

3. Usage

Command Line (Recommended)

Chinese

The default acoustic model is FastSpeech2, and the default vocoder is Parallel WaveGAN.
```
paddlespeech tts --input "你好，欢迎使用百度飞桨深度学习框架！"
```

Batch Process

echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts
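
The batch input in the `echo` command above is one utterance per line, each prefixed with an id. A minimal Python sketch for building that text from a list of sentences follows, assuming the `<id> <sentence>` line format shown in the example is all the CLI needs. The helper name `format_batch_input` is hypothetical and not part of PaddleSpeech.

```python
def format_batch_input(sentences, start_id=1):
    """Return one "<id> <sentence>" line per sentence, joined by newlines.

    Assumption: the batch format is the one shown in the echo example,
    i.e. an id, a space, then the sentence.
    """
    lines = []
    for offset, sentence in enumerate(sentences):
        # Collapse whitespace (including newlines) so each utterance
        # stays on a single line.
        text = " ".join(sentence.split())
        lines.append(f"{start_id + offset} {text}")
    return "\n".join(lines)


batch_text = format_batch_input(["欢迎光临。", "谢谢惠顾。"])
print(batch_text)
```

If the `paddlespeech` command is on your PATH, the resulting string can be piped to it, for example with `subprocess.run(["paddlespeech", "tts"], input=batch_text, text=True)`.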

Chinese, use SpeedySpeech as the acoustic model

paddlespeech tts --am speedyspeech_csmsc --input "你好，欢迎使用百度飞桨深度学习框架！"

Chinese, multi-speaker

You can change spk_id here.

paddlespeech tts --am fastspeech2_aishell3 --voc pwgan_aishell3 --input "你好，欢迎使用百度飞桨深度学习框架！" --spk_id 0

English

paddlespeech tts --am fastspeech2_ljspeech --voc pwgan_ljspeech --lang en --input "hello world"

English, multi-speaker

You can change spk_id here.

paddlespeech tts --am fastspeech2_vctk --voc pwgan_vctk --input "hello, boys" --lang en --spk_id 0

Chinese-English mixed, multi-speaker

You can change spk_id here.

# The `am` must be `fastspeech2_mix`!
# The `lang` must be `mix`!
# The `voc` must currently be a vocoder trained on a Chinese dataset!
# spk 174 is csmsc, spk 175 is ljspeech
paddlespeech tts --am fastspeech2_mix --voc hifigan_csmsc --lang mix --input "热烈欢迎您在 Discussions 中提交问题，并在 Issues 中指出发现的 bug。此外，我们非常希望您参与到 Paddle Speech 的开发中！" --spk_id 174 --output mix_spk174.wav
paddlespeech tts --am fastspeech2_mix --voc hifigan_aishell3 --lang mix --input "热烈欢迎您在 Discussions 中提交问题，并在 Issues 中指出发现的 bug。此外，我们非常希望您参与到 Paddle Speech 的开发中！" --spk_id 174 --output mix_spk174_aishell3.wav
paddlespeech tts --am fastspeech2_mix --voc pwgan_csmsc --lang mix --input "我们的声学模型使用了 Fast Speech Two, 声码器使用了 Parallel Wave GAN and Hifi GAN." --spk_id 175 --output mix_spk175_pwgan.wav
paddlespeech tts --am fastspeech2_mix --voc hifigan_csmsc --lang mix --input "我们的声学模型使用了 Fast Speech Two, 声码器使用了 Parallel Wave GAN and Hifi GAN." --spk_id 175 --output mix_spk175.wav
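
The three constraints in the comments above are easy to get wrong when scripting many runs. The sketch below checks them before a command is launched. It assumes that "a vocoder trained on a Chinese dataset" means a vocoder whose name ends in `_csmsc` or `_aishell3` (the datasets this README marks as `zh`); the source does not confirm that every such vocoder works with `fastspeech2_mix`. The function name `check_mix_args` is hypothetical.

```python
# Assumption: these suffixes identify the Chinese-dataset vocoders.
CHINESE_DATASET_SUFFIXES = ("_csmsc", "_aishell3")


def check_mix_args(am, voc, lang):
    """Return a list of violated rules; an empty list means none were found."""
    problems = []
    if am != "fastspeech2_mix":
        problems.append("am must be fastspeech2_mix")
    if lang != "mix":
        problems.append("lang must be mix")
    if not voc.endswith(CHINESE_DATASET_SUFFIXES):
        problems.append("voc must be a Chinese-dataset vocoder")
    return problems


print(check_mix_args("fastspeech2_mix", "hifigan_csmsc", "mix"))
print(check_mix_args("fastspeech2_mix", "pwgan_ljspeech", "en"))
```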

Usage:

paddlespeech tts --help

Arguments:

input (required): Input text to synthesize.
am: Acoustic model type of tts task. Default: fastspeech2_csmsc.
am_config: Config of acoustic model. Use default config when it is None. Default: None.
am_ckpt: Acoustic model checkpoint. Use pretrained model when it is None. Default: None.
am_stat: Mean and standard deviation used to normalize spectrogram when training acoustic model. Default: None.
phones_dict: Phone vocabulary file. Default: None.
tones_dict: Tone vocabulary file. Default: None.
speaker_dict: Speaker id map file. Default: None.
spk_id: Speaker id for multi speaker acoustic model. Default: 0.
voc: Vocoder type of tts task. Default: pwgan_csmsc.
voc_config: Config of vocoder. Use default config when it is None. Default: None.
voc_ckpt: Vocoder checkpoint. Use pretrained model when it is None. Default: None.
voc_stat: Mean and standard deviation used to normalize spectrogram when training vocoder. Default: None.
lang: Language of tts task. Default: zh.
device: Device on which to run model inference. Default: the default device of PaddlePaddle in the current environment.
output: Output wave filepath. Default: output.wav.
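
When commands are generated from a script, it can help to keep the documented defaults in one place and emit only the options that differ. The sketch below does that. It assumes every argument listed above is passed as `--<name> <value>`, which the examples in this README confirm only for `input`, `am`, `voc`, `lang`, `spk_id` and `output`. `DOCUMENTED_DEFAULTS` and `build_tts_argv` are hypothetical names, not part of PaddleSpeech.

```python
import shlex

# Defaults copied from the argument list above. None means the option is
# left unset (pretrained resources, or the environment's default device).
DOCUMENTED_DEFAULTS = {
    "am": "fastspeech2_csmsc",
    "am_config": None,
    "am_ckpt": None,
    "am_stat": None,
    "phones_dict": None,
    "tones_dict": None,
    "speaker_dict": None,
    "spk_id": 0,
    "voc": "pwgan_csmsc",
    "voc_config": None,
    "voc_ckpt": None,
    "voc_stat": None,
    "lang": "zh",
    "device": None,
    "output": "output.wav",
}


def build_tts_argv(text, **options):
    """Build an argument list, skipping options left at their documented default."""
    unknown = sorted(set(options) - set(DOCUMENTED_DEFAULTS))
    if unknown:
        raise ValueError(f"undocumented option(s): {', '.join(unknown)}")
    argv = ["paddlespeech", "tts", "--input", text]
    for name, value in options.items():
        if value == DOCUMENTED_DEFAULTS[name]:
            continue  # nothing to override
        argv += [f"--{name}", str(value)]
    return argv


argv = build_tts_argv(
    "hello world",
    am="fastspeech2_ljspeech",
    voc="pwgan_ljspeech",
    lang="en",
    spk_id=0,
)
# shlex.join quotes the text so the line can be pasted into a shell.
print(shlex.join(argv))
```

The list form can be handed to `subprocess.run(argv)` directly, which avoids shell quoting issues with the input text.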

Output:

[2021-12-09 20:49:58,955] [    INFO] [log.py] [L57] - Wave file has been generated: output.wav

Python API

import paddle
from paddlespeech.cli.tts import TTSExecutor

tts_executor = TTSExecutor()
wav_file = tts_executor(
    text='今天的天气不错啊',
    output='output.wav',
    am='fastspeech2_csmsc',
    am_config=None,
    am_ckpt=None,
    am_stat=None,
    spk_id=0,
    phones_dict=None,
    tones_dict=None,
    speaker_dict=None,
    voc='pwgan_csmsc',
    voc_config=None,
    voc_ckpt=None,
    voc_stat=None,
    lang='zh',
    device=paddle.get_device())
print('Wave file has been generated: {}'.format(wav_file))

Output:

Wave file has been generated: output.wav
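
To synthesize several sentences with the Python API, the call above can be wrapped in a loop. The sketch below takes the executor as a parameter so it can be tried without PaddleSpeech installed. It relies only on the `text` and `output` keyword arguments and the returned path shown in the example above. `synthesize_all`, `echo_output` and the `utt_NNN.wav` naming scheme are hypothetical.

```python
import tempfile
from pathlib import Path


def synthesize_all(executor, sentences, out_dir, **tts_kwargs):
    """Call `executor` once per sentence and collect the paths it returns."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    results = []
    for index, sentence in enumerate(sentences, start=1):
        target = out_dir / f"utt_{index:03d}.wav"
        # Extra keyword arguments (am, voc, lang, ...) are forwarded unchanged.
        results.append(executor(text=sentence, output=str(target), **tts_kwargs))
    return results


def echo_output(text, output, **kwargs):
    # Stand-in for TTSExecutor(): writes nothing and just returns the path.
    return output


with tempfile.TemporaryDirectory() as tmp:
    paths = synthesize_all(echo_output, ["今天的天气不错啊", "欢迎光临。"], tmp, lang="zh")
    print([Path(p).name for p in paths])
```

With PaddleSpeech installed, pass `TTSExecutor()` from `paddlespeech.cli.tts` in place of `echo_output`. Reusing one executor instance across sentences is a design choice of this sketch.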

4. Pretrained Models

Here is a list of pretrained models released by PaddleSpeech that can be used from the command line and the Python API:

Acoustic model

| Model | Language |
| :--- | :---: |
| speedyspeech_csmsc | zh |
| fastspeech2_csmsc | zh |
| fastspeech2_ljspeech | en |
| fastspeech2_aishell3 | zh |
| fastspeech2_vctk | en |
| fastspeech2_cnndecoder_csmsc | zh |
| fastspeech2_mix | mix |
| tacotron2_csmsc | zh |
| tacotron2_ljspeech | en |

Vocoder

| Model | Language |
| :--- | :---: |
| pwgan_csmsc | zh |
| pwgan_ljspeech | en |
| pwgan_aishell3 | zh |
| pwgan_vctk | en |
| mb_melgan_csmsc | zh |
| style_melgan_csmsc | zh |
| hifigan_csmsc | zh |
| hifigan_ljspeech | en |
| hifigan_aishell3 | zh |
| hifigan_vctk | en |
| wavernn_csmsc | zh |
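
The command examples in this README pair each acoustic model with a vocoder trained on the same dataset (for example `fastspeech2_aishell3` with `pwgan_aishell3`). The sketch below encodes the two tables and looks up vocoders by that dataset suffix. Treating the last `_`-separated part of a name as its dataset, and same-dataset pairing as the selection rule, are assumptions drawn from the examples rather than documented requirements. `dataset_of` and `matching_vocoders` are hypothetical helpers.

```python
# Model name -> language, copied from the tables above.
ACOUSTIC_MODELS = {
    "speedyspeech_csmsc": "zh",
    "fastspeech2_csmsc": "zh",
    "fastspeech2_ljspeech": "en",
    "fastspeech2_aishell3": "zh",
    "fastspeech2_vctk": "en",
    "fastspeech2_cnndecoder_csmsc": "zh",
    "fastspeech2_mix": "mix",
    "tacotron2_csmsc": "zh",
    "tacotron2_ljspeech": "en",
}
VOCODERS = {
    "pwgan_csmsc": "zh",
    "pwgan_ljspeech": "en",
    "pwgan_aishell3": "zh",
    "pwgan_vctk": "en",
    "mb_melgan_csmsc": "zh",
    "style_melgan_csmsc": "zh",
    "hifigan_csmsc": "zh",
    "hifigan_ljspeech": "en",
    "hifigan_aishell3": "zh",
    "hifigan_vctk": "en",
    "wavernn_csmsc": "zh",
}


def dataset_of(model_name):
    # Assumption: the text after the last underscore names the dataset.
    return model_name.rsplit("_", 1)[-1]


def matching_vocoders(am):
    """List vocoders whose dataset suffix matches the acoustic model's."""
    if am not in ACOUSTIC_MODELS:
        raise KeyError(f"unknown acoustic model: {am}")
    dataset = dataset_of(am)
    return [voc for voc in VOCODERS if dataset_of(voc) == dataset]


print(matching_vocoders("fastspeech2_vctk"))
print(matching_vocoders("fastspeech2_mix"))
```

`fastspeech2_mix` has no dataset suffix, so the lookup returns an empty list for it; that model follows the separate rule given with the mixed-language commands.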
