[TTS]Cantonese TTS checkpoint for e2e, test=tts (#2932)

pull/2937/head
HuangLiangJie 3 years ago committed by GitHub
parent 1af9bd47d9
commit 2553022396
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -75,3 +75,41 @@ Also, there is a `metadata.jsonl` in each subfolder. It is a table-like file tha
### Training details can refer to the script of [examples/aishell3/tts3](../../aishell3/tts3).
## Pretrained Model
Pretrained FastSpeech2 model with no silence in the edge of audios:
- [fastspeech2_canton_ckpt_1.4.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_canton_ckpt_1.4.0.zip)
FastSpeech2 checkpoint contains files listed below.
```text
fastspeech2_canton_ckpt_1.4.0
├── default.yaml # default config used to train fastspeech2
├── energy_stats.npy # statistics used to normalize energy when training fastspeech2
├── phone_id_map.txt # phone vocabulary file when training fastspeech2
├── pitch_stats.npy # statistics used to normalize pitch when training fastspeech2
├── snapshot_iter_140000.pdz # model parameters and optimizer states
├── speaker_id_map.txt # speaker id map file when training a multi-speaker fastspeech2
└── speech_stats.npy # statistics used to normalize spectrogram when training fastspeech2
```
You can use the following scripts to synthesize for `${BIN_DIR}/../sentences_canton.txt` using pretrained fastspeech2 and parallel wavegan models.
```bash
source path.sh
FLAGS_allocator_strategy=naive_best_fit \
FLAGS_fraction_of_gpu_memory_to_use=0.01 \
python3 ${BIN_DIR}/../synthesize_e2e.py \
--am=fastspeech2_aishell3 \
--am_config=fastspeech2_canton_ckpt_1.4.0/default.yaml \
--am_ckpt=fastspeech2_canton_ckpt_1.4.0/snapshot_iter_140000.pdz \
--am_stat=fastspeech2_canton_ckpt_1.4.0/speech_stats.npy \
--voc=pwgan_aishell3 \
--voc_config=pwg_aishell3_ckpt_0.5/default.yaml \
--voc_ckpt=pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz \
--voc_stat=pwg_aishell3_ckpt_0.5/feats_stats.npy \
--lang=canton \
--text=${BIN_DIR}/../sentences_canton.txt \
--output_dir=exp/default/test_e2e \
--phones_dict=fastspeech2_canton_ckpt_1.4.0/phone_id_map.txt \
--speaker_dict=fastspeech2_canton_ckpt_1.4.0/speaker_id_map.txt \
--spk_id=0 \
--inference_dir=exp/default/inference
```

@ -3,14 +3,14 @@
set -e
source path.sh
gpus=0
gpus=0,1
stage=0
stop_stage=100
conf_path=conf/default.yaml
train_output_path=exp/default
ckpt_name=snapshot_iter_280000.pdz
ckpt_name=snapshot_iter_140000.pdz
# with the following command, you can choose the stage range you want to run
# such as `./run.sh --stage 0 --stop-stage 0`

Loading…
Cancel
Save