@@ -216,7 +216,7 @@ optional arguments:
## Pretrained Model
Pretrained FastSpeech2 model with no silence at the edges of the audio:
- - [fastspeech2_nosil_vctk_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_vctk_ckpt_0.5.zip)
+ - [fastspeech2_vctk_ckpt_1.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_vctk_ckpt_1.2.0.zip)
The static model can be downloaded here:
- [fastspeech2_vctk_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_vctk_static_1.1.0.zip)
@@ -226,9 +226,11 @@ The ONNX model can be downloaded here:
The FastSpeech2 checkpoint contains the files listed below.
```text
- fastspeech2_nosil_vctk_ckpt_0.5
+ fastspeech2_vctk_ckpt_1.2.0
├── default.yaml # default config used to train fastspeech2
├── energy_stats.npy # statistics used to normalize energy when training fastspeech2
├── phone_id_map.txt # phone vocabulary file when training fastspeech2
├── pitch_stats.npy # statistics used to normalize pitch when training fastspeech2
├── snapshot_iter_66200.pdz # model parameters and optimizer states
├── speaker_id_map.txt # speaker id map file when training a multi-speaker fastspeech2
└── speech_stats.npy # statistics used to normalize spectrogram when training fastspeech2
@@ -241,9 +243,9 @@ FLAGS_allocator_strategy=naive_best_fit \
FLAGS_fraction_of_gpu_memory_to_use=0.01 \
python3 ${BIN_DIR}/../synthesize_e2e.py \
--am=fastspeech2_vctk \
- --am_config=fastspeech2_nosil_vctk_ckpt_0.5/default.yaml \
- --am_ckpt=fastspeech2_nosil_vctk_ckpt_0.5/snapshot_iter_66200.pdz \
- --am_stat=fastspeech2_nosil_vctk_ckpt_0.5/speech_stats.npy \
+ --am_config=fastspeech2_vctk_ckpt_1.2.0/default.yaml \
+ --am_ckpt=fastspeech2_vctk_ckpt_1.2.0/snapshot_iter_66200.pdz \
+ --am_stat=fastspeech2_vctk_ckpt_1.2.0/speech_stats.npy \
--voc=pwgan_vctk \
--voc_config=pwg_vctk_ckpt_0.1.1/default.yaml \
--voc_ckpt=pwg_vctk_ckpt_0.1.1/snapshot_iter_1500000.pdz \
@@ -251,8 +253,8 @@ python3 ${BIN_DIR}/../synthesize_e2e.py \
--lang=en \
--text=${BIN_DIR}/../sentences_en.txt \
--output_dir=exp/default/test_e2e \
- --phones_dict=fastspeech2_nosil_vctk_ckpt_0.5/phone_id_map.txt \
- --speaker_dict=fastspeech2_nosil_vctk_ckpt_0.5/speaker_id_map.txt \
+ --phones_dict=fastspeech2_vctk_ckpt_1.2.0/phone_id_map.txt \
+ --speaker_dict=fastspeech2_vctk_ckpt_1.2.0/speaker_id_map.txt \
--spk_id=0 \
--inference_dir=exp/default/inference
```
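
As a quick sanity check before running synthesis, the checkpoint file listing above can be turned into a small script that reports any missing files. This is a minimal sketch and not part of PaddleSpeech: the function name `missing_checkpoint_files` and the constant `EXPECTED_FILES` are hypothetical, and it assumes the archive was unzipped into the current directory under the name shown in the listing.

```python
from pathlib import Path

# File names copied from the checkpoint listing above.
EXPECTED_FILES = [
    "default.yaml",
    "energy_stats.npy",
    "phone_id_map.txt",
    "pitch_stats.npy",
    "snapshot_iter_66200.pdz",
    "speaker_id_map.txt",
    "speech_stats.npy",
]


def missing_checkpoint_files(ckpt_dir):
    """Return the expected file names that are absent from ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumed location: the unzipped checkpoint in the current directory.
    missing = missing_checkpoint_files("fastspeech2_vctk_ckpt_1.2.0")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Checkpoint directory looks complete.")
```

If the script reports missing files, the paths passed to `--am_config`, `--am_ckpt`, `--am_stat`, `--phones_dict`, and `--speaker_dict` would not resolve, so it is worth re-downloading or re-extracting the archive first.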