diff --git a/examples/canton/tts3/README.md b/examples/canton/tts3/README.md
index 3bf4fd8ee..b8d86a071 100644
--- a/examples/canton/tts3/README.md
+++ b/examples/canton/tts3/README.md
@@ -75,3 +75,41 @@ Also, there is a `metadata.jsonl` in each subfolder. It is a table-like file tha
 ### Training details can refer to the script of [examples/aishell3/tts3](../../aishell3/tts3).
 
 ## Pretrained Model
+Pretrained FastSpeech2 model with no silence at the edges of the audio:
+- [fastspeech2_canton_ckpt_1.4.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_canton_ckpt_1.4.0.zip)
+
+The FastSpeech2 checkpoint contains the files listed below.
+
+```text
+fastspeech2_canton_ckpt_1.4.0
+├── default.yaml             # default config used to train fastspeech2
+├── energy_stats.npy         # statistics used to normalize energy when training fastspeech2
+├── phone_id_map.txt         # phone vocabulary file when training fastspeech2
+├── pitch_stats.npy          # statistics used to normalize pitch when training fastspeech2
+├── snapshot_iter_140000.pdz # model parameters and optimizer states
+├── speaker_id_map.txt       # speaker id map file when training a multi-speaker fastspeech2
+└── speech_stats.npy         # statistics used to normalize the spectrogram when training fastspeech2
+```
+You can use the following script to synthesize the sentences in `${BIN_DIR}/../sentences_canton.txt` with the pretrained FastSpeech2 and Parallel WaveGAN models.
+```bash
+source path.sh
+
+FLAGS_allocator_strategy=naive_best_fit \
+FLAGS_fraction_of_gpu_memory_to_use=0.01 \
+python3 ${BIN_DIR}/../synthesize_e2e.py \
+ --am=fastspeech2_aishell3 \
+ --am_config=fastspeech2_canton_ckpt_1.4.0/default.yaml \
+ --am_ckpt=fastspeech2_canton_ckpt_1.4.0/snapshot_iter_140000.pdz \
+ --am_stat=fastspeech2_canton_ckpt_1.4.0/speech_stats.npy \
+ --voc=pwgan_aishell3 \
+ --voc_config=pwg_aishell3_ckpt_0.5/default.yaml \
+ --voc_ckpt=pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz \
+ --voc_stat=pwg_aishell3_ckpt_0.5/feats_stats.npy \
+ --lang=canton \
+ --text=${BIN_DIR}/../sentences_canton.txt \
+ --output_dir=exp/default/test_e2e \
+ --phones_dict=fastspeech2_canton_ckpt_1.4.0/phone_id_map.txt \
+ --speaker_dict=fastspeech2_canton_ckpt_1.4.0/speaker_id_map.txt \
+ --spk_id=0 \
+ --inference_dir=exp/default/inference
+```
diff --git a/examples/canton/tts3/run.sh b/examples/canton/tts3/run.sh
index 0a34e5238..e84323134 100755
--- a/examples/canton/tts3/run.sh
+++ b/examples/canton/tts3/run.sh
@@ -3,14 +3,14 @@ set -e
 source path.sh
 
-gpus=0
+gpus=0,1
 stage=0
 stop_stage=100
 
 conf_path=conf/default.yaml
 train_output_path=exp/default
-ckpt_name=snapshot_iter_280000.pdz
+ckpt_name=snapshot_iter_140000.pdz
 
 # with the following command, you can choose the stage range you want to run
 # such as `./run.sh --stage 0 --stop-stage 0`
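Before running `synthesize_e2e.py`, it can help to confirm that the unzipped `fastspeech2_canton_ckpt_1.4.0` directory actually contains every file the command points at. The function below is a minimal sketch of such a check. `check_ckpt_dir` is a hypothetical helper, not part of PaddleSpeech, and the file list is copied from the checkpoint listing in the README diff.

```bash
# Hypothetical helper (not part of PaddleSpeech): verify that an unzipped
# FastSpeech2 checkpoint directory contains every file from the README listing.
# Usage: check_ckpt_dir [ckpt_dir]; returns 0 if complete, 1 if any file is missing.
check_ckpt_dir() {
  local ckpt_dir="${1:-fastspeech2_canton_ckpt_1.4.0}"
  local missing=0
  local f
  for f in default.yaml energy_stats.npy phone_id_map.txt pitch_stats.npy \
           snapshot_iter_140000.pdz speaker_id_map.txt speech_stats.npy; do
    # Report each missing file on stderr instead of stopping at the first one.
    if [ ! -f "${ckpt_dir}/${f}" ]; then
      echo "missing: ${ckpt_dir}/${f}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

For example, `check_ckpt_dir fastspeech2_canton_ckpt_1.4.0 && echo "checkpoint complete"` prints nothing but the names of any missing files if the archive was only partially extracted. The function returns a status rather than calling `exit`, so it can be sourced into an interactive shell or into `path.sh` without terminating it.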