From 1c2a6b8e30dd69c36ee59851db70febcddcf99c2 Mon Sep 17 00:00:00 2001
From: liangym <34430015+lym0302@users.noreply.github.com>
Date: Fri, 26 Aug 2022 15:23:53 +0800
Subject: [PATCH 1/4] updata readme, test=doc (#2313)

---
 examples/aishell3/tts3/README.md  | 2 +-
 examples/zh_en_tts/tts3/README.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/examples/aishell3/tts3/README.md b/examples/aishell3/tts3/README.md
index 21bad51ec..6ef2870c2 100644
--- a/examples/aishell3/tts3/README.md
+++ b/examples/aishell3/tts3/README.md
@@ -217,7 +217,7 @@ optional arguments:
 ## Pretrained Model
 Pretrained FastSpeech2 model with no silence in the edge of audios:
-- [fastspeech2_nosil_aishell3_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_aishell3_ckpt_0.4.zip)
+- [fastspeech2_aishell3_ckpt_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_ckpt_1.1.0.zip)
 - [fastspeech2_conformer_aishell3_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_conformer_aishell3_ckpt_0.2.0.zip) (Thanks for [@awmmmm](https://github.com/awmmmm)'s contribution)
 The static model can be downloaded here:
diff --git a/examples/zh_en_tts/tts3/README.md b/examples/zh_en_tts/tts3/README.md
index 131d7f2c4..e7365baa2 100644
--- a/examples/zh_en_tts/tts3/README.md
+++ b/examples/zh_en_tts/tts3/README.md
@@ -251,7 +251,7 @@ optional arguments:
 ## Pretrained Model
 Pretrained FastSpeech2 model with no silence in the edge of audios:
-- [fastspeech2_mix_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/t2s/chinse_english_mixed/models/fastspeech2_mix_ckpt_0.2.0.zip)
+- [fastspeech2_mix_ckpt_1.2.0.zip](https://paddlespeech.bj.bcebos.com/t2s/chinse_english_mixed/models/fastspeech2_mix_ckpt_1.2.0.zip)
 The static model can be downloaded here:
 - [fastspeech2_mix_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/t2s/chinse_english_mixed/models/fastspeech2_mix_static_0.2.0.zip)

From 984886fb8c3f8b0e0a75b4423c544926cda91bf1 Mon Sep 17 00:00:00 2001
From: sneaxiy <32832641+sneaxiy@users.noreply.github.com>
Date: Fri, 26 Aug 2022 15:30:03 +0800
Subject: [PATCH 2/4] add barrier (#2311)

---
 tests/test_tipc/benchmark_train.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/test_tipc/benchmark_train.sh b/tests/test_tipc/benchmark_train.sh
index 4b7677c72..7f0382ac5 100644
--- a/tests/test_tipc/benchmark_train.sh
+++ b/tests/test_tipc/benchmark_train.sh
@@ -154,6 +154,7 @@ else
     device_num_list=($device_num)
 fi
+PYTHON="${python}" bash test_tipc/barrier.sh
 IFS="|"
 for batch_size in ${batch_size_list[*]}; do
     for precision in ${fp_items_list[*]}; do

From c7163abffa643342a294d24883416e788dfbf3af Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?David=20An=20=EF=BC=88An=20Hongliang=EF=BC=89?=
Date: Fri, 26 Aug 2022 15:43:13 +0800
Subject: [PATCH 3/4] add thanks into readme, append data for chinese unit (#2312)

* add chinese words correct phonic,test=tts
* added thanks into readme. add data of unit, test=tts
* added thanks into readme. add data of unit, test=tts
* modify data of unit, test=tts
* modify thanks, test=tts

---
 README.md                                         | 3 ++-
 README_cn.md                                      | 3 ++-
 paddlespeech/t2s/frontend/zh_normalization/num.py | 2 +-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 122704d2d..7f10fc02e 100644
--- a/README.md
+++ b/README.md
@@ -793,6 +793,7 @@ You are warmly welcome to submit questions in [discussions](https://github.com/P
 ### Contributors

+
@@ -829,7 +830,7 @@ You are warmly welcome to submit questions in [discussions](https://github.com/P

 ## Acknowledgement
-
+- Many thanks to [david-95](https://github.com/david-95) improved TTS, fixed multi-punctuation bug, and contributed to multiple program and data.
 - Many thanks to [BarryKCL](https://github.com/BarryKCL) improved TTS Chinses frontend based on [G2PW](https://github.com/GitYCC/g2pW)
 - Many thanks to [yeyupiaoling](https://github.com/yeyupiaoling)/[PPASR](https://github.com/yeyupiaoling/PPASR)/[PaddlePaddle-DeepSpeech](https://github.com/yeyupiaoling/PaddlePaddle-DeepSpeech)/[VoiceprintRecognition-PaddlePaddle](https://github.com/yeyupiaoling/VoiceprintRecognition-PaddlePaddle)/[AudioClassification-PaddlePaddle](https://github.com/yeyupiaoling/AudioClassification-PaddlePaddle) for years of attention, constructive advice and great help.
 - Many thanks to [mymagicpower](https://github.com/mymagicpower) for the Java implementation of ASR upon [short](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_sdk) and [long](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_long_audio_sdk) audio files.
diff --git a/README_cn.md b/README_cn.md
index ca42e71f6..b4bd53f36 100644
--- a/README_cn.md
+++ b/README_cn.md
@@ -797,6 +797,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
 ### 贡献者

+
@@ -833,7 +834,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声

 ## 致谢
-
+- 非常感谢 [david-95](https://github.com/david-95)修复句尾多标点符号出错的问题,补充frontend语音polyphonic 数据,贡献补充多条程序和数据
 - 非常感谢 [BarryKCL](https://github.com/BarryKCL)基于[G2PW](https://github.com/GitYCC/g2pW)对TTS中文文本前端的优化。
 - 非常感谢 [yeyupiaoling](https://github.com/yeyupiaoling)/[PPASR](https://github.com/yeyupiaoling/PPASR)/[PaddlePaddle-DeepSpeech](https://github.com/yeyupiaoling/PaddlePaddle-DeepSpeech)/[VoiceprintRecognition-PaddlePaddle](https://github.com/yeyupiaoling/VoiceprintRecognition-PaddlePaddle)/[AudioClassification-PaddlePaddle](https://github.com/yeyupiaoling/AudioClassification-PaddlePaddle) 多年来的关注和建议,以及在诸多问题上的帮助。
 - 非常感谢 [mymagicpower](https://github.com/mymagicpower) 采用PaddleSpeech 对 ASR 的[短语音](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_sdk)及[长语音](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_long_audio_sdk)进行 Java 实现。
diff --git a/paddlespeech/t2s/frontend/zh_normalization/num.py b/paddlespeech/t2s/frontend/zh_normalization/num.py
index 0002ed504..8a54d3e63 100644
--- a/paddlespeech/t2s/frontend/zh_normalization/num.py
+++ b/paddlespeech/t2s/frontend/zh_normalization/num.py
@@ -28,7 +28,7 @@ UNITS = OrderedDict({
     8: '亿',
 })
-COM_QUANTIFIERS = '(人|所|朵|匹|张|座|回|场|尾|条|个|首|阙|阵|网|炮|顶|丘|棵|只|支|袭|辆|挑|担|颗|壳|窠|曲|墙|群|腔|砣|座|客|贯|扎|捆|刀|令|打|手|罗|坡|山|岭|江|溪|钟|队|单|双|对|出|口|头|脚|板|跳|枝|件|贴|针|线|管|名|位|身|堂|课|本|页|家|户|层|丝|毫|厘|分|钱|两|斤|担|铢|石|钧|锱|忽|(千|毫|微)克|毫|厘|(公)分|分|寸|尺|丈|里|寻|常|铺|程|(千|分|厘|毫|微)米|米|撮|勺|合|升|斗|石|盘|碗|碟|叠|桶|笼|盆|盒|杯|钟|斛|锅|簋|篮|盘|桶|罐|瓶|壶|卮|盏|箩|箱|煲|啖|袋|钵|年|月|日|季|刻|时|周|天|秒|分|小时|旬|纪|岁|世|更|夜|春|夏|秋|冬|代|伏|辈|丸|泡|粒|颗|幢|堆|条|根|支|道|面|片|张|颗|块|元|(亿|千万|百万|万|千|百)|(亿|千万|百万|万|千|百|美|)元|(亿|千万|百万|万|千|百|)块|角|毛|分)'
+COM_QUANTIFIERS = '(封|艘|把|目|套|段|人|所|朵|匹|张|座|回|场|尾|条|个|首|阙|阵|网|炮|顶|丘|棵|只|支|袭|辆|挑|担|颗|壳|窠|曲|墙|群|腔|砣|座|客|贯|扎|捆|刀|令|打|手|罗|坡|山|岭|江|溪|钟|队|单|双|对|出|口|头|脚|板|跳|枝|件|贴|针|线|管|名|位|身|堂|课|本|页|家|户|层|丝|毫|厘|分|钱|两|斤|担|铢|石|钧|锱|忽|(千|毫|微)克|毫|厘|(公)分|分|寸|尺|丈|里|寻|常|铺|程|(千|分|厘|毫|微)米|米|撮|勺|合|升|斗|石|盘|碗|碟|叠|桶|笼|盆|盒|杯|钟|斛|锅|簋|篮|盘|桶|罐|瓶|壶|卮|盏|箩|箱|煲|啖|袋|钵|年|月|日|季|刻|时|周|天|秒|分|小时|旬|纪|岁|世|更|夜|春|夏|秋|冬|代|伏|辈|丸|泡|粒|颗|幢|堆|条|根|支|道|面|片|张|颗|块|元|(亿|千万|百万|万|千|百)|(亿|千万|百万|万|千|百|美|)元|(亿|千万|百万|万|千|百|十|)吨|(亿|千万|百万|万|千|百|)块|角|毛|分)'
 # 分数表达式
 RE_FRAC = re.compile(r'(-?)(\d+)/(\d+)')

From d21e03c03e4fb29cbd6ce3b708de19a6d542a04a Mon Sep 17 00:00:00 2001
From: TianYuan
Date: Fri, 26 Aug 2022 18:06:12 +0800
Subject: [PATCH 4/4] update tts3 readme, test=doc (#2315)

---
 docs/source/released_model.md                  |  6 ++++--
 examples/aishell3/tts3/README.md               | 15 ++++++++-------
 examples/aishell3/tts3/local/synthesize_e2e.sh |  6 +++---
 examples/other/g2p/README.md                   |  2 +-
 examples/vctk/tts3/README.md                   | 16 +++++++++-------
 examples/zh_en_tts/tts3/README.md              | 14 ++++++++------
 6 files changed, 33 insertions(+), 26 deletions(-)

diff --git a/docs/source/released_model.md b/docs/source/released_model.md
index 8d0ff1d47..d6691812e 100644
--- a/docs/source/released_model.md
+++ b/docs/source/released_model.md
@@ -42,9 +42,11 @@ SpeedySpeech| CSMSC | [speedyspeech-csmsc](https://github.com/PaddlePaddle/Paddl
 FastSpeech2| CSMSC |[fastspeech2-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3)|[fastspeech2_nosil_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_baker_ckpt_0.4.zip)|[fastspeech2_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_csmsc_static_0.2.0.zip) [fastspeech2_csmsc_onnx_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_csmsc_onnx_0.2.0.zip)|157MB|
 FastSpeech2-Conformer| CSMSC |[fastspeech2-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3)|[fastspeech2_conformer_baker_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_conformer_baker_ckpt_0.5.zip)|||
 FastSpeech2-CNNDecoder| CSMSC| [fastspeech2-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3)| [fastspeech2_cnndecoder_csmsc_ckpt_1.0.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_cnndecoder_csmsc_ckpt_1.0.0.zip) | [fastspeech2_cnndecoder_csmsc_static_1.0.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_cnndecoder_csmsc_static_1.0.0.zip) [fastspeech2_cnndecoder_csmsc_streaming_static_1.0.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_cnndecoder_csmsc_streaming_static_1.0.0.zip) [fastspeech2_cnndecoder_csmsc_onnx_1.0.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_cnndecoder_csmsc_onnx_1.0.0.zip) [fastspeech2_cnndecoder_csmsc_streaming_onnx_1.0.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_cnndecoder_csmsc_streaming_onnx_1.0.0.zip) | 84MB|
-FastSpeech2| AISHELL-3 |[fastspeech2-aishell3](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/tts3)|[fastspeech2_nosil_aishell3_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_aishell3_ckpt_0.4.zip)|[fastspeech2_aishell3_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_static_1.1.0.zip) [fastspeech2_aishell3_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_onnx_1.1.0.zip)|147MB|
+FastSpeech2| AISHELL-3 |[fastspeech2-aishell3](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/tts3)|[fastspeech2_aishell3_ckpt_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_ckpt_1.1.0.zip)|[fastspeech2_aishell3_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_static_1.1.0.zip) [fastspeech2_aishell3_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_onnx_1.1.0.zip)|147MB|
 FastSpeech2| LJSpeech |[fastspeech2-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/tts3)|[fastspeech2_nosil_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_ljspeech_ckpt_0.5.zip)|[fastspeech2_ljspeech_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_ljspeech_static_1.1.0.zip) [fastspeech2_ljspeech_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_ljspeech_onnx_1.1.0.zip)|145MB|
-FastSpeech2| VCTK |[fastspeech2-vctk](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/vctk/tts3)|[fastspeech2_nosil_vctk_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_vctk_ckpt_0.5.zip)|[fastspeech2_vctk_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_vctk_static_1.1.0.zip) [fastspeech2_vctk_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_vctk_onnx_1.1.0.zip) | 145MB|
+FastSpeech2| VCTK |[fastspeech2-vctk](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/vctk/tts3)|[fastspeech2_vctk_ckpt_1.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_vctk_ckpt_1.2.0.zip)|[fastspeech2_vctk_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_vctk_static_1.1.0.zip) [fastspeech2_vctk_onnx_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_vctk_onnx_1.1.0.zip) | 145MB|
+FastSpeech2| ZH_EN |[fastspeech2-zh_en](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/zh_en_tts/tts3)|[fastspeech2_mix_ckpt_1.2.0.zip](https://paddlespeech.bj.bcebos.com/t2s/chinse_english_mixed/models/fastspeech2_mix_ckpt_1.2.0.zip)|[fastspeech2_mix_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/t2s/chinse_english_mixed/models/fastspeech2_mix_static_0.2.0.zip) [fastspeech2_mix_onnx_0.2.0.zip](https://paddlespeech.bj.bcebos.com/t2s/chinse_english_mixed/models/fastspeech2_mix_onnx_0.2.0.zip) | 145MB|
+
 ### Vocoders
 Model Type | Dataset| Example Link | Pretrained Models| Static/ONNX Models|Size (static)
diff --git a/examples/aishell3/tts3/README.md b/examples/aishell3/tts3/README.md
index 6ef2870c2..3e1dee2fb 100644
--- a/examples/aishell3/tts3/README.md
+++ b/examples/aishell3/tts3/README.md
@@ -229,9 +229,11 @@ The ONNX model can be downloaded here:
 FastSpeech2 checkpoint contains files listed below.
 ```text
-fastspeech2_nosil_aishell3_ckpt_0.4
+fastspeech2_aishell3_ckpt_1.1.0
 ├── default.yaml             # default config used to train fastspeech2
+├── energy_stats.npy         # statistics used to normalize energy when training fastspeech2
 ├── phone_id_map.txt         # phone vocabulary file when training fastspeech2
+├── pitch_stats.npy          # statistics used to normalize pitch when training fastspeech2
 ├── snapshot_iter_96400.pdz  # model parameters and optimizer states
 ├── speaker_id_map.txt       # speaker id map file when training a multi-speaker fastspeech2
 └── speech_stats.npy         # statistics used to normalize spectrogram when training fastspeech2
@@ -244,9 +246,9 @@ FLAGS_allocator_strategy=naive_best_fit \
 FLAGS_fraction_of_gpu_memory_to_use=0.01 \
 python3 ${BIN_DIR}/../synthesize_e2e.py \
   --am=fastspeech2_aishell3 \
-  --am_config=fastspeech2_nosil_aishell3_ckpt_0.4/default.yaml \
-  --am_ckpt=fastspeech2_nosil_aishell3_ckpt_0.4/snapshot_iter_96400.pdz \
-  --am_stat=fastspeech2_nosil_aishell3_ckpt_0.4/speech_stats.npy \
+  --am_config=fastspeech2_aishell3_ckpt_1.1.0/default.yaml \
+  --am_ckpt=fastspeech2_aishell3_ckpt_1.1.0/snapshot_iter_96400.pdz \
+  --am_stat=fastspeech2_aishell3_ckpt_1.1.0/speech_stats.npy \
   --voc=pwgan_aishell3 \
   --voc_config=pwg_aishell3_ckpt_0.5/default.yaml \
   --voc_ckpt=pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz \
@@ -254,9 +256,8 @@ python3 ${BIN_DIR}/../synthesize_e2e.py \
   --lang=zh \
   --text=${BIN_DIR}/../sentences.txt \
   --output_dir=exp/default/test_e2e \
-  --phones_dict=fastspeech2_nosil_aishell3_ckpt_0.4/phone_id_map.txt \
-  --speaker_dict=fastspeech2_nosil_aishell3_ckpt_0.4/speaker_id_map.txt \
+  --phones_dict=fastspeech2_aishell3_ckpt_1.1.0/phone_id_map.txt \
+  --speaker_dict=fastspeech2_aishell3_ckpt_1.1.0/speaker_id_map.txt \
   --spk_id=0 \
   --inference_dir=exp/default/inference
-
 ```
diff --git a/examples/aishell3/tts3/local/synthesize_e2e.sh b/examples/aishell3/tts3/local/synthesize_e2e.sh
index ff3608be7..158350ae4 100755
--- a/examples/aishell3/tts3/local/synthesize_e2e.sh
+++ b/examples/aishell3/tts3/local/synthesize_e2e.sh
@@ -38,7 +38,7 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
         --am=fastspeech2_aishell3 \
         --am_config=${config_path} \
         --am_ckpt=${train_output_path}/checkpoints/${ckpt_name} \
-        --am_stat=fastspeech2_nosil_aishell3_ckpt_0.4/speech_stats.npy \
+        --am_stat=dump/train/speech_stats.npy \
         --voc=hifigan_aishell3 \
         --voc_config=hifigan_aishell3_ckpt_0.2.0/default.yaml \
         --voc_ckpt=hifigan_aishell3_ckpt_0.2.0/snapshot_iter_2500000.pdz \
@@ -46,8 +46,8 @@ if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
         --lang=zh \
         --text=${BIN_DIR}/../sentences.txt \
         --output_dir=${train_output_path}/test_e2e \
-        --phones_dict=fastspeech2_nosil_aishell3_ckpt_0.4/phone_id_map.txt \
-        --speaker_dict=fastspeech2_nosil_aishell3_ckpt_0.4/speaker_id_map.txt \
+        --phones_dict=dump/phone_id_map.txt \
+        --speaker_dict=dump/speaker_id_map.txt \
         --spk_id=0 \
         --inference_dir=${train_output_path}/inference
 fi
diff --git a/examples/other/g2p/README.md b/examples/other/g2p/README.md
index a8f8f7340..882943504 100644
--- a/examples/other/g2p/README.md
+++ b/examples/other/g2p/README.md
@@ -12,7 +12,7 @@ Run the command below to get the results of the test.
 ./run.sh
 ```
-The `avg WER` of g2p is: 0.024219452438490413
+The `avg WER` of g2p is: 0.024169315564825305
 ```text
 ,--------------------------------------------------------------------.
diff --git a/examples/vctk/tts3/README.md b/examples/vctk/tts3/README.md
index 9c0d75616..2a2f27fd4 100644
--- a/examples/vctk/tts3/README.md
+++ b/examples/vctk/tts3/README.md
@@ -216,7 +216,7 @@ optional arguments:
 ## Pretrained Model
 Pretrained FastSpeech2 model with no silence in the edge of audios:
-- [fastspeech2_nosil_vctk_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_vctk_ckpt_0.5.zip)
+- [fastspeech2_vctk_ckpt_1.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_vctk_ckpt_1.2.0.zip)
 The static model can be downloaded here:
 - [fastspeech2_vctk_static_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_vctk_static_1.1.0.zip)
@@ -226,9 +226,11 @@ The ONNX model can be downloaded here:
 FastSpeech2 checkpoint contains files listed below.
 ```text
-fastspeech2_nosil_vctk_ckpt_0.5
+fastspeech2_vctk_ckpt_1.2.0
 ├── default.yaml             # default config used to train fastspeech2
+├── energy_stats.npy         # statistics used to normalize energy when training fastspeech2
 ├── phone_id_map.txt         # phone vocabulary file when training fastspeech2
+├── pitch_stats.npy          # statistics used to normalize pitch when training fastspeech2
 ├── snapshot_iter_66200.pdz  # model parameters and optimizer states
 ├── speaker_id_map.txt       # speaker id map file when training a multi-speaker fastspeech2
 └── speech_stats.npy         # statistics used to normalize spectrogram when training fastspeech2
@@ -241,9 +243,9 @@ FLAGS_allocator_strategy=naive_best_fit \
 FLAGS_fraction_of_gpu_memory_to_use=0.01 \
 python3 ${BIN_DIR}/../synthesize_e2e.py \
   --am=fastspeech2_vctk \
-  --am_config=fastspeech2_nosil_vctk_ckpt_0.5/default.yaml \
-  --am_ckpt=fastspeech2_nosil_vctk_ckpt_0.5/snapshot_iter_66200.pdz \
-  --am_stat=fastspeech2_nosil_vctk_ckpt_0.5/speech_stats.npy \
+  --am_config=fastspeech2_vctk_ckpt_1.2.0/default.yaml \
+  --am_ckpt=fastspeech2_vctk_ckpt_1.2.0/snapshot_iter_66200.pdz \
+  --am_stat=fastspeech2_vctk_ckpt_1.2.0/speech_stats.npy \
   --voc=pwgan_vctk \
   --voc_config=pwg_vctk_ckpt_0.1.1/default.yaml \
   --voc_ckpt=pwg_vctk_ckpt_0.1.1/snapshot_iter_1500000.pdz \
@@ -251,8 +253,8 @@ python3 ${BIN_DIR}/../synthesize_e2e.py \
   --lang=en \
   --text=${BIN_DIR}/../sentences_en.txt \
   --output_dir=exp/default/test_e2e \
-  --phones_dict=fastspeech2_nosil_vctk_ckpt_0.5/phone_id_map.txt \
-  --speaker_dict=fastspeech2_nosil_vctk_ckpt_0.5/speaker_id_map.txt \
+  --phones_dict=fastspeech2_vctk_ckpt_1.2.0/phone_id_map.txt \
+  --speaker_dict=fastspeech2_vctk_ckpt_1.2.0/speaker_id_map.txt \
   --spk_id=0 \
   --inference_dir=exp/default/inference
 ```
diff --git a/examples/zh_en_tts/tts3/README.md b/examples/zh_en_tts/tts3/README.md
index e7365baa2..b4b683089 100644
--- a/examples/zh_en_tts/tts3/README.md
+++ b/examples/zh_en_tts/tts3/README.md
@@ -262,9 +262,11 @@ The ONNX model can be downloaded here:
 FastSpeech2 checkpoint contains files listed below.
 ```text
-fastspeech2_mix_ckpt_0.2.0
+fastspeech2_mix_ckpt_1.2.0
 ├── default.yaml             # default config used to train fastspeech2
+├── energy_stats.npy         # statistics used to energy spectrogram when training fastspeech2
 ├── phone_id_map.txt         # phone vocabulary file when training fastspeech2
+├── pitch_stats.npy          # statistics used to normalize pitch when training fastspeech2
 ├── snapshot_iter_99200.pdz  # model parameters and optimizer states
 ├── speaker_id_map.txt       # speaker id map file when training a multi-speaker fastspeech2
 └── speech_stats.npy         # statistics used to normalize spectrogram when training fastspeech2
@@ -281,9 +283,9 @@ FLAGS_allocator_strategy=naive_best_fit \
 FLAGS_fraction_of_gpu_memory_to_use=0.01 \
 python3 ${BIN_DIR}/../synthesize_e2e.py \
   --am=fastspeech2_mix \
-  --am_config=fastspeech2_mix_ckpt_0.2.0/default.yaml \
-  --am_ckpt=fastspeech2_mix_ckpt_0.2.0/snapshot_iter_99200.pdz \
-  --am_stat=fastspeech2_mix_ckpt_0.2.0/speech_stats.npy \
+  --am_config=fastspeech2_mix_ckpt_1.2.0/default.yaml \
+  --am_ckpt=fastspeech2_mix_ckpt_1.2.0/snapshot_iter_99200.pdz \
+  --am_stat=fastspeech2_mix_ckpt_1.2.0/speech_stats.npy \
   --voc=pwgan_aishell3 \
   --voc_config=pwg_aishell3_ckpt_0.5/default.yaml \
   --voc_ckpt=pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz \
@@ -291,8 +293,8 @@ python3 ${BIN_DIR}/../synthesize_e2e.py \
   --lang=mix \
   --text=${BIN_DIR}/../sentences_mix.txt \
   --output_dir=exp/default/test_e2e \
-  --phones_dict=fastspeech2_mix_ckpt_0.2.0/phone_id_map.txt \
-  --speaker_dict=fastspeech2_mix_ckpt_0.2.0/speaker_id_map.txt \
+  --phones_dict=fastspeech2_mix_ckpt_1.2.0/phone_id_map.txt \
+  --speaker_dict=fastspeech2_mix_ckpt_1.2.0/speaker_id_map.txt \
   --spk_id=174 \
   --inference_dir=exp/default/inference
 ```
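
Note on PATCH 3/4: the `COM_QUANTIFIERS` change adds the alternatives `封|艘|把|目|套|段` and `(亿|千万|百万|万|千|百|十|)吨`. The sketch below is one way to sanity-check what those additions allow a number-plus-quantifier pattern to match. It is a minimal illustration under stated assumptions: `QUANTIFIERS_SUBSET` is a reduced alternation (not the full string from `num.py`), and `RE_NUM_QUANT` and `find_quantified` are hypothetical names introduced here; how `num.py` itself consumes `COM_QUANTIFIERS` is not shown in this diff.

```python
import re

# Reduced, illustrative subset of COM_QUANTIFIERS: only the alternatives
# added in PATCH 3/4 plus '个'. This is NOT the full pattern from num.py.
QUANTIFIERS_SUBSET = '(封|艘|把|目|套|段|个|(亿|千万|百万|万|千|百|十|)吨)'

# Hypothetical pattern for this sketch: digits followed by one quantifier.
RE_NUM_QUANT = re.compile(r'(\d+)' + QUANTIFIERS_SUBSET)


def find_quantified(text):
    """Return (number, quantifier) pairs found in text, left to right."""
    return [(m.group(1), m.group(2)) for m in RE_NUM_QUANT.finditer(text)]


if __name__ == '__main__':
    # Exercises three of the newly added quantifiers: 艘, (万)吨 and 封.
    print(find_quantified('3艘船运了20万吨货和5封信'))
```

Because Python's `re` tries alternatives left to right, the position of a new entry inside the full alternation can affect which quantifier is captured, so a check against the complete pattern is still worthwhile.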