diff --git a/README.md b/README.md
index 9d7ed4258..d0928f6c7 100644
--- a/README.md
+++ b/README.md
@@ -157,6 +157,7 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
 - 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV).
 ### Recent Update
+- 👑 2022.11.01: Add [Adversarial Loss](https://arxiv.org/pdf/1907.04448.pdf) for [Chinese English mixed TTS](./examples/zh_en_tts/tts3).
 - 🔥 2022.10.26: Add [Prosody Prediction](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/rhy) for TTS.
 - 🎉 2022.10.21: Add [SSML](https://github.com/PaddlePaddle/PaddleSpeech/discussions/2538) for TTS Chinese Text Frontend.
 - 👑 2022.10.11: Add [Wav2vec2ASR](./examples/librispeech/asr3), wav2vec2.0 fine-tuning for ASR on LibriSpeech.
@@ -716,9 +717,9 @@ PaddleSpeech supports a series of most popular models.
They are summarized in [r
 Keyword Spotting
 hey-snips
- PANN
+ MDTC
- pann-hey-snips
+ mdtc-hey-snips
diff --git a/README_cn.md b/README_cn.md
index 2db883b5a..4cfc9715f 100644
--- a/README_cn.md
+++ b/README_cn.md
@@ -164,6 +164,7 @@
 ### 近期更新
+- 👑 2022.11.01: [中英文混合 TTS](./examples/zh_en_tts/tts3) 新增 [Adversarial Loss](https://arxiv.org/pdf/1907.04448.pdf) 模块。
 - 🔥 2022.10.26: TTS 新增[韵律预测](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/rhy)功能。
 - 🎉 2022.10.21: TTS 中文文本前端新增 [SSML](https://github.com/PaddlePaddle/PaddleSpeech/discussions/2538) 功能。
 - 👑 2022.10.11: 新增 [Wav2vec2ASR](./examples/librispeech/asr3), 在 LibriSpeech 上针对 ASR 任务对 wav2vec2.0 的 finetuning。
@@ -696,9 +697,9 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
-
+
-**唤醒**
+**语音唤醒**
@@ -711,11 +712,11 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
-
+
-
+
diff --git a/docs/source/cls/custom_dataset.md b/docs/source/cls/custom_dataset.md
index e39dcf12d..b7c06cd7a 100644
--- a/docs/source/cls/custom_dataset.md
+++ b/docs/source/cls/custom_dataset.md
@@ -108,7 +108,7 @@ for epoch in range(1, epochs + 1):
 optimizer.clear_grad()
 # Calculate loss
- avg_loss = loss.numpy()[0]
+ avg_loss = float(loss)
 # Calculate metrics
 preds = paddle.argmax(logits, axis=1)
diff --git a/docs/tutorial/cls/cls_tutorial.ipynb b/docs/tutorial/cls/cls_tutorial.ipynb
index 56b488adc..3cee64991 100644
--- a/docs/tutorial/cls/cls_tutorial.ipynb
+++ b/docs/tutorial/cls/cls_tutorial.ipynb
@@ -509,7 +509,7 @@
 " optimizer.clear_grad()\n",
 "\n",
 " # Calculate loss\n",
- " avg_loss += loss.numpy()[0]\n",
+ " avg_loss += float(loss)\n",
 "\n",
 " # Calculate metrics\n",
 " preds = paddle.argmax(logits, axis=1)\n",
diff --git a/examples/other/tts_finetune/tts3/README.md b/examples/other/tts_finetune/tts3/README.md
index fa691764c..8564af5f6 100644
--- a/examples/other/tts_finetune/tts3/README.md
+++ b/examples/other/tts_finetune/tts3/README.md
@@ -55,7 +55,7 @@ If you want to finetune Chinese pretrained model, you need to prepare
Chinese da
 000001|ka2 er2 pu3 pei2 wai4 sun1 wan2 hua2 ti1
 ```
-Here is an example of the first 200 data of csmsc.
+Here is a Chinese data example of the first 200 data of csmsc.
 ```bash
 mkdir -p input && cd input
@@ -69,7 +69,7 @@ If you want to finetune English pretrained model, you need to prepare English da
 LJ001-0001|Printing, in the only sense with which we are at present concerned, differs from most if not from all the arts and crafts represented in the Exhibition
 ```
-Here is an example of the first 200 data of ljspeech.
+Here is an English data example of the first 200 data of ljspeech.
 ```bash
 mkdir -p input && cd input
@@ -78,7 +78,7 @@ unzip ljspeech_mini.zip
 cd ../
 ```
-If you want to finetune Chinese-English Mixed pretrained model, you need to prepare Chinese data or English data. Here is an example of the first 12 data of SSB0005 (the speaker of aishell3).
+If you want to finetune Chinese-English Mixed pretrained model, you need to prepare Chinese data or English data. Here is a Chinese data example of the first 12 data of SSB0005 (the speaker of aishell3).
 ```bash
 mkdir -p input && cd input
diff --git a/paddlespeech/cls/exps/panns/train.py b/paddlespeech/cls/exps/panns/train.py
index fba38a01c..133893081 100644
--- a/paddlespeech/cls/exps/panns/train.py
+++ b/paddlespeech/cls/exps/panns/train.py
@@ -101,7 +101,7 @@ if __name__ == "__main__":
 optimizer.clear_grad()
 # Calculate loss
- avg_loss += loss.numpy()[0]
+ avg_loss += float(loss)
 # Calculate metrics
 preds = paddle.argmax(logits, axis=1)
diff --git a/paddlespeech/kws/exps/mdtc/train.py b/paddlespeech/kws/exps/mdtc/train.py
index 94e45d590..d5bb5e020 100644
--- a/paddlespeech/kws/exps/mdtc/train.py
+++ b/paddlespeech/kws/exps/mdtc/train.py
@@ -110,7 +110,7 @@ if __name__ == '__main__':
 optimizer.clear_grad()
 # Calculate loss
- avg_loss += loss.numpy()[0]
+ avg_loss += float(loss)
 # Calculate metrics
 num_corrects += corrects
diff --git a/paddlespeech/t2s/exps/sentences_ssml.txt b/paddlespeech/t2s/exps/sentences_ssml.txt
new file mode 100644
index 000000000..e3614f224
--- /dev/null
+++ b/paddlespeech/t2s/exps/sentences_ssml.txt
@@ -0,0 +1,10 @@
+0001 考古人员西布达拉宫里发现一个被隐的装有宝箱子。
+0002 有人询问中国银北京分行行长是否叫任我。
+0003 市委书记亲自领审计员对这家公司进行财务审计,发现企业的利润数据虚假。
+0004 学生们对代理解不深刻,特别是小点,在数数时容易弄错。
+0005 军从小学习武术,擅散打,大后参军,担任连。
+0006 我说她了工资,她就红着脸,摇头否认。
+0007 请把这封信交团长,告诉他,前线的供一定要有保障。
+0008 矿下的道,与北京四合院的小有点相似。
+0009 他常叹自己命,几亩田,种点。
+0010 小明对天相很有研究,在宿舍说了一宿有关星宿的常识。
\ No newline at end of file
diff --git a/tests/unit/tts/test_pwg.py b/tests/unit/tts/test_pwg.py
index 78cb34f25..10c82c9fd 100644
--- a/tests/unit/tts/test_pwg.py
+++ b/tests/unit/tts/test_pwg.py
@@ -13,6 +13,7 @@
 # limitations under the License.
 import paddle
 import torch
+from paddle.device.cuda import synchronize
 from parallel_wavegan.layers import residual_block
 from parallel_wavegan.layers import upsample
 from parallel_wavegan.models import parallel_wavegan as pwgan
@@ -24,7 +25,6 @@ from paddlespeech.t2s.models.parallel_wavegan import PWGGenerator
 from paddlespeech.t2s.models.parallel_wavegan import ResidualBlock
 from paddlespeech.t2s.models.parallel_wavegan import ResidualPWGDiscriminator
 from paddlespeech.t2s.utils.layer_tools import summary
-from paddlespeech.t2s.utils.profile import synchronize
 paddle.set_device("gpu:0")
 device = torch.device("cuda:0")
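Note on the `loss.numpy()[0]` to `float(loss)` hunks in the training loops and the cls docs: the diff does not state a motivation, so the following is only an assumption. One plausible reading is that `float(loss)` does not depend on the loss having shape `[1]`, whereas indexing the exported array with `[0]` does. The sketch below illustrates that difference in plain Python. `FakeTensor`, `accumulate_old`, and `accumulate_new` are hypothetical names invented for this illustration; they are not part of Paddle or PaddleSpeech, and the stand-in only mimics the two access patterns.

```python
class FakeTensor:
    """Hypothetical stand-in for a framework tensor that holds one value.

    NOT the Paddle API: it only mimics `.numpy()` and `float(...)`.
    """

    def __init__(self, value, shape):
        self.value = float(value)
        self.shape = tuple(shape)  # () for 0-D, (1,) for a one-element vector

    def numpy(self):
        # Mimic array export: a bare float for 0-D, a list for shape (1,).
        return self.value if self.shape == () else [self.value]

    def __float__(self):
        # Scalar conversion works regardless of the stored shape.
        return self.value


def accumulate_old(losses):
    # Pattern removed by the diff: requires the exported value to be indexable.
    total = 0.0
    for loss in losses:
        total += loss.numpy()[0]
    return total


def accumulate_new(losses):
    # Pattern introduced by the diff: plain scalar conversion.
    total = 0.0
    for loss in losses:
        total += float(loss)
    return total


one_element = [FakeTensor(0.5, (1,)), FakeTensor(0.25, (1,))]
zero_dim = [FakeTensor(0.5, ()), FakeTensor(0.25, ())]

print(accumulate_new(one_element))  # 0.75
print(accumulate_new(zero_dim))     # 0.75
try:
    accumulate_old(zero_dim)
except TypeError as exc:
    # In this stand-in, indexing a bare float raises TypeError.
    print("old pattern fails on a 0-D value:", exc)
```

Both patterns agree on one-element inputs, so under this reading the change should not alter the reported average loss for existing runs.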