Merge branch 'PaddlePaddle:develop' into develop

pull/2615/head
HuangLiangJie 3 years ago committed by GitHub
commit 832ff0e6aa

@@ -157,6 +157,7 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
 - 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV).

 ### Recent Update
+- 👑 2022.11.01: Add [Adversarial Loss](https://arxiv.org/pdf/1907.04448.pdf) for [Chinese English mixed TTS](./examples/zh_en_tts/tts3).
 - 🔥 2022.10.26: Add [Prosody Prediction](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/rhy) for TTS.
 - 🎉 2022.10.21: Add [SSML](https://github.com/PaddlePaddle/PaddleSpeech/discussions/2538) for TTS Chinese Text Frontend.
 - 👑 2022.10.11: Add [Wav2vec2ASR](./examples/librispeech/asr3), wav2vec2.0 fine-tuning for ASR on LibriSpeech.
@@ -716,9 +717,9 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
 <tr>
 <td>Keyword Spotting</td>
 <td>hey-snips</td>
-<td>PANN</td>
+<td>MDTC</td>
 <td>
-<a href = "./examples/hey_snips/kws0">pann-hey-snips</a>
+<a href = "./examples/hey_snips/kws0">mdtc-hey-snips</a>
 </td>
 </tr>
 </tbody>
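For context on the corrected rows above: the hey-snips recipe linked from the table is the MDTC-based keyword spotting example. A hedged usage sketch, assuming the `KWSExecutor` Python API under `paddlespeech.cli.kws`; the input path is a placeholder 16 kHz mono recording:

```python
# Sketch only: assumes paddlespeech is installed and that KWSExecutor
# downloads a default MDTC hey-snips checkpoint on first use.
from paddlespeech.cli.kws import KWSExecutor

kws = KWSExecutor()
# './audio.wav' is a placeholder input file, not shipped with the repo.
result = kws(audio_file='./audio.wav')
print(result)  # expected to report whether the wake word was detected
```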

@@ -164,6 +164,7 @@
 ### 近期更新
+- 👑 2022.11.01: [中英文混合 TTS](./examples/zh_en_tts/tts3) 新增 [Adversarial Loss](https://arxiv.org/pdf/1907.04448.pdf) 模块。
 - 🔥 2022.10.26: TTS 新增[韵律预测](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/rhy)功能。
 - 🎉 2022.10.21: TTS 中文文本前端新增 [SSML](https://github.com/PaddlePaddle/PaddleSpeech/discussions/2538) 功能。
 - 👑 2022.10.11: 新增 [Wav2vec2ASR](./examples/librispeech/asr3), 在 LibriSpeech 上针对 ASR 任务对 wav2vec2.0 的 finetuning。
@@ -696,9 +697,9 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
 </table>
-<a name="唤醒模型"></a>
-**唤醒**
+<a name="语音唤醒模型"></a>
+**语音唤醒**
 <table style="width:100%">
 <thead>
@@ -711,11 +712,11 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
 </thead>
 <tbody>
 <tr>
-<td>唤醒</td>
+<td>语音唤醒</td>
 <td>hey-snips</td>
-<td>PANN</td>
+<td>MDTC</td>
 <td>
-<a href = "./examples/hey_snips/kws0">pann-hey-snips</a>
+<a href = "./examples/hey_snips/kws0">mdtc-hey-snips</a>
 </td>
 </tr>
 </tbody>

@@ -108,7 +108,7 @@ for epoch in range(1, epochs + 1):
 optimizer.clear_grad()

 # Calculate loss
-avg_loss = loss.numpy()[0]
+avg_loss = float(loss)

 # Calculate metrics
 preds = paddle.argmax(logits, axis=1)
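The `loss.numpy()[0]` to `float(loss)` replacement (repeated in the hunks below) appears to track newer Paddle behaviour where a scalar loss is a 0-D tensor, so indexing its NumPy view no longer applies; casting with `float()` works either way. A minimal sketch with an illustrative loss value:

```python
import paddle

# A scalar (0-D) tensor, as a typical loss reduction returns in recent Paddle versions.
loss = paddle.mean(paddle.to_tensor([0.25, 0.75]))

# Old pattern: loss.numpy()[0] assumes a 1-element 1-D array.
# Portable pattern: cast the scalar tensor directly to a Python float.
avg_loss = float(loss)
print(avg_loss)  # 0.5
```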

@@ -509,7 +509,7 @@
 " optimizer.clear_grad()\n",
 "\n",
 " # Calculate loss\n",
-" avg_loss += loss.numpy()[0]\n",
+" avg_loss += float(loss)\n",
 "\n",
 " # Calculate metrics\n",
 " preds = paddle.argmax(logits, axis=1)\n",

@@ -55,7 +55,7 @@ If you want to finetune Chinese pretrained model, you need to prepare Chinese da
 000001|ka2 er2 pu3 pei2 wai4 sun1 wan2 hua2 ti1
 ```
-Here is an example of the first 200 data of csmsc.
+Here is a Chinese data example of the first 200 data of csmsc.
 ```bash
 mkdir -p input && cd input
@@ -69,7 +69,7 @@ If you want to finetune English pretrained model, you need to prepare English da
 LJ001-0001|Printing, in the only sense with which we are at present concerned, differs from most if not from all the arts and crafts represented in the Exhibition
 ```
-Here is an example of the first 200 data of ljspeech.
+Here is an English data example of the first 200 data of ljspeech.
 ```bash
 mkdir -p input && cd input
@@ -78,7 +78,7 @@ unzip ljspeech_mini.zip
 cd ../
 ```
-If you want to finetune Chinese-English Mixed pretrained model, you need to prepare Chinese data or English data. Here is an example of the first 12 data of SSB0005 (the speaker of aishell3).
+If you want to finetune Chinese-English Mixed pretrained model, you need to prepare Chinese data or English data. Here is a Chinese data example of the first 12 data of SSB0005 (the speaker of aishell3).
 ```bash
 mkdir -p input && cd input
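The label files in these finetuning docs share a plain `utterance_id|transcription` layout (pinyin with tone numbers for csmsc, raw text for ljspeech). A minimal reading sketch; the `input/csmsc_mini/labels.txt` path is a placeholder for wherever the downloaded example data lands:

```python
from pathlib import Path

def load_labels(path: str) -> dict:
    """Read 'utt_id|transcription' lines, e.g. '000001|ka2 er2 pu3 pei2 ...'."""
    labels = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        utt_id, text = line.split("|", maxsplit=1)
        labels[utt_id] = text.strip()
    return labels

# Placeholder path; adjust to the actual labels file inside ./input.
# print(load_labels("input/csmsc_mini/labels.txt"))
```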

@@ -101,7 +101,7 @@ if __name__ == "__main__":
 optimizer.clear_grad()

 # Calculate loss
-avg_loss += loss.numpy()[0]
+avg_loss += float(loss)

 # Calculate metrics
 preds = paddle.argmax(logits, axis=1)

@@ -110,7 +110,7 @@ if __name__ == '__main__':
 optimizer.clear_grad()

 # Calculate loss
-avg_loss += loss.numpy()[0]
+avg_loss += float(loss)

 # Calculate metrics
 num_corrects += corrects

@@ -0,0 +1,10 @@
+0001 考古人员<speak>西<say-as pinyin='zang4'>藏</say-as>布达拉宫里发现一个被隐<say-as pinyin="cang2">藏</say-as>的装有宝<say-as pinyin="zang4">藏</say-as></speak>箱子。
+0002 <speak>有人询问中国银<say-as pinyin='hang2'>行</say-as>北京分<say-as pinyin='hang2 hang2'>行行</say-as>长是否叫任我<say-as pinyin='xing2'>行</say-as></speak>。
+0003 <speak>市委书记亲自<say-as pinyin='shuai4'>率</say-as>领审计员对这家公司进行财务审计,发现企业的利润<say-as pinyin='lv4'>率</say-as>数据虚假</speak>。
+0004 <speak>学生们对代<say-as pinyin='shu4'>数</say-as>理解不深刻,特别是小<say-as pinyin='shu4'>数</say-as>点,在<say-as pinyin='shu3 shu4'>数数</say-as>时容易弄错</speak>。
+0005 <speak>赵<say-as pinyin='chang2'>长</say-as>军从小学习武术,擅<say-as pinyin='chang2'>长</say-as>散打,<say-as pinyin='zhang3'>长</say-as>大后参军,担任连<say-as pinyin='zhang3'>长</say-as></speak>。
+0006 <speak>我说她<say-as pinyin='zhang3'>涨</say-as>了工资,她就<say-as pinyin='zhang4'>涨</say-as>红着脸,摇头否认</speak>。
+0007 <speak>请把这封信交<say-as pinyin='gei3'>给</say-as>团长,告诉他,前线的供<say-as pinyin='ji3'>给</say-as>一定要有保障</speak>。
+0008 <speak>矿下的<say-as pinyin='hang4'>巷</say-as>道,与北京四合院的小<say-as pinyin='xiang4'>巷</say-as>有点相似</speak>。
+0009 <speak>他常叹自己命<say-as pinyin='bo2'>薄</say-as>,几亩<say-as pinyin='bao2'>薄</say-as>田,种点<say-as pinyin='bo4'>薄</say-as>荷</speak>。
+0010 <speak>小明对天相很有研究,在<say-as pinyin='su4'>宿</say-as>舍说了一<say-as pinyin='xiu3'>宿</say-as>有关星<say-as pinyin='xiu4'>宿</say-as>的常识</speak>。
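The new test file above exercises the SSML `<say-as pinyin=...>` tag to force readings of polyphonic characters (藏, 行, 率, 长, ...). A minimal, frontend-independent sketch that pulls those pinyin overrides out of one such line with the standard library:

```python
import re

# Line 0002 from the test file above.
LINE = ("0002 <speak>有人询问中国银<say-as pinyin='hang2'>行</say-as>北京分"
        "<say-as pinyin='hang2 hang2'>行行</say-as>长是否叫任我"
        "<say-as pinyin='xing2'>行</say-as></speak>。")

# Each <say-as pinyin='...'>chars</say-as> pairs one or more characters with forced readings.
pattern = re.compile(r"""<say-as pinyin=['"]([^'"]+)['"]>([^<]+)</say-as>""")
for pinyin, chars in pattern.findall(LINE):
    print(chars, "->", pinyin)
# 行 -> hang2
# 行行 -> hang2 hang2
# 行 -> xing2
```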

@@ -13,6 +13,7 @@
 # limitations under the License.
 import paddle
 import torch
+from paddle.device.cuda import synchronize
 from parallel_wavegan.layers import residual_block
 from parallel_wavegan.layers import upsample
 from parallel_wavegan.models import parallel_wavegan as pwgan
@@ -24,7 +25,6 @@ from paddlespeech.t2s.models.parallel_wavegan import PWGGenerator
 from paddlespeech.t2s.models.parallel_wavegan import ResidualBlock
 from paddlespeech.t2s.models.parallel_wavegan import ResidualPWGDiscriminator
 from paddlespeech.t2s.utils.layer_tools import summary
-from paddlespeech.t2s.utils.profile import synchronize

 paddle.set_device("gpu:0")
 device = torch.device("cuda:0")
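For context on the import swap above: `paddle.device.cuda.synchronize()` blocks until pending GPU kernels finish, which a comparison script like this needs before reading timings. A hedged sketch of the usual pattern (the matmul workload is illustrative and assumes a CUDA-enabled Paddle build):

```python
import time

import paddle
from paddle.device.cuda import synchronize

paddle.set_device("gpu:0")  # assumes a CUDA device is available
x = paddle.randn([1024, 1024])

start = time.perf_counter()
y = paddle.matmul(x, x)
synchronize()  # wait for the asynchronous GPU kernel before stopping the clock
print(f"matmul took {time.perf_counter() - start:.4f} s")
```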
