Merge branch 'develop' of github.com:PaddlePaddle/DeepSpeech into fix_docs

pull/992/head · TianYuan · commit 0bc9450c51

@ -93,17 +93,6 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
- *Integration of mainstream models and datasets*: the toolkit implements modules that cover the whole pipeline of the speech tasks, and uses mainstream datasets like LibriSpeech, LJSpeech, AIShell, CSMSC, etc. See also [model lists](#models-list) for more details.
- *Cascaded models application*: as an extension of the application of traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural Language Processing (NLP), e.g. Punctuation Restoration.
# Community
You are warmly welcome to submit questions in [discussions](https://github.com/PaddlePaddle/DeepSpeech/discussions) and bug reports in [issues](https://github.com/PaddlePaddle/DeepSpeech/issues)! Also, we highly appreciate it if you would like to contribute to this project!
If you are from China, we strongly recommend you join our PaddleSpeech WeChat group. Scan the following WeChat QR code and get in touch with the other developers in this community!
<div align="center">
<img src="./docs/images/wechat-code-speech.png" width = "200">
</div>
# Alternative Installation
The base environment in this page is
@ -187,7 +176,7 @@ The current hyperlinks redirect to [Previous Parakeet](https://github.com/Paddle
<td rowspan="2" >Aishell</td>
<td >DeepSpeech2 RNN + Conv based Models</td>
<td>
<a href = "./examples/aishell/s0">deepspeech2-aishell</a>
<a href = "./examples/aishell/s0">deepspeech2-aishell</a>
</td>
</tr>
<tr>
@ -200,7 +189,7 @@ The current hyperlinks redirect to [Previous Parakeet](https://github.com/Paddle
<td> Librispeech</td>
<td>Transformer based Attention Models </td>
<td>
<a href = "./examples/librispeech/s0">deepspeech2-librispeech</a> / <a href = "./examples/librispeech/s1">transformer.conformer.u2-librispeech</a> / <a href = "./examples/librispeech/s2">transformer.conformer.u2-kaldi-librispeech</a>
<a href = "./examples/librispeech/s0">deepspeech2-librispeech</a> / <a href = "./examples/librispeech/s1">transformer.conformer.u2-librispeech</a> / <a href = "./examples/librispeech/s2">transformer.conformer.u2-kaldi-librispeech</a>
</td>
</td>
</tr>
@ -223,7 +212,7 @@ The current hyperlinks redirect to [Previous Parakeet](https://github.com/Paddle
<td>TIMIT</td>
<td>Unified Streaming & Non-streaming Two-pass</td>
<td>
<a href = "./examples/timit/s1"> u2-timit</a>
<a href = "./examples/timit/s1"> u2-timit</a>
</td>
</tr>
</tbody>
@ -331,10 +320,10 @@ Normally, [Speech SoTA](https://paperswithcode.com/area/speech) gives you an ove
- [Test Audio Samples](https://paddlespeech.readthedocs.io/en/latest/tts/demo.html) and [PaddleSpeech VS. Espnet](https://paddlespeech.readthedocs.io/en/latest/tts/demo_2.html)
- [Released Models](./docs/source/released_model.md)
The TTS module was originally called [Parakeet](https://github.com/PaddlePaddle/Parakeet) and has now been merged into DeepSpeech. If you are interested in academic research about this function, please see the [TTS research overview](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/docs/source/tts#overview). Also, [this document](https://paddleparakeet.readthedocs.io/en/latest/released_models.html) is a good guideline for the pipeline components.
The TTS module was originally called [Parakeet](https://github.com/PaddlePaddle/Parakeet) and has now been merged into DeepSpeech. If you are interested in academic research about this function, please see the [TTS research overview](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/docs/source/tts#overview). Also, [this document](https://paddleparakeet.readthedocs.io/en/latest/released_models.html) is a good guideline for the pipeline components.
# FAQ and Contributing
You are warmly welcome to submit questions in [discussions](https://github.com/PaddlePaddle/DeepSpeech/discussions) and bug reports in [issues](https://github.com/PaddlePaddle/DeepSpeech/issues)! Also, we highly appreciate it if you would like to contribute to this project!
You are warmly welcome to submit questions in [discussions](https://github.com/PaddlePaddle/PaddleSpeech/discussions) and bug reports in [issues](https://github.com/PaddlePaddle/PaddleSpeech/issues)! Also, we highly appreciate it if you would like to contribute to this project!
# License and Acknowledgement
PaddleSpeech is provided under the [Apache-2.0 License](./LICENSE).
@ -347,8 +336,7 @@ To cite PaddleSpeech for research, please use the following format.
@misc{ppspeech2021,
title={PaddleSpeech, a toolkit for audio processing based on PaddlePaddle.},
author={PaddlePaddle Authors},
howpublished = {\url{https://github.com/PaddlePaddle/DeepSpeech}},
howpublished = {\url{https://github.com/PaddlePaddle/PaddleSpeech}},
year={2021}
}
```

@ -81,6 +81,8 @@ class StyleFastSpeech2Inference(FastSpeech2Inference):
durations = durations * d_outs
elif isinstance(durations, paddle.Tensor):
durations = durations
else:
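# no explicit value given: fall back to the model-predicted durations
# (pitch and energy below use the same fallback pattern)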
durations = d_outs
if robot:
# setting the normed pitch to zeros has the same effect as setting the denormed pitch to its mean
@ -94,6 +96,8 @@ class StyleFastSpeech2Inference(FastSpeech2Inference):
pitch = self.norm(paddle.log(p_HZ), self.pitch_mean, self.pitch_std)
elif isinstance(pitch, paddle.Tensor):
pitch = pitch
else:
pitch = p_outs
# set energy
if isinstance(energy, (int, float)):
@ -102,6 +106,8 @@ class StyleFastSpeech2Inference(FastSpeech2Inference):
energy = self.norm(e_dnorm, self.energy_mean, self.energy_std)
elif isinstance(energy, paddle.Tensor):
energy = energy
else:
energy = e_outs
normalized_mel, d_outs, p_outs, e_outs = self.acoustic_model.inference(
text,

Binary file not shown (image; previous size: 229 KiB)

@ -1,3 +1,4 @@
# Released Models
## Speech-To-Text Models
@ -28,28 +29,29 @@ Language Model | Training Data | Token-based | Size | Descriptions
## Text-To-Speech Models
### Acoustic Models
Model Type | Dataset| Example Link | Pretrained Models
:-------------:| :------------:| :-----: | :-----
Tacotron2|LJSpeech|[tacotron2-vctk](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/ljspeech/tts0)|[tacotron2_ljspeech_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_ljspeech_ckpt_0.3.zip)
TransformerTTS| LJSpeech| [transformer-ljspeech](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/ljspeech/tts1)|[transformer_tts_ljspeech_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_ckpt_0.4.zip)
SpeedySpeech| CSMSC | [speedyspeech-csmsc](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/csmsc/tts2) |[speedyspeech_nosil_baker_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/speedyspeech_nosil_baker_ckpt_0.5.zip)
FastSpeech2| CSMSC |[fastspeech2-csmsc](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/csmsc/tts3)|[fastspeech2_nosil_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_baker_ckpt_0.4.zip)
FastSpeech2| AISHELL-3 |[fastspeech2-aishell3](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/aishell3/tts3)|[fastspeech2_nosil_aishell3_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_aishell3_ckpt_0.4.zip)
FastSpeech2| LJSpeech |[fastspeech2-ljspeech](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/ljspeech/tts3)|[fastspeech2_nosil_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_ljspeech_ckpt_0.5.zip)
FastSpeech2| VCTK |[fastspeech2-csmsc](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/vctk/tts3)|[fastspeech2_nosil_vctk_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_vctk_ckpt_0.5.zip)
Model Type | Dataset| Example Link | Pretrained Models|Static Models|Size(static)
:-------------:| :------------:| :-----: | :-----:| :-----:| :-----:
Tacotron2|LJSpeech|[tacotron2-ljspeech](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/ljspeech/tts0)|[tacotron2_ljspeech_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_ljspeech_ckpt_0.3.zip)|||
TransformerTTS| LJSpeech| [transformer-ljspeech](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/ljspeech/tts1)|[transformer_tts_ljspeech_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_ckpt_0.4.zip)|||
SpeedySpeech| CSMSC | [speedyspeech-csmsc](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/csmsc/tts2) |[speedyspeech_nosil_baker_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/speedyspeech_nosil_baker_ckpt_0.5.zip)|[speedyspeech_nosil_baker_static_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/speedyspeech_nosil_baker_static_0.5.zip)|12M|
FastSpeech2| CSMSC |[fastspeech2-csmsc](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/csmsc/tts3)|[fastspeech2_nosil_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_baker_ckpt_0.4.zip)|[fastspeech2_nosil_baker_static_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_baker_static_0.4.zip)|157M|
FastSpeech2| AISHELL-3 |[fastspeech2-aishell3](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/aishell3/tts3)|[fastspeech2_nosil_aishell3_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_aishell3_ckpt_0.4.zip)|||
FastSpeech2| LJSpeech |[fastspeech2-ljspeech](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/ljspeech/tts3)|[fastspeech2_nosil_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_ljspeech_ckpt_0.5.zip)|||
FastSpeech2| VCTK |[fastspeech2-vctk](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/vctk/tts3)|[fastspeech2_nosil_vctk_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_vctk_ckpt_0.5.zip)|||
### Vocoders
Model Type | Dataset| Example Link | Pretrained Models
:-------------:| :------------:| :-----: | :-----
WaveFlow| LJSpeech |[waveflow-ljspeech](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/ljspeech/voc0)|[waveflow_ljspeech_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_ljspeech_ckpt_0.3.zip)
Parallel WaveGAN| CSMSC |[PWGAN-csmsc](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/csmsc/voc1)|[pwg_baker_ckpt_0.4.zip.](https://paddlespeech.bj.bcebos.com/Parakeet/pwg_baker_ckpt_0.4.zip)
Parallel WaveGAN| LJSpeech |[PWGAN-ljspeech](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/ljspeech/voc1)|[pwg_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/pwg_ljspeech_ckpt_0.5.zip)
Parallel WaveGAN| VCTK |[PWGAN-vctk](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/vctk/voc1)|[pwg_vctk_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/pwg_vctk_ckpt_0.5.zip)
Model Type | Dataset| Example Link | Pretrained Models| Static Models|Size(static)
:-------------:| :------------:| :-----: | :-----:| :-----:| :-----:
WaveFlow| LJSpeech |[waveflow-ljspeech](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/ljspeech/voc0)|[waveflow_ljspeech_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/waveflow_ljspeech_ckpt_0.3.zip)|||
Parallel WaveGAN| CSMSC |[PWGAN-csmsc](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/csmsc/voc1)|[pwg_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/pwg_baker_ckpt_0.4.zip)|[pwg_baker_static_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/pwg_baker_static_0.4.zip)|5.1M|
Parallel WaveGAN| LJSpeech |[PWGAN-ljspeech](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/ljspeech/voc1)|[pwg_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/pwg_ljspeech_ckpt_0.5.zip)|||
Parallel WaveGAN|AISHELL-3 |[PWGAN-aishell3](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell3/voc1)|[pwg_aishell3_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/pwg_aishell3_ckpt_0.5.zip)|||
Parallel WaveGAN| VCTK |[PWGAN-vctk](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/vctk/voc1)|[pwg_vctk_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/pwg_vctk_ckpt_0.5.zip)|||
|Multi Band MelGAN |CSMSC|[MB MelGAN-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/voc3) | [mb_melgan_baker_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/mb_melgan_baker_ckpt_0.5.zip)|[mb_melgan_baker_static_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/mb_melgan_baker_static_0.5.zip) |8.2M|
### Voice Cloning
Model Type | Dataset| Example Link | Pretrained Models
:-------------:| :------------:| :-----: | :-----
:-------------:| :------------:| :-----: | :-----:
GE2E| AISHELL-3, etc. |[ge2e](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/other/ge2e)|[ge2e_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/ge2e_ckpt_0.3.zip)
GE2E + Tactron2| AISHELL-3 |[ge2e-tactron2-aishell3](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/aishell3/vc0)|[tacotron2_aishell3_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_aishell3_ckpt_0.3.zip)

@ -661,9 +661,11 @@ PaddleSpeech also support Multi-Speaker TTS, we provide the audio demos generate
<br>
Duration control in FastSpeech2
Style control in FastSpeech2
--------------------------------------
In our FastSpeech2, we can control ``duration``, ``pitch`` and ``energy``.
We provide audio demos of duration control here. ``duration`` means the duration of phonemes: when we reduce ``duration``, the speed of the audio increases, and when we increase ``duration``, the speed of the audio decreases.
The ``duration`` of different phonemes in a sentence can have different scale ratios (e.g. when you want to slow down one word while keeping the other words' speed in a sentence). Here we use a fixed scale ratio for all phonemes to control the ``speed`` of the audio.
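A rough sketch of how such a scale ratio could be passed to the ``StyleFastSpeech2Inference`` class shown in the diff above; the instance and input names are hypothetical, and the call signature is an assumption based on that snippet.

```python
import paddle

# `style_inference` is assumed to be a StyleFastSpeech2Inference instance and
# `phone_ids` a paddle.Tensor of phone ids produced by the text frontend.
# A scalar `durations` acts as a scale ratio on the predicted phoneme durations,
# so values > 1 slow the speech down and values < 1 speed it up.
with paddle.no_grad():
    mel_slow = style_inference(phone_ids, durations=1.25)  # about 25% slower
    mel_fast = style_inference(phone_ids, durations=0.8)   # about 20% faster
```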
@ -892,6 +894,174 @@ The duration control in FastSpeech2 can control the speed of audios will keep th
<br>
<br>
We provide audio demos of pitch control here.
When we set the pitch of a sentence to its mean value and set the ``tones`` of its phones to ``1``, we get a ``robot-style`` timbre.
When we raise the pitch of an adult female voice (by a fixed scale ratio), we get a ``child-style`` timbre.
The ``pitch`` of different phonemes in a sentence can also have different scale ratios.
The normal audios are in the second column of the previous table.
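Under the same assumptions as the duration sketch above (hypothetical ``style_inference`` and ``phone_ids``, signature inferred from the diff), the two timbres could be requested roughly like this:

```python
import paddle

with paddle.no_grad():
    # robot=True flattens the normalized pitch, which yields the robot-style timbre
    mel_robot = style_inference(phone_ids, robot=True)
    # a scalar pitch is treated as a scale ratio on the predicted pitch in Hz,
    # so raising it by a fixed ratio approximates the child-style timbre
    mel_child = style_inference(phone_ids, pitch=1.3)
```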
.. raw:: html
<div class="table">
<table border="2" cellspacing="1" cellpadding="1">
<tr>
<th align="center"> Robot </th>
<th align="center"> Child </th>
</tr>
<tr>
<td>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/robot/001.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
</td>
<td>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/child_voice/001.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
</td>
</tr>
<tr>
<td>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/robot/002.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
</td>
<td>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/child_voice/002.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
</td>
</tr>
<tr>
<td>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/robot/003.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
</td>
<td>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/child_voice/003.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
</td>
</tr>
<tr>
<td>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/robot/004.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
</td>
<td>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/child_voice//004.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
</td>
</tr>
<tr>
<td>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/robot/005.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
</td>
<td>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/child_voice//005.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
</td>
</tr>
<tr>
<td>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/robot/007.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
</td>
<td>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/child_voice//007.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
</td>
</tr>
<tr>
<td>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/robot/008.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
</td>
<td>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/child_voice//008.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
</td>
</tr>
<tr>
<td>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/robot/009.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
</td>
<td>
<audio controls="controls">
<source
src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/child_voice//009.wav"
type="audio/wav">
Your browser does not support the <code>audio</code> element.
</audio>
</td>
</tr>
</table>
</div>
<br>
<br>
Chinese TTS with/without text frontend
--------------------------------------

Binary file not shown (image; new size: 1.5 MiB)

File diff suppressed because one or more lines are too long

@ -2,6 +2,7 @@
source path.sh
set -e
gpus=0,1,2,3
stage=0
stop_stage=100
conf_path=conf/conformer.yaml
@ -22,7 +23,7 @@ fi
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
# train model, all `ckpt` under `exp` dir
CUDA_VISIBLE_DEVICES=0,1,2,3 ./local/train.sh ${conf_path} ${ckpt}
CUDA_VISIBLE_DEVICES=${gpus} ./local/train.sh ${conf_path} ${ckpt}
fi
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
@ -45,13 +46,14 @@ if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
CUDA_VISIBLE_DEVICES=0 ./local/export.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} exp/${ckpt}/checkpoints/${avg_ckpt}.jit
fi
# Optionally, you can add LM and test it with runtime.
if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
# train lm and build TLG
./local/tlg.sh --corpus aishell --lmtype srilm
fi
# Optionally, you can add LM and test it with runtime.
if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
# test a single .wav file
CUDA_VISIBLE_DEVICES=0 ./local/test_hub.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${audio_file} || exit -1
fi
if [ ${stage} -le 7 ] && [ ${stop_stage} -ge 7 ]; then
# test a single .wav file
CUDA_VISIBLE_DEVICES=3 ./local/test_hub.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${audio_file} || exit -1
echo "warning: deps on kaldi and srilm, please make sure installed."
# train lm and build TLG
./local/tlg.sh --corpus aishell --lmtype srilm
fi

@ -96,17 +96,17 @@ optional arguments:
6. `--speaker-dict` is the path of the speaker id map file when training a multi-speaker FastSpeech2.
### Synthesize
We use [parallel wavegan](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/csmsc/voc1) as the neural vocoder.
Download pretrained parallel wavegan model from [pwg_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/pwg_baker_ckpt_0.4.zip) and unzip it.
We use [parallel wavegan](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/aishell3/voc1) as the neural vocoder.
Download pretrained parallel wavegan model from [pwg_aishell3_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/pwg_aishell3_ckpt_0.5.zip) and unzip it.
```bash
unzip pwg_baker_ckpt_0.4.zip
unzip pwg_aishell3_ckpt_0.5.zip
```
Parallel WaveGAN checkpoint contains files listed below.
```text
pwg_baker_ckpt_0.4
├── pwg_default.yaml # default config used to train parallel wavegan
├── pwg_snapshot_iter_400000.pdz # model parameters of parallel wavegan
└── pwg_stats.npy # statistics used to normalize spectrogram when training parallel wavegan
pwg_aishell3_ckpt_0.5
├── default.yaml # default config used to train parallel wavegan
├── feats_stats.npy # statistics used to normalize spectrogram when training parallel wavegan
└── snapshot_iter_1000000.pdz # generator parameters of parallel wavegan
```
`./local/synthesize.sh` calls `${BIN_DIR}/synthesize.py`, which can synthesize waveform from `metadata.jsonl`.
```bash
@ -224,14 +224,12 @@ python3 ${BIN_DIR}/multi_spk_synthesize_e2e.py \
--fastspeech2-config=fastspeech2_nosil_aishell3_ckpt_0.4/default.yaml \
--fastspeech2-checkpoint=fastspeech2_nosil_aishell3_ckpt_0.4/snapshot_iter_96400.pdz \
--fastspeech2-stat=fastspeech2_nosil_aishell3_ckpt_0.4/speech_stats.npy \
--pwg-config=pwg_baker_ckpt_0.4/pwg_default.yaml \
--pwg-checkpoint=pwg_baker_ckpt_0.4/pwg_snapshot_iter_400000.pdz \
--pwg-stat=pwg_baker_ckpt_0.4/pwg_stats.npy \
--pwg-config=pwg_aishell3_ckpt_0.5/default.yaml \
--pwg-checkpoint=pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz \
--pwg-stat=pwg_aishell3_ckpt_0.5/feats_stats.npy \
--text=${BIN_DIR}/../sentences.txt \
--output-dir=exp/default/test_e2e \
--phones-dict=fastspeech2_nosil_aishell3_ckpt_0.4/phone_id_map.txt \
--speaker-dict=fastspeech2_nosil_aishell3_ckpt_0.4/speaker_id_map.txt
```
## Future work
A multi-speaker vocoder is needed.

@ -0,0 +1,146 @@
# Parallel WaveGAN with AISHELL-3
This example contains code used to train a [parallel wavegan](http://arxiv.org/abs/1910.11480) model with [AISHELL-3](http://www.aishelltech.com/aishell_3).
AISHELL-3 is a large-scale and high-fidelity multi-speaker Mandarin speech corpus which could be used to train multi-speaker Text-to-Speech (TTS) systems.
## Dataset
### Download and Extract the dataset
Download AISHELL-3.
```bash
wget https://www.openslr.org/resources/93/data_aishell3.tgz
```
Extract AISHELL-3.
```bash
mkdir data_aishell3
tar zxvf data_aishell3.tgz -C data_aishell3
```
### Get MFA result of AISHELL-3 and Extract it
We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for aishell3_fastspeech2.
You can download it from here: [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your own MFA model by referring to the [use_mfa example](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/other/use_mfa) (which uses MFA1.x for now) in our repo.
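As a small illustration, phone durations can be read from the MFA TextGrid files with the `textgrid` package (listed in the requirements); the tier name, file path and hop size below are assumptions.

```python
import textgrid

# hypothetical alignment file from the archive above
tg = textgrid.TextGrid.fromFile("aishell3_alignment_tone/SSB0005/SSB00050001.TextGrid")
phones = tg.getFirst("phones")  # assumed tier name
hop_size_s = 0.0125             # assumed hop size in seconds; take it from the feature config
durations = [round((itv.maxTime - itv.minTime) / hop_size_s) for itv in phones]
print(list(zip([itv.mark for itv in phones], durations)))
```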
## Get Started
Assume the path to the dataset is `~/datasets/data_aishell3`.
Assume the path to the MFA result of AISHELL-3 is `./aishell3_alignment_tone`.
Run the command below to
1. **source path**,
2. preprocess the dataset,
3. train the model,
4. synthesize wavs.
- synthesize waveform from `metadata.jsonl`.
```bash
./run.sh
```
### Preprocess the dataset
```bash
./local/preprocess.sh ${conf_path}
```
When it is done, a `dump` folder is created in the current directory. The structure of the dump folder is listed below.
```text
dump
├── dev
│ ├── norm
│ └── raw
├── test
│ ├── norm
│ └── raw
└── train
├── norm
├── raw
└── feats_stats.npy
```
The dataset is split into 3 parts, namely `train`, `dev`, and `test`, each of which contains a `norm` and a `raw` subfolder. The `raw` folder contains the log magnitude of the mel spectrogram of each utterance, while the `norm` folder contains the normalized spectrogram. The statistics used to normalize the spectrogram are computed from the training set and are located in `dump/train/feats_stats.npy`.
There is also a `metadata.jsonl` in each subfolder. It is a table-like file that contains the id and the path to the spectrogram of each utterance.
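As a rough illustration of how these statistics are applied (the file layout, per-dimension mean in the first row and scale in the second, is an assumption based on common practice in this pipeline):

```python
import numpy as np

# assumed layout of feats_stats.npy: shape (2, n_mels), rows = [mean, scale]
stats = np.load("dump/train/feats_stats.npy")
mean, scale = stats[0], stats[1]

def normalize(log_mel):
    """Normalize a (n_frames, n_mels) log-magnitude mel spectrogram."""
    return (log_mel - mean) / scale

def denormalize(norm_mel):
    """Invert the normalization."""
    return norm_mel * scale + mean
```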
### Train the model
```bash
CUDA_VISIBLE_DEVICES=${gpus} ./local/train.sh ${conf_path} ${train_output_path}
```
`./local/train.sh` calls `${BIN_DIR}/train.py`.
Here's the complete help message.
```text
usage: train.py [-h] [--config CONFIG] [--train-metadata TRAIN_METADATA]
[--dev-metadata DEV_METADATA] [--output-dir OUTPUT_DIR]
[--ngpu NGPU] [--verbose VERBOSE] [--batch-size BATCH_SIZE]
[--max-iter MAX_ITER] [--run-benchmark RUN_BENCHMARK]
[--profiler_options PROFILER_OPTIONS]
Train a ParallelWaveGAN model.
optional arguments:
-h, --help show this help message and exit
--config CONFIG config file to overwrite default config.
--train-metadata TRAIN_METADATA
training data.
--dev-metadata DEV_METADATA
dev data.
--output-dir OUTPUT_DIR
output dir.
--ngpu NGPU if ngpu == 0, use cpu.
--verbose VERBOSE verbose.
benchmark:
arguments related to benchmark.
--batch-size BATCH_SIZE
batch size.
--max-iter MAX_ITER train max steps.
--run-benchmark RUN_BENCHMARK
running benchmark or not, if True, use the --batch-size
and --max-iter.
--profiler_options PROFILER_OPTIONS
The option of profiler, which should be in format
"key1=value1;key2=value2;key3=value3".
```
1. `--config` is a config file in yaml format to overwrite the default config, which can be found at `conf/default.yaml`.
2. `--train-metadata` and `--dev-metadata` should be the metadata files in the normalized subfolders of `train` and `dev` in the `dump` folder.
3. `--output-dir` is the directory to save the results of the experiment. Checkpoints are saved in `checkpoints/` inside this directory.
4. `--ngpu` is the number of gpus to use; if ngpu == 0, use cpu.
### Synthesize
`./local/synthesize.sh` calls `${BIN_DIR}/synthesize.py`, which can synthesize waveform from `metadata.jsonl`.
```bash
CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize.sh ${conf_path} ${train_output_path} ${ckpt_name}
```
```text
usage: synthesize.py [-h] [--config CONFIG] [--checkpoint CHECKPOINT]
[--test-metadata TEST_METADATA] [--output-dir OUTPUT_DIR]
[--ngpu NGPU] [--verbose VERBOSE]
Synthesize with parallel wavegan.
optional arguments:
-h, --help show this help message and exit
--config CONFIG parallel wavegan config file.
--checkpoint CHECKPOINT
snapshot to load.
--test-metadata TEST_METADATA
dev data.
--output-dir OUTPUT_DIR
output dir.
--ngpu NGPU if ngpu == 0, use cpu.
--verbose VERBOSE verbose.
```
1. `--config` is the parallel wavegan config file. You should use the same config with which the model was trained.
2. `--checkpoint` is the checkpoint to load. Pick one of the checkpoints from `checkpoints` inside the training output directory. If you use the pretrained model, use `snapshot_iter_1000000.pdz`.
3. `--test-metadata` is the metadata of the test dataset. Use the `metadata.jsonl` in the `dev/norm` subfolder from the processed directory.
4. `--output-dir` is the directory to save the synthesized audio files.
5. `--ngpu` is the number of gpus to use; if ngpu == 0, use cpu.
## Pretrained Models
Pretrained models can be downloaded here [pwg_aishell3_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/pwg_aishell3_ckpt_0.5.zip).
Parallel WaveGAN checkpoint contains files listed below.
```text
pwg_aishell3_ckpt_0.5
├── default.yaml # default config used to train parallel wavegan
├── feats_stats.npy # statistics used to normalize spectrogram when training parallel wavegan
└── snapshot_iter_1000000.pdz # generator parameters of parallel wavegan
```
## Acknowledgement
We adapted some code from https://github.com/kan-bayashi/ParallelWaveGAN.

@ -2,6 +2,7 @@
set -e
source path.sh
gpus=0,1,2,3
stage=0
stop_stage=100
conf_path=conf/conformer.yaml
@ -20,7 +21,7 @@ fi
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
# train model, all `ckpt` under `exp` dir
CUDA_VISIBLE_DEVICES=0,1,2,3 ./local/train.sh ${conf_path} ${ckpt}
CUDA_VISIBLE_DEVICES=${gpus} ./local/train.sh ${conf_path} ${ckpt}
fi
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
@ -30,7 +31,7 @@ fi
if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
# test ckpt avg_n
CUDA_VISIBLE_DEVICES=4 ./local/test.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
fi
if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then

@ -209,6 +209,7 @@ CUDA_VISIBLE_DEVICES=${gpus} ./local/inference.sh ${train_output_path}
## Pretrained Model
Pretrained SpeedySpeech model with no silence at the edges of audios: [speedyspeech_nosil_baker_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/speedyspeech_nosil_baker_ckpt_0.5.zip)
The static model can be downloaded here: [speedyspeech_nosil_baker_static_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/speedyspeech_nosil_baker_static_0.5.zip).
SpeedySpeech checkpoint contains files listed below.
```text

@ -200,6 +200,7 @@ CUDA_VISIBLE_DEVICES=${gpus} ./local/inference.sh ${train_output_path}
## Pretrained Model
Pretrained FastSpeech2 model with no silence at the edges of audios: [fastspeech2_nosil_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_baker_ckpt_0.4.zip)
The static model can be downloaded here: [fastspeech2_nosil_baker_static_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_baker_static_0.4.zip).
FastSpeech2 checkpoint contains files listed below.
```text

@ -122,7 +122,8 @@ optional arguments:
5. `--ngpu` is the number of gpus to use, if ngpu == 0, use cpu.
## Pretrained Models
Pretrained models can be downloaded here [pwg_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/pwg_baker_ckpt_0.4.zip).
Pretrained model can be downloaded here [pwg_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/pwg_baker_ckpt_0.4.zip).
Static models can be downloaded here [pwg_baker_static_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/pwg_baker_static_0.4.zip).
Parallel WaveGAN checkpoint contains files listed below.

@ -85,11 +85,11 @@ usage: synthesize.py [-h] [--config CONFIG] [--checkpoint CHECKPOINT]
[--test-metadata TEST_METADATA] [--output-dir OUTPUT_DIR]
[--ngpu NGPU] [--verbose VERBOSE]
Synthesize with parallel wavegan.
Synthesize with multi band melgan.
optional arguments:
-h, --help show this help message and exit
--config CONFIG parallel wavegan config file.
--config CONFIG multi band melgan config file.
--checkpoint CHECKPOINT
snapshot to load.
--test-metadata TEST_METADATA
@ -100,10 +100,23 @@ optional arguments:
--verbose VERBOSE verbose.
```
1. `--config` is the parallel wavegan config file. You should use the same config with which the model was trained.
1. `--config` is the multi band melgan config file. You should use the same config with which the model was trained.
2. `--checkpoint` is the checkpoint to load. Pick one of the checkpoints from `checkpoints` inside the training output directory.
3. `--test-metadata` is the metadata of the test dataset. Use the `metadata.jsonl` in the `dev/norm` subfolder from the processed directory.
4. `--output-dir` is the directory to save the synthesized audio files.
5. `--ngpu` is the number of gpus to use, if ngpu == 0, use cpu.
## Pretrained Models
Pretrained model can be downloaded here [mb_melgan_baker_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/mb_melgan_baker_ckpt_0.5.zip).
Static model can be downloaded here [mb_melgan_baker_static_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/mb_melgan_baker_static_0.5.zip)
Multi Band MelGAN checkpoint contains files listed below.
```text
mb_melgan_baker_ckpt_0.5
├── default.yaml # default config used to train multi band melgan
├── feats_stats.npy # statistics used to normalize spectrogram when training multi band melgan
└── snapshot_iter_1000000.pdz # generator parameters of multi band melgan
```
## Acknowledgement
We adapted some code from https://github.com/kan-bayashi/ParallelWaveGAN.

@ -2,6 +2,7 @@
set -e
source path.sh
gpus=0,1,2,3,4,5,6,7
stage=0
stop_stage=100
conf_path=conf/deepspeech2.yaml
@ -21,7 +22,7 @@ fi
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
# train model, all `ckpt` under `exp` dir
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./local/train.sh ${conf_path} ${ckpt} ${model_type}
CUDA_VISIBLE_DEVICES=${gpus} ./local/train.sh ${conf_path} ${ckpt} ${model_type}
fi
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
@ -31,7 +32,7 @@ fi
if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
# test ckpt avg_n
CUDA_VISIBLE_DEVICES=7 ./local/test.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${model_type} || exit -1
CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${model_type} || exit -1
fi
if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
@ -41,5 +42,5 @@ fi
if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
# test a single .wav file
CUDA_VISIBLE_DEVICES=3 ./local/test_hub.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${model_type} ${audio_file} || exit -1
CUDA_VISIBLE_DEVICES=0 ./local/test_hub.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${model_type} ${audio_file} || exit -1
fi

@ -4,6 +4,7 @@ set -e
. ./path.sh || exit 1;
. ./cmd.sh || exit 1;
gpus=0,1,2,3
stage=0
stop_stage=100
conf_path=conf/transformer.yaml
@ -24,7 +25,7 @@ fi
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
# train model, all `ckpt` under `exp` dir
CUDA_VISIBLE_DEVICES=0,1,2,3 ./local/train.sh ${conf_path} ${ckpt}
CUDA_VISIBLE_DEVICES=${gpus} ./local/train.sh ${conf_path} ${ckpt}
fi
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
@ -49,5 +50,5 @@ fi
if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
# test a single .wav file
CUDA_VISIBLE_DEVICES=3 ./local/test_hub.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${audio_file} || exit -1
CUDA_VISIBLE_DEVICES=0 ./local/test_hub.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} ${audio_file} || exit -1
fi

@ -5,6 +5,7 @@ set -e
. ./path.sh || exit 1;
. ./cmd.sh || exit 1;
gpus=0,1,2,3,4,5,6,7
stage=0
stop_stage=100
conf_path=conf/transformer.yaml
@ -24,7 +25,7 @@ fi
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
# train model, all `ckpt` under `exp` dir
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./local/train.sh ${conf_path} ${ckpt}
CUDA_VISIBLE_DEVICES=${gpus} ./local/train.sh ${conf_path} ${ckpt}
fi
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
@ -49,9 +50,9 @@ fi
if [ ${stage} -le 6 ] && [ ${stop_stage} -ge 6 ]; then
# export ckpt avg_n
CUDA_VISIBLE_DEVICES= ./local/export.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} exp/${ckpt}/checkpoints/${avg_ckpt}.jit
./local/export.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} exp/${ckpt}/checkpoints/${avg_ckpt}.jit
fi
if [ ${stage} -le 7 ] && [ ${stop_stage} -ge 7 ]; then
CUDA_VISIBLE_DEVICES= ./local/cacu_perplexity.sh || exit -1
./local/cacu_perplexity.sh || exit -1
fi

@ -99,6 +99,7 @@ decoding:
alpha: 2.5
beta: 0.3
beam_size: 10
word_reward: 0.7
cutoff_prob: 1.0
cutoff_top_n: 0
num_proc_bsearch: 8

@ -2,6 +2,7 @@
set -e
source path.sh
gpus=0,1,2,3
stage=0
stop_stage=100
conf_path=conf/transformer_joint_noam.yaml
@ -21,7 +22,7 @@ fi
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
# train model, all `ckpt` under `exp` dir
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./local/train.sh ${conf_path} ${ckpt}
CUDA_VISIBLE_DEVICES=${gpus} ./local/train.sh ${conf_path} ${ckpt}
fi
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then

@ -3,6 +3,7 @@ set -e
. path.sh || exit 1;
gpus=0,1,2,3
stage=0
stop_stage=50
conf_path=conf/transformer.yaml
@ -23,7 +24,7 @@ fi
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
# train model, all `ckpt` under `exp` dir
CUDA_VISIBLE_DEVICES=0,1,2,3 ./local/train.sh ${conf_path} ${ckpt}
CUDA_VISIBLE_DEVICES=${gpus} ./local/train.sh ${conf_path} ${ckpt}
fi
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
@ -33,7 +34,7 @@ fi
if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
# test ckpt avg_n
CUDA_VISIBLE_DEVICES=7 ./local/test.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
CUDA_VISIBLE_DEVICES=0 ./local/test.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
fi
if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then

@ -2,6 +2,7 @@
set -e
source path.sh
gpus=0
stage=0
stop_stage=100
conf_path=conf/transformer.yaml
@ -20,7 +21,7 @@ fi
if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
# train model, all `ckpt` under `exp` dir
./local/train.sh ${conf_path} ${ckpt}
CUDA_VISIBLE_DEVICES=${gpus} ./local/train.sh ${conf_path} ${ckpt}
fi
if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
@ -30,12 +31,12 @@ fi
if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
# test ckpt avg_n
CUDA_VISIBLE_DEVICES= ./local/test.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
CUDA_VISIBLE_DEVICES=${gpus} ./local/test.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
fi
if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
# ctc alignment of test data
CUDA_VISIBLE_DEVICES= ./local/align.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
CUDA_VISIBLE_DEVICES=${gpus} ./local/align.sh ${conf_path} exp/${ckpt}/checkpoints/${avg_ckpt} || exit -1
fi
if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then

@ -206,12 +206,14 @@ class U2Trainer(Trainer):
observation['batch_cost'] = observation[
'reader_cost'] + observation['step_cost']
observation['samples'] = observation['batch_size']
observation['ips[sent./sec]'] = observation[
observation['ips,sent./sec'] = observation[
'batch_size'] / observation['batch_cost']
for k, v in observation.items():
msg += f" {k}: "
msg += f" {k.split(',')[0]}: "
msg += f"{v:>.8f}" if isinstance(v,
float) else f"{v}"
msg += f" {k.split(',')[1]}" if len(
k.split(',')) == 2 else f""
msg += ","
msg = msg[:-1] # remove the last ","
if (batch_index + 1

@ -441,10 +441,7 @@ class U2STTester(U2STTrainer):
"".join(chr(t) for t in text[:text_len])
for text, text_len in zip(texts, texts_len)
]
# from IPython import embed
# import os
# embed()
# os._exit(0)
hyps = self.model.decode(
audio,
audio_len,
@ -458,6 +455,7 @@ class U2STTester(U2STTrainer):
cutoff_top_n=cfg.cutoff_top_n,
num_processes=cfg.num_proc_bsearch,
ctc_weight=cfg.ctc_weight,
word_reward=cfg.word_reward,
decoding_chunk_size=cfg.decoding_chunk_size,
num_decoding_left_chunks=cfg.num_decoding_left_chunks,
simulate_streaming=cfg.simulate_streaming)

@ -315,6 +315,7 @@ class U2STBaseModel(nn.Layer):
speech: paddle.Tensor,
speech_lengths: paddle.Tensor,
beam_size: int=10,
word_reward: float=0.0,
decoding_chunk_size: int=-1,
num_decoding_left_chunks: int=-1,
simulate_streaming: bool=False, ) -> paddle.Tensor:
@ -378,6 +379,7 @@ class U2STBaseModel(nn.Layer):
# 2.2 First beam prune: select topk best prob at current time
top_k_logp, top_k_index = logp.topk(beam_size) # (B*N, N)
top_k_logp += word_reward
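# word_reward adds a constant bonus to each hypothesis at every expansion step,
# offsetting the tendency of summed log-probabilities to favor short outputs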
top_k_logp = mask_finished_scores(top_k_logp, end_flag)
top_k_index = mask_finished_preds(top_k_index, end_flag, self.eos)
@ -528,6 +530,7 @@ class U2STBaseModel(nn.Layer):
cutoff_top_n: int,
num_processes: int,
ctc_weight: float=0.0,
word_reward: float=0.0,
decoding_chunk_size: int=-1,
num_decoding_left_chunks: int=-1,
simulate_streaming: bool=False):
@ -569,6 +572,7 @@ class U2STBaseModel(nn.Layer):
feats,
feats_lengths,
beam_size=beam_size,
word_reward=word_reward,
decoding_chunk_size=decoding_chunk_size,
num_decoding_left_chunks=num_decoding_left_chunks,
simulate_streaming=simulate_streaming)

@ -87,26 +87,27 @@ def evaluate(args, fastspeech2_config, pwg_config):
output_dir = Path(args.output_dir)
output_dir.mkdir(parents=True, exist_ok=True)
# only test the number 0 speaker
spk_id = 0
for utt_id, sentence in sentences:
input_ids = frontend.get_input_ids(sentence, merge_sentences=True)
phone_ids = input_ids["phone_ids"]
flags = 0
for part_phone_ids in phone_ids:
with paddle.no_grad():
mel = fastspeech2_inference(
part_phone_ids, spk_id=paddle.to_tensor(spk_id))
temp_wav = pwg_inference(mel)
if flags == 0:
wav = temp_wav
flags = 1
else:
wav = paddle.concat([wav, temp_wav])
sf.write(
str(output_dir / (str(spk_id) + "_" + utt_id + ".wav")),
wav.numpy(),
samplerate=fastspeech2_config.fs)
print(f"{spk_id}_{utt_id} done!")
spk_ids = list(range(20))
for spk_id in spk_ids:
for utt_id, sentence in sentences[:2]:
input_ids = frontend.get_input_ids(sentence, merge_sentences=True)
phone_ids = input_ids["phone_ids"]
flags = 0
for part_phone_ids in phone_ids:
with paddle.no_grad():
mel = fastspeech2_inference(
part_phone_ids, spk_id=paddle.to_tensor(spk_id))
temp_wav = pwg_inference(mel)
if flags == 0:
wav = temp_wav
flags = 1
else:
wav = paddle.concat([wav, temp_wav])
sf.write(
str(output_dir / (str(spk_id) + "_" + utt_id + ".wav")),
wav.numpy(),
samplerate=fastspeech2_config.fs)
print(f"{spk_id}_{utt_id} done!")
def main():

@ -30,9 +30,9 @@ from paddlespeech.t2s.models.melgan import MelGANGenerator
def main():
parser = argparse.ArgumentParser(
description="Synthesize with parallel wavegan.")
description="Synthesize with multi band melgan.")
parser.add_argument(
"--config", type=str, help="parallel wavegan config file.")
"--config", type=str, help="multi band melgan config file.")
parser.add_argument("--checkpoint", type=str, help="snapshot to load.")
parser.add_argument("--test-metadata", type=str, help="dev data.")
parser.add_argument("--output-dir", type=str, help="output dir.")

@ -33,6 +33,7 @@ sentencepiece
snakeviz
soundfile~=0.10
sox
soxbindings
tensorboardX
textgrid
timer

@ -10,8 +10,10 @@ pushd ../../../examples/aishell/s1
source path.sh
source ${MAIN_ROOT}/utils/parse_options.sh || exit 1;
fp_item_list=(fp32)
bs_item=(16)
bs_item=(16 30)
config_path=conf/conformer.yaml
seed=0
output=exp/conformer
@ -34,7 +36,4 @@ done
popd
mkdir -p log
bash run_analysis_sp.sh > log/log_sp.out
bash run_analysis_mp.sh > log/log_mp.out

@ -1,12 +0,0 @@
python analysis.py \
--filename "recoder_mp_bs16_fp32_ngpu8.txt" \
--keyword "ips[sent./sec]:" \
--base_batch_size 16 \
--model_name "Conformer" \
--mission_name "eight gpu" \
--run_mode "mp" \
--ips_unit "sent./sec" \
--gpu_num 8 \
--use_num 480 \
--separator " " \
--direction_id "1"

@ -1,12 +0,0 @@
python analysis.py \
--filename "recoder_sp_bs16_fp32_ngpu1.txt" \
--keyword "ips[sent./sec]:" \
--base_batch_size 16 \
--model_name "Conformer" \
--mission_name "one gpu" \
--run_mode "sp" \
--ips_unit "sent./sec" \
--gpu_num 1 \
--use_num 60 \
--separator " " \
--direction_id "1"

@ -51,7 +51,7 @@ soxbindings.done:
touch soxbindings.done
mfa.done:
test -d montreal-forced-aligner || $(WGET) https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner/releases/download/v1.0.1/montreal-forced-aligner_linux.tar.gz
test -d montreal-forced-aligner || $(WGET) https://paddlespeech.bj.bcebos.com/Parakeet/montreal-forced-aligner_linux.tar.gz
tar xvf montreal-forced-aligner_linux.tar.gz
touch mfa.done

@ -0,0 +1,4 @@
#!/bin/bash
test -d montreal-forced-aligner || wget --no-check-certificate https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner/releases/download/v1.0.1/montreal-forced-aligner_linux.tar.gz
tar xvf montreal-forced-aligner_linux.tar.gz

@ -0,0 +1,26 @@
#!/bin/bash
WGET="wget --no-check-certificate"
# SCTK official repo does not have version tags. Here's the mapping:
# 2.4.9 = 659bc36; 2.4.10 = d914e1b; 2.4.11 = 20159b5.
SCTK_GITHASH=20159b5
SCTK_CXFLAGS="-w -march=native"
CFLAGS="CFLAGS=${SCTK_CXFLAGS}"
CXXFLAGS="CXXFLAGS=-std=c++11 ${SCTK_CXFLAGS}"
MAKE=make
${WGET} -nv -T 10 -t 3 -O sctk-${SCTK_GITHASH}.tar.gz https://github.com/usnistgov/SCTK/archive/${SCTK_GITHASH}.tar.gz;
tar zxvf sctk-${SCTK_GITHASH}.tar.gz
rm -rf sctk-${SCTK_GITHASH} sctk
mv SCTK-${SCTK_GITHASH}* sctk-${SCTK_GITHASH}
ln -s sctk-${SCTK_GITHASH} sctk
touch sctk-${SCTK_GITHASH}.tar.gz
rm -f sctk/.compiled
CFLAGS="${SCTK_CXFLAGS}" CXXFLAGS="-std=c++11 ${SCTK_CXFLAGS}" ${MAKE} -C sctk config
CFLAGS="${SCTK_CXFLAGS}" CXXFLAGS="-std=c++11 ${SCTK_CXFLAGS}" ${MAKE} -C sctk all doc
${MAKE} -C sctk install
touch sctk/.compiled

@ -0,0 +1,6 @@
#!/bin/bash
apt install -y libvorbis-dev libmp3lame-dev libmad-ocaml-dev
test -d sox-14.4.2 || wget --no-check-certificate https://nchc.dl.sourceforge.net/project/sox/sox/14.4.2/sox-14.4.2.tar.gz
tar -xvzf sox-14.4.2.tar.gz -C .
cd sox-14.4.2 && ./configure --prefix=/usr/ && make -j4 && make install

@ -0,0 +1,36 @@
#!/bin/bash
# copy from Espnet
set -euo pipefail
if [ $# -eq 0 ] || [ $# -gt 2 ]; then
echo "Usage: $0 <python> [venv-path]"
echo "e.g."
echo "$0 \$(which python3)"
exit 1;
elif [ $# -eq 2 ]; then
PYTHON="$1"
VENV="$2"
elif [ $# -eq 1 ]; then
PYTHON="$1"
VENV="venv"
fi
if ! "${PYTHON}" -m venv --help > /dev/null 2>&1; then
echo "Error: ${PYTHON} is not Python3?"
exit 1
fi
if [ -e activate_python.sh ]; then
echo "Warning: activate_python.sh already exists. It will be overwritten"
fi
"${PYTHON}" -m venv ${VENV}
cat << EOF > activate_python.sh
#!/usr/bin/env bash
# THIS FILE IS GENERATED BY tools/setup_venv.sh
. $(cd ${VENV}; pwd)/bin/activate
EOF
. ./activate_python.sh
python3 -m pip install -U pip wheel