Merge pull request #1031 from yt605155624/fix_docs

[TTS]update ipynb, add eval loss
3 years ago · 563568a2b8
parent c3f847bd2d 7d3985bff9
commit 563568a2b8
13 changed files with 935 additions and 285 deletions
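
The FastSpeech2 and SpeedySpeech tables added in this diff report a total `eval/loss` next to its component losses. Below is a minimal sketch for sanity-checking such rows. It assumes the total is the plain sum of the listed components, which the PR does not state. The helper name `check_row` and the tolerance are hypothetical.

```python
def check_row(total, components, tol=1e-3):
    """Return (component_sum, ok); ok is True when the sum is within tol of total."""
    s = sum(components)
    return s, abs(s - total) <= tol


# Values copied from the eval-loss tables in this diff.
rows = {
    "csmsc/tts3 default": (1.0991, [0.59132, 0.035815, 0.31915, 0.15287]),
    "ljspeech/tts3 default": (1.505682, [0.612104, 0.045505, 0.62792, 0.220147]),
    "csmsc/tts2 default": (0.83655, [0.42324, 0.03211, 0.38119]),
    "aishell3/vc1 default": (0.99699, [0.62013, 0.53057, 0.11954, 0.20426]),
}

for name, (total, parts) in rows.items():
    s, ok = check_row(total, parts)
    print(f"{name}: sum={s:.5f} total={total} {'ok' if ok else 'MISMATCH'}")
```

Under this assumption the CSMSC and LJSpeech rows agree to within rounding, while the AISHELL-3 voice cloning row does not, so its values may be worth re-checking against the training logs.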
--- a/README.md
+++ b/README.md
@@ -124,7 +124,7 @@ avg.sh best exp/deepspeech2/checkpoints 1
 ./local/test.sh conf/deepspeech2.yaml exp/deepspeech2/checkpoints/avg_1 offline
 ```

-For **Text-To-Speech**, try pretrained FastSpeech2 + Parallel WaveGAN on CSMSC:
+For **Text-to-Speech**, try pretrained FastSpeech2 + Parallel WaveGAN on CSMSC:
 ```shell
 cd examples/csmsc/tts3
 # download the pretrained models and unzip them
@@ -150,7 +150,7 @@ python3 ${BIN_DIR}/synthesize_e2e.py \
  --phones-dict=fastspeech2_nosil_baker_ckpt_0.4/phone_id_map.txt
 ```

-If you want to try more functions like training and tuning, please see [Speech-to-Text Quick Start](./docs/source/asr/quick_start.md) and [Text-To-Speech Quick Start](./docs/source/tts/quick_start.md).
+If you want to try more functions like training and tuning, please see [Speech-to-Text Quick Start](./docs/source/asr/quick_start.md) and [Text-to-Speech Quick Start](./docs/source/tts/quick_start.md).

 ## Model List

--- a/docs/source/introduction.md
+++ b/docs/source/introduction.md
@@ -50,7 +50,7 @@ PaddleSpeech TTS provides you with a complete TTS pipeline, including:
    - Parallel WaveGAN
    - WaveFlow
 - Voice Cloning
-    - Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
+    - Transfer Learning from Speaker Verification to Multispeaker Text-to-Speech Synthesis
    - GE2E

 Text-to-Speech helps you to train TTS models with simple commands.
--- a/docs/source/tts/README.md
+++ b/docs/source/tts/README.md
@@ -35,7 +35,7 @@ In order to facilitate exploiting the existing TTS models directly and developin
  - [【Parallel WaveGAN】Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram](https://arxiv.org/abs/1910.11480)
  - [【WaveFlow】WaveFlow: A Compact Flow-based Model for Raw Audio](https://arxiv.org/abs/1912.01219)
 - Voice Cloning
-  - [Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis](https://arxiv.org/pdf/1806.04558v4.pdf)
+  - [Transfer Learning from Speaker Verification to Multispeaker Text-to-Speech Synthesis](https://arxiv.org/pdf/1806.04558v4.pdf)
  - [【GE2E】Generalized End-to-End Loss for Speaker Verification](https://arxiv.org/abs/1710.10467)

 ## Setup
--- a/docs/tutorial/tts/source/tts-timeline.png
+++ b/docs/tutorial/tts/source/tts-timeline.png
--- a/docs/tutorial/tts/source/wechat-group.png
+++ b/docs/tutorial/tts/source/wechat-group.png
--- a/docs/tutorial/tts/tts_tutorial.ipynb
+++ b/docs/tutorial/tts/tts_tutorial.ipynb
--- a/examples/aishell3/vc1/README.md
+++ b/examples/aishell3/vc1/README.md
@@ -1,3 +1,4 @@
+
 # FastSpeech2 + AISHELL-3 Voice Cloning
 This example contains code used to train a [FastSpeech2](https://arxiv.org/abs/2006.04558) model with [AISHELL-3](http://www.aishelltech.com/aishell_3). The trained model can be used for the voice cloning task. We follow the model structure of [Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis](https://arxiv.org/pdf/1806.04558.pdf). The general steps are as follows:
 1. Speaker Encoder: We use a speaker verification task to train a speaker encoder. The datasets used for this task differ from those used for `FastSpeech2`: because transcriptions are not needed, we can use more datasets; see [ge2e](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/ge2e).
@@ -121,6 +122,10 @@ CUDA_VISIBLE_DEVICES=${gpus} ./local/voice_cloning.sh ${conf_path} ${train_outpu
 ## Pretrained Model
 [fastspeech2_nosil_aishell3_vc1_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_aishell3_vc1_ckpt_0.5.zip)

+Model | Step | eval/loss | eval/l1_loss | eval/duration_loss | eval/pitch_loss| eval/energy_loss 
+:-------------:| :------------:| :-----: | :-----: | :--------: |:--------:|:---------:
+default|2(gpu) x 96400|0.99699|0.62013|0.53057|0.11954| 0.20426|
+
 FastSpeech2 checkpoint contains files listed below.
 (There is no need for `speaker_id_map.txt` here.)

--- a/examples/aishell3/voc1/README.md
+++ b/examples/aishell3/voc1/README.md
@@ -138,6 +138,10 @@ optional arguments:
 ## Pretrained Models
 Pretrained models can be downloaded here [pwg_aishell3_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_aishell3_ckpt_0.5.zip).

+Model | Step | eval/generator_loss | eval/log_stft_magnitude_loss | eval/spectral_convergence_loss
+:-------------:| :------------:| :-----: | :-----: | :--------:
+default| 1(gpu) x 400000|1.968762|0.759008|0.218524
+
 Parallel WaveGAN checkpoint contains files listed below.

 ```text
--- a/examples/csmsc/tts2/README.md
+++ b/examples/csmsc/tts2/README.md
@@ -216,6 +216,10 @@ Pretrained SpeedySpeech model with no silence in the edge of audios[speedyspeech

 Static model can be downloaded here [speedyspeech_nosil_baker_static_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/speedyspeech/speedyspeech_nosil_baker_static_0.5.zip).

+Model | Step | eval/loss | eval/l1_loss | eval/duration_loss | eval/ssim_loss
+:-------------:| :------------:| :-----: | :-----: | :--------:|:--------:
+default| 1(gpu) x 11400|0.83655|0.42324|0.03211| 0.38119
+
 SpeedySpeech checkpoint contains files listed below.
 ```text
 speedyspeech_nosil_baker_ckpt_0.5
--- a/examples/csmsc/tts3/README.md
+++ b/examples/csmsc/tts3/README.md
@@ -207,6 +207,11 @@ Pretrained FastSpeech2 model with no silence in the edge of audios [fastspeech2_

 Static model can be downloaded here [fastspeech2_nosil_baker_static_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_baker_static_0.4.zip).

+Model | Step | eval/loss | eval/l1_loss | eval/duration_loss | eval/pitch_loss| eval/energy_loss 
+:-------------:| :------------:| :-----: | :-----: | :--------: |:--------:|:---------:
+default| 2(gpu) x 76000|1.0991|0.59132|0.035815| 0.31915| 0.15287|
+conformer| 2(gpu) x 76000||||||
+
 FastSpeech2 checkpoint contains files listed below.
 ```text
 fastspeech2_nosil_baker_ckpt_0.4
--- a/examples/csmsc/voc1/README.md
+++ b/examples/csmsc/voc1/README.md
@@ -130,6 +130,10 @@ Pretrained model can be downloaded here [pwg_baker_ckpt_0.4.zip](https://paddles

 Static model can be downloaded here [pwg_baker_static_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_baker_static_0.4.zip).

+Model | Step | eval/generator_loss | eval/log_stft_magnitude_loss | eval/spectral_convergence_loss
+:-------------:| :------------:| :-----: | :-----: | :--------:
+default| 1(gpu) x 400000|1.948763|0.670098|0.248882
+
 Parallel WaveGAN checkpoint contains files listed below.

 ```text
--- a/examples/csmsc/voc3/README.md
+++ b/examples/csmsc/voc3/README.md
@@ -157,6 +157,12 @@ Finetuned model can be downloaded here [mb_melgan_baker_finetune_ckpt_0.5.zip](

 Static model can be downloaded here [mb_melgan_baker_static_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/mb_melgan/mb_melgan_baker_static_0.5.zip)

+Model | Step | eval/generator_loss | eval/log_stft_magnitude_loss|eval/spectral_convergence_loss |eval/sub_log_stft_magnitude_loss|eval/sub_spectral_convergence_loss
+:-------------:| :------------:| :-----: | :-----: | :--------:| :--------:| :--------:
+default| 1(gpu) x 1000000| ——|—— |—— |—— | ——|
+finetune| 1(gpu) x 1000000|3.196967|0.977804| 0.778484| 0.889576 |0.776756 |
+
+
 Multi Band MelGAN checkpoint contains files listed below.

 ```text
--- a/examples/ljspeech/tts3/README.md
+++ b/examples/ljspeech/tts3/README.md
@@ -197,6 +197,11 @@ optional arguments:
 ## Pretrained Model
 Pretrained FastSpeech2 model with no silence in the edge of audios. [fastspeech2_nosil_ljspeech_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_ljspeech_ckpt_0.5.zip)

+Model | Step | eval/loss | eval/l1_loss | eval/duration_loss | eval/pitch_loss| eval/energy_loss 
+:-------------:| :------------:| :-----: | :-----: | :--------: |:--------:|:---------:
+default| 2(gpu) x 100000| 1.505682|0.612104| 0.045505| 0.62792| 0.220147
+
+
 FastSpeech2 checkpoint contains files listed below.
 ```text
 fastspeech2_nosil_ljspeech_ckpt_0.5