Merge branch 'develop' of github.com:iftaken/PaddleSpeech into dev-readme

2 years ago · a2b5dfab8a
parent d7a7552795 b2baf1450a
commit a2b5dfab8a
9 changed files with 12 additions and 27 deletions
--- a/demos/README.md
+++ b/demos/README.md
@@ -12,6 +12,7 @@ This directory contains many speech applications in multiple scenarios.
 * speech recognition - recognize text of an audio file 
 * speech server - Server for Speech Task, e.g. ASR,TTS,CLS
 * streaming asr server - receive audio stream from websocket, and recognize to transcript.
+* streaming tts server - receive text from http or websocket, and return a stream of synthesized audio data.
 * speech translation - end to end speech translation  
 * story talker - book reader based on OCR and TTS  
 * style_fs2 - multi style control for FastSpeech2 model  
--- a/demos/README_cn.md
+++ b/demos/README_cn.md
@@ -10,8 +10,9 @@
 * 元宇宙 - 基于语音合成的 2D 增强现实。
 * 标点恢复 - 通常作为语音识别的文本后处理任务，为一段无标点的纯文本添加相应的标点符号。
 * 语音识别 - 识别一段音频中包含的语音文字。
-* 语音服务 - 离线语音服务，包括ASR、TTS、CLS等
-* 流式语音识别服务 - 流式输入语音数据流识别音频中的文字
+* 语音服务 - 离线语音服务，包括ASR、TTS、CLS等。
+* 流式语音识别服务 - 流式输入语音数据流识别音频中的文字。
+* 流式语音合成服务 - 根据待合成文本流式生成合成音频数据流。
 * 语音翻译 - 实时识别音频中的语言，并同时翻译成目标语言。
 * 会说话的故事书 - 基于 OCR 和语音合成的会说话的故事书。
 * 个性化语音合成 - 基于 FastSpeech2 模型的个性化语音合成。 
--- a/examples/aishell/asr1/README.md
+++ b/examples/aishell/asr1/README.md
@@ -1,5 +1,5 @@
 # Transformer/Conformer ASR with Aishell
-This example contains code used to train a Transformer or [Conformer](http://arxiv.org/abs/2008.03802) model with [Aishell dataset](http://www.openslr.org/resources/33)
+This example contains code used to train a [u2](https://arxiv.org/pdf/2012.05481.pdf) model (Transformer or [Conformer](https://arxiv.org/pdf/2005.08100.pdf) model) with [Aishell dataset](http://www.openslr.org/resources/33)
 ## Overview
 All the scripts you need are in `run.sh`. There are several stages in `run.sh`, and each stage has its function.
 | Stage | Function                                                     |
--- a/examples/callcenter/README.md
+++ b/examples/callcenter/README.md
@@ -1,20 +1,3 @@
 # Callcenter 8k sample rate

-Data distribution:
-
-```
-676048 utts
-491.4004722221223 h
-4357792.0 text
-2.4633630739178654 text/sec
-2.6167397877068495 sec/utt
-```
-
-train/dev/test partition:
-
-```
-    33802 manifest.dev
-    67606 manifest.test
-   574640 manifest.train
-   676048 total
-```
+This recipe only provides the model/data config for 8k ASR; users need to prepare the data and generate the manifest metafile themselves. See the Aishell or Librispeech examples for reference.
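
Note on the callcenter change: since the README now leaves manifest generation to the user, here is a minimal sketch of writing manifest lines and recomputing the kind of summary the removed section listed (utterance count, hours, text/sec, sec/utt). The JSON field names (`utt`, `feat`, `feat_shape`, `text`) and both helper functions are assumptions made for illustration and are not confirmed by this diff; the Aishell recipe's data preparation is the place to check the actual schema.

```python
import json


def build_manifest(utterances):
    """Return one JSON line per utterance.

    `utterances` is an iterable of (utt_id, audio_path, duration_sec, transcript).
    The field names below are assumed, not taken from the recipe.
    """
    lines = []
    for utt_id, audio_path, duration_sec, transcript in utterances:
        entry = {
            "utt": utt_id,
            "feat": audio_path,
            "feat_shape": [duration_sec],
            "text": transcript,
        }
        # ensure_ascii=False keeps Chinese transcripts readable in the file.
        lines.append(json.dumps(entry, ensure_ascii=False))
    return lines


def summarize(manifest_lines):
    """Compute utterance count, total hours, text/sec and sec/utt."""
    entries = [json.loads(line) for line in manifest_lines]
    total_sec = sum(e["feat_shape"][0] for e in entries)
    total_text = sum(len(e["text"]) for e in entries)
    return {
        "utts": len(entries),
        "hours": total_sec / 3600,
        "text_per_sec": total_text / total_sec,
        "sec_per_utt": total_sec / len(entries),
    }


if __name__ == "__main__":
    demo = build_manifest([
        ("utt_0001", "wav/utt_0001.wav", 2.0, "你好"),
        ("utt_0002", "wav/utt_0002.wav", 4.0, "早上好啊"),
    ])
    print("\n".join(demo))
    print(summarize(demo))
```

Writing the returned lines to `manifest.train`, `manifest.dev` and `manifest.test` would reproduce the three-way split the old README described; how the split is chosen is left open here.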
--- a/examples/csmsc/vits/README.md
+++ b/examples/csmsc/vits/README.md
@@ -154,7 +154,7 @@ VITS checkpoint contains files listed below.
 vits_csmsc_ckpt_1.1.0
 ├── default.yaml              # default config used to train vitx
 ├── phone_id_map.txt          # phone vocabulary file when training vits
-└── snapshot_iter_350000.pdz  # model parameters and optimizer states
+└── snapshot_iter_333000.pdz  # model parameters and optimizer states
 ```

 ps: This ckpt is not good enough, a better result is training
@@ -169,7 +169,7 @@ FLAGS_allocator_strategy=naive_best_fit \
 FLAGS_fraction_of_gpu_memory_to_use=0.01 \
 python3 ${BIN_DIR}/synthesize_e2e.py \
    --config=vits_csmsc_ckpt_1.1.0/default.yaml \
-    --ckpt=vits_csmsc_ckpt_1.1.0/snapshot_iter_350000.pdz \
+    --ckpt=vits_csmsc_ckpt_1.1.0/snapshot_iter_333000.pdz \
    --phones_dict=vits_csmsc_ckpt_1.1.0/phone_id_map.txt \
    --output_dir=exp/default/test_e2e \
    --text=${BIN_DIR}/../sentences.txt \
--- a/examples/csmsc/vits/conf/default.yaml
+++ b/examples/csmsc/vits/conf/default.yaml
@@ -179,7 +179,7 @@ generator_first: False # whether to start updating generator first
 #                OTHER TRAINING SETTING                  #
 ##########################################################
 num_snapshots: 10            # max number of snapshots to keep while training
-train_max_steps: 250000      # Number of training steps. == total_iters / ngpus, total_iters = 1000000
+train_max_steps: 350000      # Number of training steps. == total_iters / ngpus, total_iters = 1000000
 save_interval_steps: 1000    # Interval steps to save checkpoint.
 eval_interval_steps: 250     # Interval steps to evaluate the network.
 seed: 777                    # random seed number
--- a/examples/librispeech/asr1/README.md
+++ b/examples/librispeech/asr1/README.md
@@ -1,5 +1,5 @@
 # Transformer/Conformer ASR with Librispeech
-This example contains code used to train a Transformer or [Conformer](http://arxiv.org/abs/2008.03802) model with [Librispeech dataset](http://www.openslr.org/resources/12)
+This example contains code used to train a [u2](https://arxiv.org/pdf/2012.05481.pdf) model (Transformer or [Conformer](https://arxiv.org/pdf/2005.08100.pdf) model) with [Librispeech dataset](http://www.openslr.org/resources/12)
 ## Overview
 All the scripts you need are in `run.sh`. There are several stages in `run.sh`, and each stage has its function.
 | Stage | Function                                                     |
--- a/examples/librispeech/asr2/README.md
+++ b/examples/librispeech/asr2/README.md
@@ -1,6 +1,6 @@
 # Transformer/Conformer ASR with Librispeech ASR2

-This example contains code used to train a Transformer or [Conformer](http://arxiv.org/abs/2008.03802) model with [Librispeech dataset](http://www.openslr.org/resources/12) and use some functions in kaldi.
+This example contains code used to train a [u2](https://arxiv.org/pdf/2012.05481.pdf) model (Transformer or [Conformer](https://arxiv.org/pdf/2005.08100.pdf) model) with [Librispeech dataset](http://www.openslr.org/resources/12), and it uses some functions from Kaldi.

 To use this example, you need to install Kaldi first.

--- a/examples/tiny/asr1/README.md
+++ b/examples/tiny/asr1/README.md
@@ -1,5 +1,5 @@
 # Transformer/Conformer ASR with Tiny
-This example contains code used to train a Transformer or [Conformer](http://arxiv.org/abs/2008.03802) model  Tiny dataset(a part of [[Librispeech dataset](http://www.openslr.org/resources/12)](http://www.openslr.org/resources/33))
+This example contains code used to train a [u2](https://arxiv.org/pdf/2012.05481.pdf) model (Transformer or [Conformer](https://arxiv.org/pdf/2005.08100.pdf) model) with the Tiny dataset (a part of the [Librispeech dataset](http://www.openslr.org/resources/12))
 ## Overview
 All the scripts you need are in `run.sh`. There are several stages in `run.sh`, and each stage has its function.
 | Stage | Function                                                     |