diff --git a/demos/README.md b/demos/README.md
index 2a306df6..72b70b23 100644
--- a/demos/README.md
+++ b/demos/README.md
@@ -12,6 +12,7 @@ This directory contains many speech applications in multiple scenarios.
 * speech recognition - recognize text of an audio file
 * speech server - Server for Speech Task, e.g. ASR,TTS,CLS
 * streaming asr server - receive audio stream from websocket, and recognize to transcript.
+* streaming tts server - receive text from http or websocket, and stream back the synthesized audio.
 * speech translation - end to end speech translation
 * story talker - book reader based on OCR and TTS
 * style_fs2 - multi style control for FastSpeech2 model
diff --git a/demos/README_cn.md b/demos/README_cn.md
index 47134212..04fc1fa7 100644
--- a/demos/README_cn.md
+++ b/demos/README_cn.md
@@ -10,8 +10,9 @@
 * 元宇宙 - 基于语音合成的 2D 增强现实。
 * 标点恢复 - 通常作为语音识别的文本后处理任务,为一段无标点的纯文本添加相应的标点符号。
 * 语音识别 - 识别一段音频中包含的语音文字。
-* 语音服务 - 离线语音服务,包括ASR、TTS、CLS等
-* 流式语音识别服务 - 流式输入语音数据流识别音频中的文字
+* 语音服务 - 离线语音服务,包括ASR、TTS、CLS等。
+* 流式语音识别服务 - 流式输入语音数据流识别音频中的文字。
+* 流式语音合成服务 - 根据待合成文本流式生成合成音频数据流。
 * 语音翻译 - 实时识别音频中的语言,并同时翻译成目标语言。
 * 会说话的故事书 - 基于 OCR 和语音合成的会说话的故事书。
 * 个性化语音合成 - 基于 FastSpeech2 模型的个性化语音合成。
diff --git a/examples/aishell/asr1/README.md b/examples/aishell/asr1/README.md
index 25b28ede..a7390fd6 100644
--- a/examples/aishell/asr1/README.md
+++ b/examples/aishell/asr1/README.md
@@ -1,5 +1,5 @@
 # Transformer/Conformer ASR with Aishell
-This example contains code used to train a Transformer or [Conformer](http://arxiv.org/abs/2008.03802) model with [Aishell dataset](http://www.openslr.org/resources/33)
+This example contains code used to train a [u2](https://arxiv.org/pdf/2012.05481.pdf) model (Transformer or [Conformer](https://arxiv.org/pdf/2005.08100.pdf)) with the [Aishell dataset](http://www.openslr.org/resources/33).
 ## Overview
 All the scripts you need are in `run.sh`. There are several stages in `run.sh`, and each stage has its function.
 | Stage | Function |
diff --git a/examples/callcenter/README.md b/examples/callcenter/README.md
index 1c715cb6..6d521146 100644
--- a/examples/callcenter/README.md
+++ b/examples/callcenter/README.md
@@ -1,20 +1,3 @@
 # Callcenter 8k sample rate

-Data distribution:
-
-```
-676048 utts
-491.4004722221223 h
-4357792.0 text
-2.4633630739178654 text/sec
-2.6167397877068495 sec/utt
-```
-
-train/dev/test partition:
-
-```
- 33802 manifest.dev
- 67606 manifest.test
- 574640 manifest.train
- 676048 total
-```
+This recipe only provides the model/data configs for 8k ASR; users need to prepare the data and generate the manifest metafiles themselves. See the Aishell or Librispeech recipes for reference.
diff --git a/examples/csmsc/vits/README.md b/examples/csmsc/vits/README.md
index 5ca57e3a..8f223e07 100644
--- a/examples/csmsc/vits/README.md
+++ b/examples/csmsc/vits/README.md
@@ -154,7 +154,7 @@ VITS checkpoint contains files listed below.
 vits_csmsc_ckpt_1.1.0
 ├── default.yaml             # default config used to train vits
 ├── phone_id_map.txt         # phone vocabulary file when training vits
-└── snapshot_iter_350000.pdz # model parameters and optimizer states
+└── snapshot_iter_333000.pdz # model parameters and optimizer states
 ```

 ps: This ckpt is not good enough; a better one is still in training.
@@ -169,7 +169,7 @@ FLAGS_allocator_strategy=naive_best_fit \
 FLAGS_fraction_of_gpu_memory_to_use=0.01 \
 python3 ${BIN_DIR}/synthesize_e2e.py \
   --config=vits_csmsc_ckpt_1.1.0/default.yaml \
-  --ckpt=vits_csmsc_ckpt_1.1.0/snapshot_iter_350000.pdz \
+  --ckpt=vits_csmsc_ckpt_1.1.0/snapshot_iter_333000.pdz \
   --phones_dict=vits_csmsc_ckpt_1.1.0/phone_id_map.txt \
   --output_dir=exp/default/test_e2e \
   --text=${BIN_DIR}/../sentences.txt \
diff --git a/examples/csmsc/vits/conf/default.yaml b/examples/csmsc/vits/conf/default.yaml
index 32f995cc..a2aef998 100644
--- a/examples/csmsc/vits/conf/default.yaml
+++ b/examples/csmsc/vits/conf/default.yaml
@@ -179,7 +179,7 @@ generator_first: False # whether to start updating generator first
 #                OTHER TRAINING SETTING                  #
 ##########################################################
 num_snapshots: 10         # max number of snapshots to keep while training
-train_max_steps: 250000   # Number of training steps. == total_iters / ngpus, total_iters = 1000000
+train_max_steps: 350000   # Number of training steps. == total_iters / ngpus, total_iters = 1000000
 save_interval_steps: 1000 # Interval steps to save checkpoint.
 eval_interval_steps: 250  # Interval steps to evaluate the network.
 seed: 777                 # random seed number
diff --git a/examples/librispeech/asr1/README.md b/examples/librispeech/asr1/README.md
index ae252a58..ca008144 100644
--- a/examples/librispeech/asr1/README.md
+++ b/examples/librispeech/asr1/README.md
@@ -1,5 +1,5 @@
 # Transformer/Conformer ASR with Librispeech
-This example contains code used to train a Transformer or [Conformer](http://arxiv.org/abs/2008.03802) model with [Librispeech dataset](http://www.openslr.org/resources/12)
+This example contains code used to train a [u2](https://arxiv.org/pdf/2012.05481.pdf) model (Transformer or [Conformer](https://arxiv.org/pdf/2005.08100.pdf)) with the [Librispeech dataset](http://www.openslr.org/resources/12).
 ## Overview
 All the scripts you need are in `run.sh`. There are several stages in `run.sh`, and each stage has its function.
 | Stage | Function |
diff --git a/examples/librispeech/asr2/README.md b/examples/librispeech/asr2/README.md
index 5bc7185a..26978520 100644
--- a/examples/librispeech/asr2/README.md
+++ b/examples/librispeech/asr2/README.md
@@ -1,6 +1,6 @@
 # Transformer/Conformer ASR with Librispeech ASR2
-This example contains code used to train a Transformer or [Conformer](http://arxiv.org/abs/2008.03802) model with [Librispeech dataset](http://www.openslr.org/resources/12) and use some functions in kaldi.
+This example contains code used to train a [u2](https://arxiv.org/pdf/2012.05481.pdf) model (Transformer or [Conformer](https://arxiv.org/pdf/2005.08100.pdf)) with the [Librispeech dataset](http://www.openslr.org/resources/12), and it uses some functions from Kaldi.

 To use this example, you need to install Kaldi first.
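The VITS hunks above rename the released snapshot to `snapshot_iter_333000.pdz`. A `.pdz` snapshot is an ordinary `paddle.save` archive, so it can typically be inspected with `paddle.load`; the sketch below prints whatever keys and tensor shapes it finds, without assuming a particular schema (the exact top-level keys depend on the trainer that wrote the file).

```python
# Inspect a PaddleSpeech *.pdz training snapshot. Assumption: the file was
# written with paddle.save(state_dict, path); the top-level keys (model
# parameters vs. optimizer states) vary by trainer, so nothing is hard-coded.
import paddle

def describe(obj, prefix=""):
    """Recursively print dict keys and tensor/array shapes."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            describe(value, prefix + str(key) + ".")
    elif hasattr(obj, "shape"):  # paddle.Tensor or numpy.ndarray
        print(prefix.rstrip(".") + ": shape=" + str(tuple(obj.shape)))
    else:
        print(prefix.rstrip(".") + ": " + type(obj).__name__)

state = paddle.load("vits_csmsc_ckpt_1.1.0/snapshot_iter_333000.pdz")
describe(state)
```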
diff --git a/examples/tiny/asr1/README.md b/examples/tiny/asr1/README.md
index 6a4999aa..cfa26670 100644
--- a/examples/tiny/asr1/README.md
+++ b/examples/tiny/asr1/README.md
@@ -1,5 +1,5 @@
 # Transformer/Conformer ASR with Tiny
-This example contains code used to train a Transformer or [Conformer](http://arxiv.org/abs/2008.03802) model Tiny dataset(a part of [[Librispeech dataset](http://www.openslr.org/resources/12)](http://www.openslr.org/resources/33))
+This example contains code used to train a [u2](https://arxiv.org/pdf/2012.05481.pdf) model (Transformer or [Conformer](https://arxiv.org/pdf/2005.08100.pdf)) with the Tiny dataset (a part of the [Librispeech dataset](http://www.openslr.org/resources/12)).
 ## Overview
 All the scripts you need are in `run.sh`. There are several stages in `run.sh`, and each stage has its function.
 | Stage | Function |
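All four ASR README tweaks in this patch point at the same [u2 paper](https://arxiv.org/pdf/2012.05481.pdf). For readers new to it, the core idea is two-pass decoding: a shared encoder plus a CTC branch streams out an n-best list, and the attention decoder then rescores those hypotheses. Below is a toy sketch of just the score interpolation step, with stand-in scorers rather than real model outputs:

```python
# Toy illustration of u2-style two-pass rescoring (not PaddleSpeech code).
# The first-pass CTC decoder yields n-best hypotheses with log-probs; the
# attention decoder rescores them, and the two scores are interpolated.
from typing import Callable, List, Tuple

def rescore_nbest(
    nbest: List[Tuple[str, float]],           # (hypothesis, ctc_log_prob)
    attention_scorer: Callable[[str], float],  # returns attention log-prob
    ctc_weight: float = 0.5,
) -> str:
    best_hyp, best_score = "", float("-inf")
    for hyp, ctc_score in nbest:
        score = ctc_weight * ctc_score + (1.0 - ctc_weight) * attention_scorer(hyp)
        if score > best_score:
            best_hyp, best_score = hyp, score
    return best_hyp

# Stand-in scorer: a real model would score with its attention decoder.
print(rescore_nbest([("ni hao", -3.2), ("ni hao ma", -3.5)], lambda h: -0.1 * len(h)))
```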
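Separately, the rewritten callcenter README asks users to generate manifest metafiles themselves. As a rough illustration of what that involves, a manifest is a JSON-lines file with one utterance per line; the field names used here (`utt`, `feat`, `feat_shape`, `text`) are an assumption modeled on the Aishell/Librispeech data preparation scripts, so check those recipes for the authoritative schema.

```python
# Hypothetical manifest generator for an 8k callcenter corpus.
# Assumptions: a directory of .wav files plus a transcript file with
# "<utt_id> <text>" per line; the manifest fields mirror the Aishell
# recipe's data preparation and may differ in other recipes.
import json
import wave
from pathlib import Path

def build_manifest(wav_dir: str, transcript_path: str, manifest_path: str) -> None:
    transcripts = {}
    with open(transcript_path, encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split(maxsplit=1)
            if len(parts) == 2:
                transcripts[parts[0]] = parts[1]

    with open(manifest_path, "w", encoding="utf-8") as out:
        for wav_path in sorted(Path(wav_dir).glob("*.wav")):
            utt_id = wav_path.stem
            if utt_id not in transcripts:
                continue  # skip untranscribed audio
            with wave.open(str(wav_path)) as w:
                duration = w.getnframes() / w.getframerate()  # seconds
            entry = {
                "utt": utt_id,
                "feat": str(wav_path),
                "feat_shape": [duration],
                "text": transcripts[utt_id],
            }
            out.write(json.dumps(entry, ensure_ascii=False) + "\n")

build_manifest("data/wav", "data/transcript.txt", "data/manifest.train")
```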