From 57ed5cd2e0b1481c35e973ac6fd386208dcc8ad3 Mon Sep 17 00:00:00 2001
From: Hui Zhang
Date: Wed, 10 Mar 2021 12:24:49 +0800
Subject: [PATCH] Fix Doc (#544)

---
 README.md                      | 15 ++++++++++++--
 README_cn.md                   | 10 +++++++++
 docs/benchmark.md              |  2 +-
 docs/faq.md                    | 37 ++++++++++++++++++++++++++++++++++
 docs/geting_started.md         |  2 +-
 examples/aishell/README.md     |  9 +++++++++
 examples/librispeech/README.md |  9 +++++++++
 7 files changed, 80 insertions(+), 4 deletions(-)
 create mode 100644 docs/faq.md
 create mode 100644 examples/aishell/README.md
 create mode 100644 examples/librispeech/README.md

diff --git a/README.md b/README.md
index ed04d241..83d10100 100644
--- a/README.md
+++ b/README.md
@@ -4,13 +4,23 @@
 *DeepSpeech on PaddlePaddle* is an open-source implementation of end-to-end Automatic Speech Recognition (ASR) engine, with [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform. Our vision is to empower both industrial application and academic research on speech recognition, via an easy-to-use, efficient and scalable implementation, including training, inference & testing module, and demo deployment.

-For more information, please docs under `doc`.
+For more information, please see below:
+* [Install](docs/install.md)
+* [Getting Started](docs/geting_started.md)
+* [Data Preparation](docs/data_preparation.md)
+* [Data Augmentation](docs/augmentation.md)
+* [Ngram LM](docs/ngram_lm.md)
+* [Server Demo](docs/server.md)
+* [Benchmark](docs/benchmark.md)
+* [Released Model](docs/released_model.md)
+* [FAQ](docs/faq.md)
+
 ## Models
 * [Baidu's Deep Speech2](http://proceedings.mlr.press/v48/amodei16.pdf)

 ## Setup
-* python3.7
+* python 3.7
 * paddlepaddle 2.0.0

 - Run the setup script for the remaining dependencies

@@ -33,6 +43,7 @@ source tools/venv/bin/activate
 Please see [Getting Started](docs/geting_started.md) and [tiny egs](examples/tiny/README.md).
+
 ## Questions and Help

 You are welcome to submit questions and bug reports in [Github Issues](https://github.com/PaddlePaddle/DeepSpeech/issues). You are also welcome to contribute to this project.

diff --git a/README_cn.md b/README_cn.md
index d8dd0db6..ff9d3c07 100644
--- a/README_cn.md
+++ b/README_cn.md
@@ -5,6 +5,16 @@
 *DeepSpeech on PaddlePaddle*是一个采用[PaddlePaddle](https://github.com/PaddlePaddle/Paddle)平台的端到端自动语音识别(ASR)引擎的开源项目, 我们的愿景是为语音识别在工业应用和学术研究上，提供易于使用、高效和可扩展的工具，包括训练，推理，测试模块，以及 demo 部署。同时，我们还将发布一些预训练好的英语和普通话模型。

+更多信息如下:
+* [安装](docs/install.md)
+* [开始](docs/geting_started.md)
+* [数据处理](docs/data_preparation.md)
+* [数据增强](docs/augmentation.md)
+* [语言模型](docs/ngram_lm.md)
+* [服务部署](docs/server.md)
+* [Benchmark](docs/benchmark.md)
+* [Released Model](docs/released_model.md)
+* [FAQ](docs/faq.md)

 ## 模型
 * [Baidu's Deep Speech2](http://proceedings.mlr.press/v48/amodei16.pdf)

diff --git a/docs/benchmark.md b/docs/benchmark.md
index 4ef3e680..3b5f8e95 100644
--- a/docs/benchmark.md
+++ b/docs/benchmark.md
@@ -4,7 +4,7 @@
 We compare the training time with 1, 2, 4, 8 Tesla V100 GPUs (with a subset of LibriSpeech samples whose audio durations are between 6.0 and 7.0 seconds). And it shows that a **near-linear** acceleration with multiple GPUs has been achieved. In the following figure, the time (in seconds) cost for training is printed on the blue bars.

-
+
 | # of GPU | Acceleration Rate |
 | -------- | --------------: |

diff --git a/docs/faq.md b/docs/faq.md
new file mode 100644
index 00000000..dc14058c
--- /dev/null
+++ b/docs/faq.md
@@ -0,0 +1,37 @@
+# FAQ
+
+1. To what degree does audio speed perturbation affect the recognition rate?
+
+Speed perturbation improves recognition accuracy; rates of 0.9, 1.0 and 1.1 are commonly used.
+
+2. To what degree does audio volume affect the recognition rate?
+
+Training usually normalizes the volume into a fixed range; excessive fluctuation hurts training, roughly beyond 10dB ~ 20dB.
+
+3. What is the minimum amount of training data required for an acoustic model?
+
+Aishell-1 contains about 178 hours of data; the more data, the better.
+
+4. What kinds of noise or background sound affect the recognition rate?
+
+Mainly interfering speech and low signal-to-noise ratio.
+
+5. What is the length limit for a single utterance?
+
+Training utterances are usually limited to 1s ~ 6s, depending on the training configuration.
+
+6. Does background sound need to be separated out, or denoised, before recognition?
+
+Yes, it needs to be separated; the approach depends on the specific scenario.
+
+7. Does the model include VAD (voice activity detection)?
+
+VAD is a separate model or module; this model does not include that capability.
+
+8. Is long-form speech recognition supported?
+
+Audio is usually segmented by VAD before recognition.
+
+9. What hardware does the Mandarin LM Large language model require?
+
+Enough memory to hold the LM is sufficient.

diff --git a/docs/geting_started.md b/docs/geting_started.md
index fddb639a..478f3bb3 100644
--- a/docs/geting_started.md
+++ b/docs/geting_started.md
@@ -71,7 +71,7 @@ CUDA_VISIBLE_DEVICES=0 bash local/tune.sh

 The grid search will print the WER (word error rate) or CER (character error rate) at each point in the hyper-parameters space, and draw the error surface optionally. A proper hyper-parameters range should include the global minima of the error surface for WER/CER, as illustrated in the following figure.

-
+
 An example error surface for tuning on the dev-clean set of LibriSpeech
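The grid search that `local/tune.sh` performs above can be sketched as follows. This is a minimal illustration, not the repo's tuning code: `error_at` is a hypothetical stand-in for decoding the dev set and scoring WER/CER at one grid point, and the parameter names `alpha`/`beta` and their ranges are assumptions.

```python
import itertools

# Hypothetical stand-in for "decode the dev set and score it" at one grid
# point; the real tuning script measures WER/CER here. This toy error
# surface has its minimum near alpha=1.9, beta=0.3.
def error_at(alpha: float, beta: float) -> float:
    return (alpha - 1.9) ** 2 + (beta - 0.3) ** 2

# Sweep every grid point and keep the one with the lowest error,
# mirroring how the tuning script walks the hyper-parameter space.
alphas = [1.0 + 0.25 * i for i in range(9)]  # 1.0 .. 3.0
betas = [0.1 + 0.1 * i for i in range(5)]    # 0.1 .. 0.5
best = min(itertools.product(alphas, betas), key=lambda p: error_at(*p))
print(best)  # the grid point with the lowest error
```

A proper range, as the text notes, is one whose interior contains this minimum; if the best point lands on the grid boundary, the range should be widened and the search rerun.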
diff --git a/examples/aishell/README.md b/examples/aishell/README.md
new file mode 100644
index 00000000..0413d4b2
--- /dev/null
+++ b/examples/aishell/README.md
@@ -0,0 +1,9 @@
+# Aishell-1
+
+## CTC
+| Model | Config | Test set | CER |
+| --- | --- | --- | --- |
+| DeepSpeech2 | conf/deepspeech2.yaml | test | 0.078977 |
+| DeepSpeech2 | release 1.8.5 | test | 0.080447 |
+
+

diff --git a/examples/librispeech/README.md b/examples/librispeech/README.md
new file mode 100644
index 00000000..cb1ab003
--- /dev/null
+++ b/examples/librispeech/README.md
@@ -0,0 +1,9 @@
+# LibriSpeech
+
+## CTC
+| Model | Config | Test set | CER |
+| --- | --- | --- | --- |
+| DeepSpeech2 | conf/deepspeech2.yaml | test-clean | 0.073973 |
+| DeepSpeech2 | release 1.8.5 | test-clean | 0.074939 |
+
+
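The CER values reported in the tables above are character error rates: the Levenshtein edit distance between hypothesis and reference character sequences, divided by the reference length. A minimal sketch of the metric (an illustration, not the scorer this repo uses):

```python
def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance / reference length."""
    if not ref:
        return float(bool(hyp))
    # Single-row dynamic-programming Levenshtein distance.
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        row = [i]
        for j, h in enumerate(hyp, start=1):
            row.append(min(prev_row[j] + 1,                 # deletion
                           row[j - 1] + 1,                  # insertion
                           prev_row[j - 1] + (r != h)))     # substitution
        prev_row = row
    return prev_row[-1] / len(ref)

print(cer("deep speech", "deep speach"))  # one substitution over 11 characters
```

CER counts per-character edits, so for Mandarin (Aishell-1) it is the natural metric; the LibriSpeech table reports the same character-level measure rather than the more common word error rate.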