pull/545/head
Hui Zhang 4 years ago committed by GitHub
parent d7e753546a
commit 57ed5cd2e0
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -4,7 +4,17 @@
*DeepSpeech on PaddlePaddle* is an open-source implementation of an end-to-end Automatic Speech Recognition (ASR) engine on the [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform. Our vision is to empower both industrial applications and academic research on speech recognition via an easy-to-use, efficient, and scalable implementation, including training, inference and testing modules, and demo deployment.
- For more information, please docs under `doc`.
+ For more information, please see below:
[Install](docs/install.md)
[Getting Started](docs/geting_stared.md)
[Data Preparation](docs/data_preparation.md)
[Data Augmentation](docs/augmentation.md)
[Ngram LM](docs/ngram_lm.md)
[Server Demo](docs/server.md)
[Benchmark](docs/benchmark.md)
[Released Model](docs/released_model.md)
[FAQ](docs/faq.md)
## Models
* [Baidu's Deep Speech2](http://proceedings.mlr.press/v48/amodei16.pdf)
@ -33,6 +43,7 @@ source tools/venv/bin/activate
Please see [Getting Started](docs/geting_started.md) and [tiny egs](examples/tiny/README.md).
## Questions and Help
You are welcome to submit questions and bug reports in [GitHub Issues](https://github.com/PaddlePaddle/DeepSpeech/issues). You are also welcome to contribute to this project.

@ -5,6 +5,16 @@
*DeepSpeech on PaddlePaddle* is an open-source project for an end-to-end automatic speech recognition (ASR) engine built on the [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform.
Our vision is to provide easy-to-use, efficient, and scalable tools for speech recognition in industrial applications and academic research, including training, inference, and testing modules, as well as demo deployment. We will also release some pre-trained English and Mandarin models.
More information:
[Install](docs/install.md)
[Getting Started](docs/geting_stared.md)
[Data Preparation](docs/data_preparation.md)
[Data Augmentation](docs/augmentation.md)
[Language Model](docs/ngram_lm.md)
[Server Deployment](docs/server.md)
[Benchmark](docs/benchmark.md)
[Released Model](docs/released_model.md)
[FAQ](docs/faq.md)
## Models
* [Baidu's Deep Speech2](http://proceedings.mlr.press/v48/amodei16.pdf)

@ -4,7 +4,7 @@
We compare the training time with 1, 2, 4, and 8 Tesla V100 GPUs (on a subset of LibriSpeech samples whose audio durations are between 6.0 and 7.0 seconds). The results show a **near-linear** acceleration with multiple GPUs. In the following figure, the training time (in seconds) is printed on the blue bars.
- <img src="docs/images/multi_gpu_speedup.png" width=450><br/>
+ <img src="images/multi_gpu_speedup.png" width=450><br/>
| # of GPU | Acceleration Rate |
| -------- | --------------: |

@ -0,0 +1,37 @@
# FAQ
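1. How much can the audio speed change before it affects recognition accuracy?
Speed perturbation improves recognition; speed factors of 0.9, 1.0, and 1.1 are typically used.
2. At what level does volume affect recognition accuracy?
Training usually normalizes the volume to a fixed range, and large fluctuations hurt training; a rough estimate is 10dB ~ 20dB.
3. What is the minimum amount of training data required for the speech model?
Aishell-1 has about 178h of data; the more data, the better.
4. Which kinds of noise or background sound affect recognition accuracy?
Mainly interfering human voices and a low signal-to-noise ratio.
5. What is the length limit for a single utterance?
Training utterances are usually limited to 1s~6s, depending on the training configuration.
6. Does background sound need to be separated out or denoised before recognition?
Separation is needed, but this depends on the specific scenario.
7. Does the model include VAD (voice activity detection)?
VAD is a separate model or module; the model does not include it.
8. Is long-form speech recognition supported?
Long audio is usually passed through VAD first and then recognized.
9. What hardware does the Mandarin LM Large language model require?
Enough memory to hold the LM.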
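
The speed factors in answer 1 and the duration limits in answer 5 can be illustrated with a minimal sketch. It assumes speed perturbation is done by plain resampling with linear interpolation (which also shifts pitch) and that clips are filtered by duration before training; the function names `speed_perturb` and `keep_utterance` and the 16 kHz sample rate are hypothetical and not taken from this project.

```python
import numpy as np


def speed_perturb(samples: np.ndarray, rate: float) -> np.ndarray:
    """Change playback speed by resampling with linear interpolation.

    rate > 1.0 shortens the signal (faster speech); rate < 1.0 lengthens it.
    This simple approach also shifts the pitch.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    old_len = len(samples)
    new_len = int(round(old_len / rate))
    # Positions in the original signal from which the new samples are read.
    old_idx = np.linspace(0, old_len - 1, num=new_len)
    return np.interp(old_idx, np.arange(old_len), samples)


def keep_utterance(num_samples: int, sample_rate: int,
                   min_sec: float = 1.0, max_sec: float = 6.0) -> bool:
    """Return True if the clip duration lies within [min_sec, max_sec]."""
    duration = num_samples / sample_rate
    return min_sec <= duration <= max_sec


if __name__ == "__main__":
    sr = 16000  # assumed sample rate
    # Two seconds of a 440 Hz tone as stand-in audio.
    audio = np.sin(2 * np.pi * 440 * np.arange(2 * sr) / sr)
    for rate in (0.9, 1.0, 1.1):
        out = speed_perturb(audio, rate)
        print(rate, len(out), keep_utterance(len(out), sr))
```

The default bounds mirror the 1s~6s range mentioned above; in practice they would come from the training configuration.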

@ -71,7 +71,7 @@ CUDA_VISIBLE_DEVICES=0 bash local/tune.sh
The grid search prints the WER (word error rate) or CER (character error rate) at each point in the hyper-parameter space and optionally draws the error surface. A proper hyper-parameter range should include the global minimum of the WER/CER error surface, as illustrated in the following figure.
<p align="center">
- <img src="docs/images/tuning_error_surface.png" width=550>
+ <img src="images/tuning_error_surface.png" width=550>
<br/>An example error surface for tuning on the dev-clean set of LibriSpeech
</p>
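
The grid search described above can be sketched in a few lines. This is an illustration under stated assumptions, not the logic of `local/tune.sh`: the two hyper-parameters are called `alpha` and `beta` here only for the example, and `toy_error` is a made-up stand-in for a real decode-and-score pass.

```python
import itertools
from typing import Callable, Sequence, Tuple


def grid_search(evaluate: Callable[[float, float], float],
                alphas: Sequence[float],
                betas: Sequence[float]) -> Tuple[float, float, float]:
    """Evaluate every (alpha, beta) pair and return the one with the lowest error."""
    best = None
    for alpha, beta in itertools.product(alphas, betas):
        err = evaluate(alpha, beta)
        print(f"alpha={alpha:.2f} beta={beta:.2f} error={err:.4f}")
        if best is None or err < best[2]:
            best = (alpha, beta, err)
    if best is None:
        raise ValueError("empty grid")
    return best


def on_boundary(value: float, grid: Sequence[float]) -> bool:
    """True if the best value sits on the edge of the searched range."""
    return value in (min(grid), max(grid))


def toy_error(alpha: float, beta: float) -> float:
    # Hypothetical error surface: a bowl with its minimum at (2.0, 0.5).
    return 0.07 + 0.01 * (alpha - 2.0) ** 2 + 0.02 * (beta - 0.5) ** 2


if __name__ == "__main__":
    alphas = [1.0, 2.0, 3.0]
    betas = [0.0, 0.5, 1.0]
    a, b, err = grid_search(toy_error, alphas, betas)
    print("best:", a, b, err)
    if on_boundary(a, alphas) or on_boundary(b, betas):
        print("minimum lies on the grid edge; consider widening the range")
```

A best point on the edge of the grid suggests the searched range may not contain the global minimum, which is the situation the paragraph above warns about.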

@ -0,0 +1,9 @@
# Aishell-1
## CTC
| Model | Config | Test set | CER |
| --- | --- | --- | --- |
| DeepSpeech2 | conf/deepspeech2.yaml | test | 0.078977 |
| DeepSpeech2 | release 1.8.5 | test | 0.080447 |

@ -0,0 +1,9 @@
# LibriSpeech
## CTC
| Model | Config | Test set | CER |
| --- | --- | --- | --- |
| DeepSpeech2 | conf/deepspeech2.yaml | test-clean | 0.073973 |
| DeepSpeech2 | release 1.8.5 | test-clean | 0.074939 |
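
For readers unfamiliar with the CER column, the sketch below shows one common way to compute a character error rate: the total character-level edit distance divided by the total number of reference characters. Whether this project's scoring aggregates over the corpus in exactly this way, or normalizes text first, is not stated here, so treat the details as assumptions.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars


if __name__ == "__main__":
    # One substitution in four reference characters.
    print(cer(["abcd"], ["abcf"]))
```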