Merge pull request #808 from PaddlePaddle/doc

remove useless doc and put FAQ to discussions

@@ -32,10 +32,8 @@ Please see [Getting Started](doc/src/getting_started.md) and [tiny egs](examples
* [Data Preparation](doc/src/data_preparation.md)
* [Data Augmentation](doc/src/augmentation.md)
* [Ngram LM](doc/src/ngram_lm.md)
* [Server Demo](doc/src/server.md)
* [Benchmark](doc/src/benchmark.md)
* [Released Model](doc/src/released_model.md)
* [FAQ](doc/src/faq.md)
## Questions and Help

@@ -33,10 +33,8 @@
* [Data Preparation](doc/src/data_preparation.md)
* [Data Augmentation](doc/src/augmentation.md)
* [Language Model](doc/src/ngram_lm.md)
* [Server Deployment](doc/src/server.md)
* [Benchmark](doc/src/benchmark.md)
* [Released Model](doc/src/released_model.md)
* [FAQ](doc/src/faq.md)
## Questions and Help

@@ -1,37 +0,0 @@
# FAQ
1. To what degree does audio speed perturbation affect the recognition rate?
Speed perturbation improves recognition; speed factors of 0.9, 1.0, and 1.1 are typically used.
2. At what level does volume variation affect the recognition rate?
Training usually fixes the volume within a range; fluctuations that are too large hurt training, roughly beyond 10 dB ~ 20 dB.
3. What is the minimum required duration of training data for a speech model?
Aishell-1 has about 178 h of data; the more data, the better.
4. Which kinds of noise or background sound affect the recognition rate?
Mainly interfering human voices and low signal-to-noise ratios.
5. What is the length limit for a single utterance?
Training utterances are usually limited to 1 s ~ 6 s, depending on the training configuration.
6. Does background sound need to be separated out or denoised before recognition?
It should be separated, though the approach depends on the specific scenario.
7. Does the model include VAD (voice activity detection) capability?
VAD is a separate model or module; this model does not include that capability.
8. Is long-form speech recognition supported?
Recognition is usually performed after VAD segmentation.
9. What hardware configuration does the Mandarin LM Large language model require?
Any machine whose memory can hold the LM is sufficient.
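A minimal sketch of the speed perturbation mentioned in item 1, assuming the `sox` CLI is available (file names are hypothetical):
```bash
# Hypothetical file names; assumes sox is installed.
# Create 0.9x and 1.1x speed-perturbed copies of one utterance,
# matching the 0.9 / 1.0 / 1.1 factors mentioned in item 1.
for factor in 0.9 1.1; do
  sox input.wav "input_sp_${factor}.wav" speed "${factor}"
done
```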

@@ -1,13 +1,20 @@
# Features
### Dataset
* Aishell
* Librispeech
* THCHS30
* TIMIT
### Speech Recognition
* Offline
* Non-Streaming
* [Baidu's DeepSpeech2](http://proceedings.mlr.press/v48/amodei16.pdf)
* [Transformer](https://arxiv.org/abs/1706.03762)
* [Conformer](https://arxiv.org/abs/2005.08100)
* Online
* Streaming
* [Baidu's DeepSpeech2](http://proceedings.mlr.press/v48/amodei16.pdf)
* [U2](https://arxiv.org/pdf/2012.05481.pdf)
### Language Model
@@ -22,6 +29,15 @@
* beam search
* attention rescore
### Deployment
* Paddle Inference
### Alignment
* MFA
* CTC Alignment
### Speech Frontend
* Audio

@@ -35,52 +35,3 @@ Different from the English language model, the Mandarin language model is character-
* A whitespace character is inserted between two tokens.
Please note that the released language models contain only simplified Chinese characters. After preprocessing is done, we can begin to train the language model. The key training arguments for the small LM are '-o 5 --prune 0 1 2 4 4', and '-o 5' for the large LM. Please refer to the section above for the meaning of each argument. We also convert the arpa file to a binary file using default settings.
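As a minimal sketch of those two training runs (corpus and output file names are hypothetical; `lmplz` and `build_binary` come from KenLM, introduced below):
```bash
# Hypothetical file names; the arguments are the ones quoted above.
# Small Mandarin LM: 5-gram with pruning.
build/bin/lmplz -o 5 --prune 0 1 2 4 4 --text zh_corpus.txt --arpa zh_small.arpa
# Large Mandarin LM: 5-gram without pruning.
build/bin/lmplz -o 5 --text zh_corpus.txt --arpa zh_large.arpa
# Convert the arpa files to binary with default settings for fast loading.
build/bin/build_binary zh_small.arpa zh_small.klm
build/bin/build_binary zh_large.arpa zh_large.klm
```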
## [KenLM](http://kheafield.com/code/kenlm/)
Among statistical language-model toolkits there are many choices; the most widely used are SRILM and KenLM. KenLM came out later than SRILM, trains faster, and supports training on large data with a single machine. Here is how to use KenLM.
1. Download the toolkit from: http://kheafield.com/code/kenlm.tar.gz
2. Usage. The tool is convenient to use on Linux. First make sure the Linux environment has Boost 1.36.0 and zlib installed:
```
# Boost
yum install boost
yum install boost-devel
# zlib
yum install zlib
yum install zlib-devel
```
The gcc version also needs to be 4.8.2 or above. Then download and build KenLM:
```
wget -O - https://kheafield.com/code/kenlm.tar.gz |tar xz
mkdir kenlm/build
cd kenlm/build
cmake ..
make -j2
```
3. Training. Use the following command to train:
```
build/bin/lmplz -o 3 --verbose_header --text people2014corpus_words.txt --arpa result/people2014corpus_words.arps
```
Here:
1) The file people2014corpus_words.txt must already be word-segmented.
The training corpus, the People's Daily 2014 processed corpus, includes: 1) manually word-segmented data with POS tags (people2014.tar.gz); 2) unsegmented text data (people2014_words.txt); 3) a character-level KenLM language model file and its binary (people2014corpus_chars.arps/klm); 4) a word-level KenLM language model file and its binary (people2014corpus_words.arps/klm).
2) The number after -o is the n-gram order (3 above, i.e. a trigram model); an order of 3 is usually sufficient, but choose it according to your actual needs.
4. Compression. Compress the model to binary for fast loading:
```
build/bin/build_binary ./result/people2014corpus_words.arps ./result/people2014corpus_words.klm
```
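To sanity-check the binary model, KenLM's `query` tool (built alongside `lmplz` and `build_binary`) can score word-segmented text from stdin; the file name follows the example above:
```bash
# Score a segmented sentence with the binary model built above.
echo "今天 天气 不错" | build/bin/query ./result/people2014corpus_words.klm
```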

@@ -1,34 +0,0 @@
# Trying Live Demo with Your Own Voice
So far, the ASR model has been trained and tested qualitatively (`infer`) and quantitatively (`test`) only with existing audio files, not yet with your own speech. This section builds a real-time demo ASR engine with the trained model, enabling you to test and play around with the demo using your own voice.
First, change your directory to `examples/aishell` and `source path.sh`.
To start the demo's server, please run this in one console:
```bash
CUDA_VISIBLE_DEVICES=0 bash local/server.sh
```
On the machine that will run the demo's client (which might not be the same machine), complete the following installation before moving on.
For example, on macOS:
```bash
brew install portaudio
pip install pyaudio
pip install keyboard
```
Then to start the client, please run this in another console:
```bash
CUDA_VISIBLE_DEVICES=0 bash local/client.sh
```
Now, in the client console, press and hold the `whitespace` key and start speaking. When you finish your utterance, release the key, and the speech-to-text results will be shown in the console. To quit the client, just press the `ESC` key.
Notice that `deepspeech/exps/deepspeech2/deploy/client.py` must be run on a machine with a microphone device, while `deepspeech/exps/deepspeech2/deploy/server.py` can run on one without any audio recording hardware, e.g. any remote server machine. Just be careful to set the `host_ip` and `host_port` arguments to an actually accessible IP address and port if the server and client run on two separate machines; nothing needs to be changed if they run on a single machine.
Please also refer to `examples/aishell/local/server.sh`, which first downloads a pre-trained Chinese model (trained on AISHELL1) and then starts the demo server with that model. By running `examples/aishell/local/client.sh`, you can speak Chinese to test it. If you would like to try other models, just update the `--checkpoint_path` argument in the script.
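A hedged sketch of the two-machine setup described above; the exact flags accepted by `server.py` and `client.py` are assumptions based on the text, so check `examples/aishell/local/server.sh` and `client.sh` for the real invocations:
```bash
# On the server machine (no audio hardware required).
# Flags and checkpoint path are hypothetical; other required flags omitted.
CUDA_VISIBLE_DEVICES=0 python deepspeech/exps/deepspeech2/deploy/server.py \
  --host_ip 0.0.0.0 --host_port 8086 \
  --checkpoint_path /path/to/pretrained_model

# On the client machine (must have a microphone), point at the server:
python deepspeech/exps/deepspeech2/deploy/client.py \
  --host_ip <server-ip> --host_port 8086
```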