diff --git a/README.md b/README.md
index 4e2a5685..0eeab770 100644
--- a/README.md
+++ b/README.md
@@ -32,10 +32,8 @@ Please see [Getting Started](doc/src/getting_started.md) and [tiny egs](examples
 * [Data Preparation](doc/src/data_preparation.md)
 * [Data Augmentation](doc/src/augmentation.md)
 * [Ngram LM](doc/src/ngram_lm.md)
-* [Server Demo](doc/src/server.md)
 * [Benchmark](doc/src/benchmark.md)
 * [Released Model](doc/src/released_model.md)
-* [FAQ](doc/src/faq.md)

 ## Questions and Help

diff --git a/README_cn.md b/README_cn.md
index e3ad2009..3ff66895 100644
--- a/README_cn.md
+++ b/README_cn.md
@@ -33,10 +33,8 @@
 * [Data Preparation](doc/src/data_preparation.md)
 * [Data Augmentation](doc/src/augmentation.md)
 * [Ngram LM](doc/src/ngram_lm.md)
-* [Server Deployment](doc/src/server.md)
 * [Benchmark](doc/src/benchmark.md)
 * [Released Model](doc/src/released_model.md)
-* [FAQ](doc/src/faq.md)

 ## Questions and Help

diff --git a/doc/src/faq.md b/doc/src/faq.md
deleted file mode 100644
index e2942817..00000000
--- a/doc/src/faq.md
+++ /dev/null
@@ -1,37 +0,0 @@
-# FAQ
-
-1. How much does audio speed perturbation affect the recognition rate?
-
-   Speed perturbation improves recognition; perturbation factors of 0.9, 1.0 and 1.1 are commonly used.
-
-2. How much does volume affect the recognition rate?
-
-   Training usually normalizes the volume into a fixed range; excessive fluctuation hurts training, roughly 10 dB ~ 20 dB.
-
-3. What is the minimum amount of training data required for a speech model?
-
-   Aishell-1 has about 178 hours of data; the more data, the better.
-
-4. What kinds of noise or background sounds affect the recognition rate?
-
-   Mainly interfering human voices and low signal-to-noise ratios.
-
-5. What is the length limit for a single utterance?
-
-   Training utterances are usually limited to 1 s ~ 6 s, depending on the training configuration.
-
-6. Does background sound need to be separated out or denoised before recognition?
-
-   Yes, it needs to be separated; the details depend on the specific scenario.
-
-7. Does the model include VAD (voice activity detection)?
-
-   VAD is a separate model or module; this model does not include that capability.
-
-8. Is long-audio recognition supported?
-
-   Generally, the audio is segmented by VAD before recognition.
-
-9. What hardware configuration does the Mandarin LM Large language model require?
-
-   Enough memory to hold the LM is sufficient.
diff --git a/doc/src/feature_list.md b/doc/src/feature_list.md
index b675d810..4639ddd6 100644
--- a/doc/src/feature_list.md
+++ b/doc/src/feature_list.md
@@ -1,13 +1,20 @@
 # Features

+### Dataset
+* Aishell
+* Librispeech
+* THCHS30
+* TIMIT
+
 ### Speech Recognition
-* Offline
+* Non-Streaming
   * [Baidu's DeepSpeech2](http://proceedings.mlr.press/v48/amodei16.pdf)
   * [Transformer](https://arxiv.org/abs/1706.03762)
   * [Conformer](https://arxiv.org/abs/2005.08100)

-* Online
+* Streaming
+  * [Baidu's DeepSpeech2](http://proceedings.mlr.press/v48/amodei16.pdf)
   * [U2](https://arxiv.org/pdf/2012.05481.pdf)

 ### Language Model
@@ -22,6 +29,15 @@
 * beam search
 * attention rescore

+### Deployment
+
+* Paddle Inference
+
+### Alignment
+
+* MFA
+* CTC Alignment
+
 ### Speech Frontend

 * Audio
diff --git a/doc/src/ngram_lm.md b/doc/src/ngram_lm.md
index 119a3b21..7872df22 100644
--- a/doc/src/ngram_lm.md
+++ b/doc/src/ngram_lm.md
@@ -35,52 +35,3 @@ Different from the English language model, Mandarin language model is character-
 * A whitespace character between two tokens is inserted.

 Please notice that the released language models only contain simplified Chinese characters. After preprocessing is done, we can begin to train the language model. The key training arguments are '-o 5 --prune 0 1 2 4 4' for the small LM and '-o 5' for the large LM. Please refer to the section above for the meaning of each argument. We also convert the arpa file to a binary file using the default settings.
-
-
-
-## [KenLM](http://kheafield.com/code/kenlm/)
-
-There are several statistical language model toolkits to choose from; the most widely used today are srilm and kenlm. kenlm appeared later than srilm, trains faster, and supports training on large data on a single machine. The following introduces how to use kenlm.
-
-1. Download the toolkit from: http://kheafield.com/code/kenlm.tar.gz
-
-2. Usage. The toolkit is convenient to use in a Linux environment. First make sure Boost 1.36.0 and zlib are installed:
-
-   ```
-   boost:
-   yum install boost
-   yum install boost-devel
-
-   zlib:
-   yum install zlib
-   yum install zlib-devel
-   ```
-
-   The gcc version must be 4.8.2 or above.
-
-   ```
-   wget -O - https://kheafield.com/code/kenlm.tar.gz | tar xz
-   mkdir kenlm/build
-   cd kenlm/build
-   cmake ..
-   make -j2
-   ```
-
-3. Training. Train with the following command:
-
-   ```
-   build/bin/lmplz -o 3 --verbose_header --text people2014corpus_words.txt --arpa result/people2014corpus_words.arps
-   ```
-
-   Where:
-
-   1) people2014corpus_words.txt must be a word-segmented file.
-
-      The training corpus, the People's Daily 2014 corpus, includes: 1) manually word-segmented and POS-tagged data people2014.tar.gz, 2) unsegmented text data people2014_words.txt, 3) a character-level kenlm language model file and its binary, people2014corpus_chars.arps/klm, 4) a word-level kenlm language model file and its binary, people2014corpus_words.arps/klm.
-
-   2) The number after -o is the n-gram order; an order of 3 is usually sufficient, but judge according to your actual situation.
-
-4. Compression. Compress the model into a binary for fast loading:
-
-   ```
-   build/bin/build_binary ./result/people2014corpus_words.arps ./result/people2014corpus_words.klm
-   ```
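For a quick sanity check of the binary model built above, the kenlm Python bindings can load and score text directly. This is a minimal sketch, assuming the bindings are installed (e.g. built from the same kenlm source tree) and the `.klm` file from step 4 exists; the sample sentence is purely illustrative and must use the same word segmentation as the training corpus:

```python
import kenlm

# Load the binary model produced by build_binary above.
model = kenlm.Model('./result/people2014corpus_words.klm')
print('n-gram order:', model.order)  # matches the -o value passed to lmplz

# Tokens must be whitespace-separated, mirroring the segmented training text.
sentence = '今天 天气 很 好'  # hypothetical pre-segmented input
print('log10 probability:', model.score(sentence, bos=True, eos=True))
print('perplexity:', model.perplexity(sentence))
```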
diff --git a/doc/src/server.md b/doc/src/server.md
deleted file mode 100644
index 4918d5eb..00000000
--- a/doc/src/server.md
+++ /dev/null
@@ -1,34 +0,0 @@
-
-# Trying the Live Demo with Your Own Voice
-
-So far, an ASR model has been trained and tested qualitatively (`infer`) and quantitatively (`test`) with existing audio files, but not yet with your own speech. This demo builds a real-time ASR engine on top of the trained model, enabling you to test and play around with it using your own voice.
-
-First, change your directory to `examples/aishell` and `source path.sh`.
-
-To start the demo's server, run this in one console:
-
-```bash
-CUDA_VISIBLE_DEVICES=0 bash local/server.sh
-```
-
-On the machine that will run the demo's client (not necessarily the same machine), complete the following installation before moving on.
-
-For example, on macOS:
-
-```bash
-brew install portaudio
-pip install pyaudio
-pip install keyboard
-```
-
-Then, to start the client, run this in another console:
-
-```bash
-CUDA_VISIBLE_DEVICES=0 bash local/client.sh
-```
-
-Now, in the client console, press and hold the `space` key and start speaking. When you finish your utterance, release the key and the speech-to-text result will be shown in the console. To quit the client, just press the `ESC` key.
-
-Note that `deepspeech/exps/deepspeech2/deploy/client.py` must be run on a machine with a microphone device, while `deepspeech/exps/deepspeech2/deploy/server.py` can run on one without any audio recording hardware, e.g. a remote server machine. If the server and client run on two separate machines, just be careful to set the `host_ip` and `host_port` arguments to the actually reachable IP address and port; nothing needs to be changed if they run on a single machine.
-
-Please also refer to `examples/aishell/local/server.sh`, which first downloads a pre-trained Chinese model (trained on AISHELL-1) and then starts the demo server with it. By running `examples/aishell/local/client.sh`, you can speak Chinese to test the model. If you would like to try other models, just update the `--checkpoint_path` argument in the script.
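Since the client requires a working microphone, it can save debugging time to verify the client machine first. This is a minimal sketch, assuming only the `pyaudio` package installed above; the printed messages are illustrative:

```python
import pyaudio

# Query the default input device; PyAudio raises IOError if none exists.
p = pyaudio.PyAudio()
try:
    info = p.get_default_input_device_info()
    print('Default input device:', info['name'])
except IOError:
    print('No microphone found; run client.py on a machine with audio input.')
finally:
    p.terminate()
```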