diff --git a/docs/source/asr/PPASR.md b/docs/source/asr/PPASR.md
new file mode 100644
index 00000000..ef22954a
--- /dev/null
+++ b/docs/source/asr/PPASR.md
@@ -0,0 +1,96 @@
+([简体中文](./PPASR.md)|English)
+# PP-ASR
+
+## Catalogue
+- [1. Introduction](#1)
+- [2. Characteristics](#2)
+- [3. Tutorials](#3)
+    - [3.1 Pre-trained Models](#31)
+    - [3.2 Training](#32)
+    - [3.3 Inference](#33)
+    - [3.4 Service Deployment](#34)
+    - [3.5 Customized Auto Speech Recognition and Deployment](#35)
+- [4. Quick Start](#4)
+
+
+## 1. Introduction
+
+PP-ASR is a tool that provides automatic speech recognition (ASR). It offers a variety of Chinese and English models and supports model training, as well as model inference from the command line. In addition, PP-ASR supports the deployment of streaming models and customized ASR.
+
+
+## 2. Characteristics
+The basic process of ASR is shown in the figure below:
+
+
+The main characteristics of PP-ASR are shown below:
+- Provides pre-trained models on Chinese/English open source datasets: aishell (Chinese), wenetspeech (Chinese) and librispeech (English). The models include deepspeech2 and conformer/transformer.
+- Supports model training on Chinese/English datasets.
+- Supports model inference from the command line. You can run `paddlespeech asr --model xxx --input xxx.wav` to perform inference with a pre-trained model.
+- Supports deployment of a streaming ASR server. In addition to ASR, the server supports timestamps.
+- Supports customized automatic speech recognition and deployment.
+
+
+## 3. Tutorials
+
+
+## 3.1 Pre-trained Models
+The list of supported pre-trained models: [released_model](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/released_model.md).
+The models that perform well are the Ds2 Online Wenetspeech ASR0 Model and the Conformer Online Wenetspeech ASR1 Model. Both models support streaming ASR.
+For more information about model design, refer to the AI Studio tutorials:
+- [Deepspeech2](https://aistudio.baidu.com/aistudio/projectdetail/3866807)
+- [Transformer](https://aistudio.baidu.com/aistudio/projectdetail/3470110)
+
+
+## 3.2 Training
+Reference scripts for model training are stored in [examples](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples), organized as `examples/dataset/model`. The main supported datasets are aishell and librispeech; the supported models are deepspeech2 and u2 (conformer/transformer).
+The specific steps for running each script are recorded in `run.sh`.
+
+For more information, refer to: [asr1](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell/asr1)
+
+
+
+## 3.3 Inference
+
+After installing `paddlespeech` with `pip install paddlespeech`, you can run `paddlespeech asr --model xxx --input xxx.wav` to perform inference with a pre-trained model.
+
+Specific supported functions include:
+
+- Prediction for a single audio file
+- Prediction for multiple audio files through a pipe
+- RTF (real-time factor) calculation
+
+For specific usage, please refer to: [speech_recognition](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/speech_recognition/README_cn.md)
+
+
+
+## 3.4 Service Deployment
+
+PP-ASR supports service deployment of streaming ASR, and speech recognition can be used together with punctuation processing.
+
+Demo of ASR Server: [streaming_asr_server](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/streaming_asr_server)
+
+![image](https://user-images.githubusercontent.com/87408988/168255342-1fc790c0-16f4-4540-a861-db239076727c.png)
+
+Demo video of using the ASR server from a web page: [streaming_asr_demo_video](https://paddlespeech.readthedocs.io/en/latest/streaming_asr_demo_video.html)
+
+
+For more information about service deployment, refer to the AI Studio tutorials:
+- [Streaming service - model part](https://aistudio.baidu.com/aistudio/projectdetail/3839884)
+- [Streaming service](https://aistudio.baidu.com/aistudio/projectdetail/4017905)
+
+
+## 3.5 Customized Auto Speech Recognition and Deployment
+
+For customized automatic speech recognition and deployment, PP-ASR provides C++ programs covering feature extraction (fbank) => inference model (scoring library) => TLG (WFST: token, lexicon, grammar). For specific usage, please refer to: [speechx](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/speechx)
+To get started quickly, refer to: [custom_streaming_asr](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/custom_streaming_asr/README_cn.md)
+
+For more information about customized automatic speech recognition and deployment, refer to the AI Studio tutorial:
+- [Customized Auto Speech Recognition](https://aistudio.baidu.com/aistudio/projectdetail/4021561)
+
+
+
+
+## 4. Quick Start
+
+To use PP-ASR, see the [install](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md) guide. It describes three ways to install `paddlespeech`: **Easy**, **Medium** and **Hard**. If you only want to try the inference function of `paddlespeech`, the **Easy** installation method is sufficient.
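
Reviewer note: the Quick Start could be made more concrete with a short example of the inference command from section 3.3. The following is only a sketch, not part of the patch. It assumes `paddlespeech` has been installed and is on `PATH`, and it uses only the `--model` and `--input` flags quoted in the document. The helper names `build_asr_command` and `run_asr` are hypothetical, and `xxx` / `xxx.wav` are the document's own placeholders, not real model or file names.

```python
import shlex
import shutil
import subprocess
from typing import List, Optional


def build_asr_command(wav_path: str, model: str) -> List[str]:
    """Build the argv for `paddlespeech asr` using only the flags named in the doc."""
    return ["paddlespeech", "asr", "--model", model, "--input", wav_path]


def run_asr(wav_path: str, model: str) -> Optional[str]:
    """Run the CLI if it is installed; return its stdout, or None if it is missing."""
    if shutil.which("paddlespeech") is None:
        # The CLI is not on PATH (e.g. `pip install paddlespeech` was not run).
        return None
    completed = subprocess.run(
        build_asr_command(wav_path, model),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI exits with an error
    )
    return completed.stdout.strip()


if __name__ == "__main__":
    # Placeholders copied from the document; substitute a real model name and file.
    print(shlex.join(build_asr_command("xxx.wav", "xxx")))
```

Building the command as a list rather than a single string avoids shell quoting problems when the audio path contains spaces.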