([简体中文](./PPASR_cn.md)|English)
# PP-ASR

## Catalogue
- [1. Introduction](#1)
- [2. Characteristic](#2)
- [3. Tutorials](#3)
    - [3.1 Pre-trained Models](#31)
    - [3.2 Training](#32)
    - [3.3 Inference](#33)
    - [3.4 Service Deployment](#34)
    - [3.5 Customized Auto Speech Recognition and Deployment](#35)
- [4. Quick Start](#4)

<a name="1"></a>
## 1. Introduction

PP-ASR is a tool for automatic speech recognition (ASR). It provides a variety of Chinese and English models, supports model training, and supports model inference from the command line. In addition, PP-ASR supports the deployment of streaming models and customized ASR.

<a name="2"></a>
## 2. Characteristic
The basic process of ASR is shown in the figure below:  
<center><img src=https://user-images.githubusercontent.com/87408988/168259962-cbe2008b-47b6-443d-9566-d77a5ca2eb25.png width="800" ></center>


The main characteristics of PP-ASR are:
-  Provides pre-trained models for open-source Chinese and English datasets: aishell (Chinese), wenetspeech (Chinese), and librispeech (English). The models include deepspeech2 and conformer/transformer.
-  Supports model training on Chinese and English datasets.
-  Supports model inference from the command line. Run `paddlespeech asr --model xxx --input xxx.wav` to perform inference with a pre-trained model.
-  Supports deployment of a streaming ASR server. Besides speech recognition, the server supports timestamps.
-  Supports customized automatic speech recognition and deployment.

<a name="3"></a>
## 3. Tutorials

<a name="31"></a>
## 3.1 Pre-trained Models
For the list of supported pre-trained models, see [released_model](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/released_model.md).  
The models that perform well are the Ds2 Online Wenetspeech ASR0 Model and the Conformer Online Wenetspeech ASR1 Model. Both models support streaming ASR.  
For more information about model design, refer to the AI Studio tutorials:
- [Deepspeech2](https://aistudio.baidu.com/aistudio/projectdetail/3866807)
- [Transformer](https://aistudio.baidu.com/aistudio/projectdetail/3470110)

<a name="32"></a>
## 3.2 Training
The reference scripts for model training are stored in [examples](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples) and organized as `examples/dataset/model`. The supported datasets are mainly aishell and librispeech. The supported models are deepspeech2 and u2 (conformer/transformer).
The specific steps for running a script are recorded in its `run.sh`.

For more information, refer to [asr1](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell/asr1).
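
As a minimal sketch of the directory convention described above, the following Python snippet builds the expected location of a recipe's `run.sh` from a dataset name and a model name. The helper names `recipe_dir` and `run_script` are hypothetical and exist only for illustration. Only the `aishell/asr1` combination is confirmed by the link above, so check the `examples` directory for what is actually available.

```python
from pathlib import PurePosixPath


def recipe_dir(dataset: str, model: str) -> PurePosixPath:
    """Return the recipe directory under the examples/<dataset>/<model> layout."""
    return PurePosixPath("examples") / dataset / model


def run_script(dataset: str, model: str) -> PurePosixPath:
    """Return the path of the run.sh that records the steps of a recipe."""
    return recipe_dir(dataset, model) / "run.sh"


# The aishell/asr1 recipe referenced above.
print(run_script("aishell", "asr1"))  # examples/aishell/asr1/run.sh
```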


<a name="33"></a>
## 3.3 Inference

After installing `paddlespeech` with `pip install paddlespeech`, you can run `paddlespeech asr --model xxx --input xxx.wav` to perform inference with a pre-trained model.

Supported functions include:

- Prediction on a single audio file
- Prediction on multiple audio files through a pipe
- RTF (real-time factor) calculation

For specific usage, please refer to: [speech_recognition](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/speech_recognition/README_cn.md) 
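
To make the command and the RTF idea concrete, here is a hedged Python sketch that wraps the command line shown above and computes a real-time factor as processing time divided by audio duration. All function names are hypothetical. The RTF here is measured from the outside with a wall clock, so it includes process start-up and model loading and is not the tool's own RTF calculation. The model name is left as a caller-supplied value, as in the `xxx` placeholder above.

```python
from __future__ import annotations

import shutil
import subprocess
import time
import wave


def build_asr_command(model: str, wav_path: str) -> list[str]:
    """Build the argv for `paddlespeech asr --model <model> --input <wav>`."""
    return ["paddlespeech", "asr", "--model", model, "--input", wav_path]


def wav_duration_seconds(wav_path: str) -> float:
    """Read the duration of a PCM WAV file using the standard library."""
    with wave.open(wav_path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def transcribe_with_rtf(model: str, wav_path: str) -> tuple[str, float]:
    """Run the CLI once and return (captured stdout, externally measured RTF)."""
    if shutil.which("paddlespeech") is None:
        raise RuntimeError("paddlespeech not found; run `pip install paddlespeech`")
    start = time.perf_counter()
    result = subprocess.run(
        build_asr_command(model, wav_path),
        capture_output=True,
        text=True,
        check=True,
    )
    elapsed = time.perf_counter() - start
    return result.stdout.strip(), real_time_factor(elapsed, wav_duration_seconds(wav_path))
```

Calling `transcribe_with_rtf(model, "xxx.wav")` with a model name taken from the released model list returns whatever the command printed together with the measured RTF. For repeated transcriptions, the usage guide linked above is the better reference, since starting a new process per file reloads the model each time.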


<a name="34"></a>
## 3.4 Service Deployment

PP-ASR supports service deployment of streaming ASR. Speech recognition and punctuation processing can be used at the same time.

Demo of ASR Server: [streaming_asr_server](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/streaming_asr_server)

![image](https://user-images.githubusercontent.com/87408988/168255342-1fc790c0-16f4-4540-a861-db239076727c.png)

Video demo of using the ASR server from a web page: [streaming_asr_demo_video](https://paddlespeech.readthedocs.io/en/latest/streaming_asr_demo_video.html)


For more information about service deployment, refer to the AI Studio tutorials:
- [Streaming service - model part](https://aistudio.baidu.com/aistudio/projectdetail/3839884)
- [Streaming service](https://aistudio.baidu.com/aistudio/projectdetail/4017905)

<a name="35"></a>
## 3.5 Customized Auto Speech Recognition and Deployment

For customized automatic speech recognition and deployment, PP-ASR provides feature extraction (fbank) => inference model (scoring library) => a C++ program for TLG (WFST: token, lexicon, grammar). For specific usage, please refer to: [speechx](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/speechx)  
To get started quickly, refer to [custom_streaming_asr](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/custom_streaming_asr/README_cn.md)

For more information about customized automatic speech recognition and deployment, refer to the AI Studio tutorial:
- [Customized Auto Speech Recognition](https://aistudio.baidu.com/aistudio/projectdetail/4021561)


<a name="4"></a>

## 4. Quick Start

To use PP-ASR, see the [installation guide](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md). It describes three ways to install `paddlespeech`: **Easy**, **Medium**, and **Hard**. If you only want to try the inference function of paddlespeech, you can use the **Easy** installation method.