From 3ae013959b9e06f419fd98a74de54c90ec887c17 Mon Sep 17 00:00:00 2001 From: Jackwaterveg <87408988+Jackwaterveg@users.noreply.github.com> Date: Mon, 16 May 2022 15:55:14 +0800 Subject: [PATCH 1/4] Updata PPASR_cn.md, test=doc --- docs/source/asr/PPASR_cn.md | 36 ++++++++++++++++++++++++++---------- 1 file changed, 26 insertions(+), 10 deletions(-) diff --git a/docs/source/asr/PPASR_cn.md b/docs/source/asr/PPASR_cn.md index 1f72f1b9..6f04d104 100644 --- a/docs/source/asr/PPASR_cn.md +++ b/docs/source/asr/PPASR_cn.md @@ -1,3 +1,6 @@ +(简体中文|[English](./PPASR.md)) +# PP-ASR + ## 目录 - [1. 简介](#1) - [2. 特点](#2) @@ -12,7 +15,7 @@ ## 1. 简介 -PP-ASR 是一个 提供 ASR 功能的工具。其提供了多种中文和英文的模型,支持模型的训练,并且支持使用命令行的方式进行模型的推理。 PP-ASR也支持流式模型的部署,以及个性化场景的部署。 +PP-ASR 是一个 提供 ASR 功能的工具。其提供了多种中文和英文的模型,支持模型的训练,并且支持使用命令行的方式进行模型的推理。 PP-ASR 也支持流式模型的部署,以及个性化场景的部署。 ## 2. 特点 @@ -32,21 +35,23 @@ PP-ASR 的主要特点如下: ## 3.1 预训练模型 -支持的预训练模型列表:[released_model.md](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/released_model.md)。 +支持的预训练模型列表:[released_model](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/released_model.md)。 其中效果较好的模型为 Ds2 Online Wenetspeech ASR0 Model 以及 Conformer Online Wenetspeech ASR1 Model。 两个模型都支持流式 ASR。 - +关于模型设计的部分,可以参考 AIStudio 教程: +- [Deepspeech2](https://aistudio.baidu.com/aistudio/projectdetail/3866807) +- [Transformer](https://aistudio.baidu.com/aistudio/projectdetail/3470110) ## 3.2 模型训练 模型的训练的参考脚本存放在 examples 中,并按照 `examples/数据集/模型` 存放,数据集主要支持 aishell 和 librispeech,模型支持 deepspeech2 模型和 u2 (conformer/transformer) 模型。 -具体的执行脚本的步骤记录在 run.sh 当中。具体可参考[这里](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell/asr1) +具体的执行脚本的步骤记录在 run.sh 当中。具体可参考: [asr1](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell/asr1) ## 3.3 模型推理 -PPASR 支持在使用`pip install paddlespeech`后 使用命令行的方式来使用预训练模型进行推理。 +PP-ASR 支持在使用`pip install paddlespeech`后 使用命令行的方式来使用预训练模型进行推理。 具体支持的功能包括: @@ -54,26 +59,37 @@ PPASR 支持在使用`pip install 
paddlespeech`后 使用命令行的方式来 - 使用管道的方式对多条音频进行预测 - 支持 RTF 的计算 -具体的使用方式可以参考[这里](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/speech_recognition/README_cn.md) +具体的使用方式可以参考: [speech_recognition](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/speech_recognition/README_cn.md) ## 3.4 服务部署 -PPASR 支持流式ASR的服务部署。支持 语音识别 + 标点处理两个功能同时使用。 +PP-ASR 支持流式ASR的服务部署。支持 语音识别 + 标点处理两个功能同时使用。 -server 的 demo [链接](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/streaming_asr_server) +server 的 demo: [streaming_asr_server](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/streaming_asr_server) ![image](https://user-images.githubusercontent.com/87408988/168255342-1fc790c0-16f4-4540-a861-db239076727c.png) +网页上使用 asr server 的效果展示:[streaming_asr_demo_video](https://paddlespeech.readthedocs.io/en/latest/streaming_asr_demo_video.html) + +关于服务部署方面的更多资料,可以参考 AIStudio 教程: +- [流式服务-模型部分](https://aistudio.baidu.com/aistudio/projectdetail/3839884) +- [流式服务](https://aistudio.baidu.com/aistudio/projectdetail/4017905) + ## 3.5 支持个性化场景部署 -针对个性化场景部署,提供了 特征提取(fbank) => 推理模型(打分库)=> TLG(WFST, token, lexion, grammer)的 C++ 程序。具体参考[这里](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/speechx) +针对个性化场景部署,提供了特征提取(fbank) => 推理模型(打分库)=> TLG(WFST, token, lexion, grammer)的 C++ 程序。具体参考 [speechx](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/speechx)。 如果想快速了解和使用,可以参考: [custom_streaming_asr](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/custom_streaming_asr/README_cn.md) + +关于支持个性化场景部署的更多资料,可以参考 AIStudio 教程: +- [定制化识别](https://aistudio.baidu.com/aistudio/projectdetail/4021561) + + ## 4. 
快速开始 -关于如果使用 PPASR,可以看这里的[安装文档](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md),其中提供了 **简单**、**中等**、**困难** 三种安装方式。如果想体验paddlespeech 的推理功能,可以用 **简单** 安装方式。 +关于如果使用 PP-ASR,可以看这里的 [install](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md),其中提供了 **简单**、**中等**、**困难** 三种安装方式。如果想体验 paddlespeech 的推理功能,可以用 **简单** 安装方式。 From 4228de6f75f891318a691d049e82af6d3d2a752b Mon Sep 17 00:00:00 2001 From: Jackwaterveg <87408988+Jackwaterveg@users.noreply.github.com> Date: Mon, 16 May 2022 18:13:07 +0800 Subject: [PATCH 2/4] test=asr --- docs/source/asr/PPASR.md | 96 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 96 insertions(+) create mode 100644 docs/source/asr/PPASR.md diff --git a/docs/source/asr/PPASR.md b/docs/source/asr/PPASR.md new file mode 100644 index 00000000..ef22954a --- /dev/null +++ b/docs/source/asr/PPASR.md @@ -0,0 +1,96 @@ +([简体中文](./PPASR.md)|English) +# PP-ASR + +## Catalogue +- [1. Introduction](#1) +- [2. Characteristic](#2) +- [3. Tutorials](#3) + - [3.1 Pre-trained Models](#31) + - [3.2 Training](#32) + - [3.3 Inference](#33) + - [3.4 Service Deployment](#33) + - [3.5 Customized Auto Speech Recognition and Deployment](#33) +- [4. Quick Start](#4) + + +## 1. Introduction + +PP-ASR is a tool to provide ASR(Automatic speech recognition) function. It provides a variety of Chinese and English models and supports model training. It also supports model inference using the command line. In addition, PP-ASR supports the deployment of streaming models and customized ASR. + + +## 2. Characteristic +The basic process of ASR is shown in the figure below: +
+ + +The main characteristics of PP-ASR are shown below: +- Provides pre-trained models on Chinese/English open source datasets: aishell(Chinese), wenetspeech(Chinese) and librispeech(English). The models includes deepspeech2 and conformer/transformer. +- Support model training on Chinese/English datasets. +- Support model inference using the command line. You can use to use `paddlespeech asr --model xxx --input xxx.wav` to use pre-trained model to do model inference. +- Support deployment of streaming ASR server. Besides ASR function, the server supports timestamp function. +- Support customized auto speech recognition and deployment. + + +## 3. Tutorials + + +## 3.1 Pre-trained Models +The support pre-trained model list: [released_model](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/released_model.md). +The model with good effect are Ds2 Online Wenetspeech ASR0 Model and Conformer Online Wenetspeech ASR1 Model. Both two models support streaming ASR. +For more information about model design, you can refer to the aistudio tutorial: +- [Deepspeech2](https://aistudio.baidu.com/aistudio/projectdetail/3866807) +- [Transformer](https://aistudio.baidu.com/aistudio/projectdetail/3470110) + + +## 3.2 Training +The reference script for model training is stored in [examples](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples) and stored according to "examples/dataset/model". The dataset mainly supports aishell and librispeech. The model supports deepspeech2 and u2(conformer/transformer). +The specific steps of executing the script are recorded in `run.sh`. + +For more information, you can refer to: [asr1](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell/asr1) + + + +## 3.3 Inference + +PP-ASR supports use `paddlespeech asr --model xxx --input xxx.wav` to use pre-trained model to do model inference after install `paddlespeech` by `pip install paddlespeech`. 
+ +Specific supported functions include: + +- Prediction of single audio +- Use pipe to predict multiple audio +- Support RTF calculation + +For specific usage, please refer to: [speech_recognition](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/speech_recognition/README_cn.md) + + + +## 3.4 Service Deployment + +PP-ASR supports the service deployment of streaming ASR. Support the simultaneous use of speech recognition and punctuation processing. + +Demo of ASR Server: [streaming_asr_server](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/streaming_asr_server) + +![image](https://user-images.githubusercontent.com/87408988/168255342-1fc790c0-16f4-4540-a861-db239076727c.png) + +Display of using ASR server on Web page: [streaming_asr_demo_video](https://paddlespeech.readthedocs.io/en/latest/streaming_asr_demo_video.html) + + +For more information about service deployment, you can refer to the aistudio tutorial: +- [Streaming service - model part](https://aistudio.baidu.com/aistudio/projectdetail/3839884) +- [Streaming service](https://aistudio.baidu.com/aistudio/projectdetail/4017905) + + +## 3.5 Customized Auto Speech Recognition and Deployment + +For customized auto speech recognition and deployment, PP-ASR provides feature extraction(fbank) => Inference model(Scoring Library)=> C++ program of TLG(WFST, token, lexion, grammer). For specific usage, please refer to: [speechx](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/speechx) +If you want to quickly use it, you can refer to: [custom_streaming_asr](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/custom_streaming_asr/README_cn.md) + +For more information about customized auto speech recognition and deployment, you can refer to the aistudio tutorial: +- [Customized Auto Speech Recognition](https://aistudio.baidu.com/aistudio/projectdetail/4021561) + + + + +## 4. 
Quick Start + +To use PP-ASR, you can see here [install](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md), It supplies three methods to install `paddlespeech`, which are **Easy**, **Medium** and **Hard**. If you want to experience the inference function of paddlespeech, you can use **Easy** installation method. From ff8b487f47f41aedc9a204c9f3a0613d6b88003d Mon Sep 17 00:00:00 2001 From: Jackwaterveg <87408988+Jackwaterveg@users.noreply.github.com> Date: Mon, 16 May 2022 18:13:18 +0800 Subject: [PATCH 3/4] test=asr --- docs/source/asr/PPASR_cn.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/docs/source/asr/PPASR_cn.md b/docs/source/asr/PPASR_cn.md index 6f04d104..82b1c1d3 100644 --- a/docs/source/asr/PPASR_cn.md +++ b/docs/source/asr/PPASR_cn.md @@ -26,7 +26,7 @@ PP-ASR 是一个 提供 ASR 功能的工具。其提供了多种中文和英文 PP-ASR 的主要特点如下: - 提供在中/英文开源数据集 aishell (中文),wenetspeech(中文),librispeech (英文)上的预训练模型。模型包含 deepspeech2 模型以及 conformer/transformer 模型。 - 支持中/英文的模型训练功能。 -- 支持命令行方式的模型推理, `paddlespeech asr --input xxx.wav` 方式调用各个预训练模型进行推理。 +- 支持命令行方式的模型推理,可使用 `paddlespeech asr --model xxx --input xxx.wav` 方式调用各个预训练模型进行推理。 - 支持流式 ASR 的服务部署,也支持输出时间戳。 - 支持个性化场景的部署。 @@ -37,15 +37,15 @@ PP-ASR 的主要特点如下: ## 3.1 预训练模型 支持的预训练模型列表:[released_model](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/released_model.md)。 其中效果较好的模型为 Ds2 Online Wenetspeech ASR0 Model 以及 Conformer Online Wenetspeech ASR1 Model。 两个模型都支持流式 ASR。 -关于模型设计的部分,可以参考 AIStudio 教程: +更多关于模型设计的部分,可以参考 AIStudio 教程: - [Deepspeech2](https://aistudio.baidu.com/aistudio/projectdetail/3866807) - [Transformer](https://aistudio.baidu.com/aistudio/projectdetail/3470110) ## 3.2 模型训练 -模型的训练的参考脚本存放在 examples 中,并按照 `examples/数据集/模型` 存放,数据集主要支持 aishell 和 librispeech,模型支持 deepspeech2 模型和 u2 (conformer/transformer) 模型。 -具体的执行脚本的步骤记录在 run.sh 当中。具体可参考: [asr1](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell/asr1) +模型的训练的参考脚本存放在 
[examples](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples) 中,并按照 `examples/数据集/模型` 存放,数据集主要支持 aishell 和 librispeech,模型支持 deepspeech2 模型和 u2 (conformer/transformer) 模型。 +具体的执行脚本的步骤记录在 `run.sh` 当中。具体可参考: [asr1](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell/asr1) @@ -80,7 +80,8 @@ server 的 demo: [streaming_asr_server](https://github.com/PaddlePaddle/Paddle ## 3.5 支持个性化场景部署 -针对个性化场景部署,提供了特征提取(fbank) => 推理模型(打分库)=> TLG(WFST, token, lexion, grammer)的 C++ 程序。具体参考 [speechx](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/speechx)。 如果想快速了解和使用,可以参考: [custom_streaming_asr](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/custom_streaming_asr/README_cn.md) +针对个性化场景部署,提供了特征提取(fbank) => 推理模型(打分库)=> TLG(WFST, token, lexion, grammer)的 C++ 程序。具体参考 [speechx](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/speechx)。 +如果想快速了解和使用,可以参考: [custom_streaming_asr](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/custom_streaming_asr/README_cn.md) 关于支持个性化场景部署的更多资料,可以参考 AIStudio 教程: - [定制化识别](https://aistudio.baidu.com/aistudio/projectdetail/4021561) From bff52147dd1189f0c835093b9468b25c707e1704 Mon Sep 17 00:00:00 2001 From: Jackwaterveg <87408988+Jackwaterveg@users.noreply.github.com> Date: Mon, 16 May 2022 19:20:53 +0800 Subject: [PATCH 4/4] test=doc --- docs/source/asr/PPASR.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/docs/source/asr/PPASR.md b/docs/source/asr/PPASR.md index ef22954a..3779434e 100644 --- a/docs/source/asr/PPASR.md +++ b/docs/source/asr/PPASR.md @@ -1,4 +1,4 @@ -([简体中文](./PPASR.md)|English) +([简体中文](./PPASR_cn.md)|English) # PP-ASR ## Catalogue @@ -24,9 +24,9 @@ The basic process of ASR is shown in the figure below: The main characteristics of PP-ASR are shown below: -- Provides pre-trained models on Chinese/English open source datasets: aishell(Chinese), wenetspeech(Chinese) and librispeech(English). 
The models includes deepspeech2 and conformer/transformer. +- Provides pre-trained models on Chinese/English open source datasets: aishell(Chinese), wenetspeech(Chinese) and librispeech(English). The models include deepspeech2 and conformer/transformer. - Support model training on Chinese/English datasets. -- Support model inference using the command line. You can use to use `paddlespeech asr --model xxx --input xxx.wav` to use pre-trained model to do model inference. +- Support model inference using the command line. You can run `paddlespeech asr --model xxx --input xxx.wav` to do inference with a pre-trained model. - Support deployment of streaming ASR server. Besides ASR function, the server supports timestamp function. - Support customized auto speech recognition and deployment. @@ -43,21 +43,21 @@ For more information about model design, you can refer to the aistudio tutorial: ## 3.2 Training -The reference script for model training is stored in [examples](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples) and stored according to "examples/dataset/model". The dataset mainly supports aishell and librispeech. The model supports deepspeech2 and u2(conformer/transformer). +The reference scripts for model training are stored in [examples](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples) and organized as "examples/dataset/model". The supported datasets are mainly aishell and librispeech, and the supported models are deepspeech2 and u2(conformer/transformer). The specific steps of executing the script are recorded in `run.sh`.
-For more information, you can refer to: [asr1](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell/asr1) +For more information, you can refer to [asr1](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/aishell/asr1) ## 3.3 Inference -PP-ASR supports use `paddlespeech asr --model xxx --input xxx.wav` to use pre-trained model to do model inference after install `paddlespeech` by `pip install paddlespeech`. +After installing `paddlespeech` with `pip install paddlespeech`, you can run `paddlespeech asr --model xxx --input xxx.wav` to do inference with a pre-trained model. Specific supported functions include: - Prediction of single audio -- Use pipe to predict multiple audio +- Use a pipe to run prediction on multiple audio files - Support RTF calculation For specific usage, please refer to: [speech_recognition](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/speech_recognition/README_cn.md) @@ -83,7 +83,7 @@ For more information about service deployment, you can refer to the aistudio tut ## 3.5 Customized Auto Speech Recognition and Deployment For customized auto speech recognition and deployment, PP-ASR provides feature extraction(fbank) => Inference model(Scoring Library)=> C++ program of TLG(WFST, token, lexion, grammer). For specific usage, please refer to: [speechx](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/speechx) -If you want to quickly use it, you can refer to: [custom_streaming_asr](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/custom_streaming_asr/README_cn.md) +For a quick start, you can refer to [custom_streaming_asr](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/custom_streaming_asr/README_cn.md) For more information about customized auto speech recognition and deployment, you can refer to the aistudio tutorial: - [Customized Auto Speech Recognition](https://aistudio.baidu.com/aistudio/projectdetail/4021561)
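
Note on section 3.3: the docs quote the inference command `paddlespeech asr --model xxx --input xxx.wav` and list RTF calculation as a supported function, but do not show either in use. The following is a minimal sketch, not PaddleSpeech code. The helper names `build_asr_command` and `real_time_factor` are hypothetical, the `xxx` placeholders are kept exactly as the docs give them, and the RTF formula is the commonly used definition (processing time divided by audio duration), which this patch does not confirm is what PaddleSpeech reports.

```python
import shlex
from typing import List


def build_asr_command(model: str, wav_path: str) -> List[str]:
    """Build the argv list for the `paddlespeech asr` command quoted in the docs.

    Hypothetical helper: it only assembles the two flags that the docs show
    (`--model` and `--input`) and does not run anything.
    """
    return ["paddlespeech", "asr", "--model", model, "--input", wav_path]


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration.

    Assumed definition of RTF; a value below 1.0 means decoding ran
    faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # "xxx" and "xxx.wav" are the placeholders used in the docs, not real names.
    print(shlex.join(build_asr_command("xxx", "xxx.wav")))
    # Example numbers only: 2.5 s of decoding for 10 s of audio.
    print(real_time_factor(2.5, 10.0))
```

The argv list can be passed to `subprocess.run` once `paddlespeech` is installed; it is kept as a list rather than a single string so that file paths containing spaces need no manual quoting.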