Merge branch 'PaddlePaddle:develop' into database-search

pull/1916/head
qingen 3 years ago committed by GitHub
commit 892ea08a2e
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -12,6 +12,8 @@ exclude =
.git, .git,
# python cache # python cache
__pycache__, __pycache__,
# third party
utils/compute-wer.py,
third_party/, third_party/,
# Provide a comma-separate list of glob patterns to include for checks. # Provide a comma-separate list of glob patterns to include for checks.
filename = filename =

@ -1,19 +1,10 @@
([简体中文](./README_cn.md)|English) ([简体中文](./README_cn.md)|English)
<p align="center"> <p align="center">
<img src="./docs/images/PaddleSpeech_logo.png" /> <img src="./docs/images/PaddleSpeech_logo.png" />
</p> </p>
<div align="center">
<h3>
<a href="#quick-start"> Quick Start </a>
| <a href="#quick-start-server"> Quick Start Server </a>
| <a href="#documents"> Documents </a>
| <a href="#model-list"> Models List </a>
</div>
------------------------------------------------------------------------------------
<p align="center"> <p align="center">
<a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-red.svg"></a> <a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-red.svg"></a>
@ -28,6 +19,20 @@
<a href="=https://pypi.org/project/paddlespeech/"><img src="https://static.pepy.tech/badge/paddlespeech"></a> <a href="=https://pypi.org/project/paddlespeech/"><img src="https://static.pepy.tech/badge/paddlespeech"></a>
<a href="https://huggingface.co/spaces"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue"></a> <a href="https://huggingface.co/spaces"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue"></a>
</p> </p>
<div align="center">
<h3>
| <a href="#quick-start"> Quick Start </a>
| <a href="#quick-start-server"> Quick Start Server </a>
| <a href="#quick-start-streaming-server"> Quick Start Streaming Server</a>
|
</br>
| <a href="#documents"> Documents </a>
| <a href="#model-list"> Models List </a>
|
</h3>
</div>
**PaddleSpeech** is an open-source toolkit on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform for a variety of critical tasks in speech and audio, with the state-of-art and influential models. **PaddleSpeech** is an open-source toolkit on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform for a variety of critical tasks in speech and audio, with the state-of-art and influential models.
@ -142,26 +147,6 @@ For more synthesized audios, please refer to [PaddleSpeech Text-to-Speech sample
</div> </div>
### ⭐ Examples
- **[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo): Use PaddleSpeech TTS to generate virtual human voice.**
<div align="center"><a href="https://www.bilibili.com/video/BV1cL411V71o?share_source=copy_web"><img src="https://ai-studio-static-online.cdn.bcebos.com/06fd746ab32042f398fb6f33f873e6869e846fe63c214596ae37860fe8103720" / width="500px"></a></div>
- [PaddleSpeech Demo Video](https://paddlespeech.readthedocs.io/en/latest/demo_video.html)
- **[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk): Use PaddleSpeech TTS and ASR to clone voice from videos.**
<div align="center">
<img src="https://raw.githubusercontent.com/jerryuhoo/VTuberTalk/main/gui/gui.png" width = "500px" />
</div>
### 🔥 Hot Activities
- 2021.12.21~12.24
4 Days Live Courses: Depth interpretation of PaddleSpeech!
**Courses videos and related materials: https://aistudio.baidu.com/aistudio/education/group/info/25130**
### Features ### Features
@ -174,11 +159,22 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
- 🔬 *Integration of mainstream models and datasets*: the toolkit implements modules that participate in the whole pipeline of the speech tasks, and uses mainstream datasets like LibriSpeech, LJSpeech, AIShell, CSMSC, etc. See also [model list](#model-list) for more details. - 🔬 *Integration of mainstream models and datasets*: the toolkit implements modules that participate in the whole pipeline of the speech tasks, and uses mainstream datasets like LibriSpeech, LJSpeech, AIShell, CSMSC, etc. See also [model list](#model-list) for more details.
- 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV). - 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV).
### Recent Update ### 🔥 Hot Activities
<!--- <!---
2021.12.14: We would like to have an online courses to introduce basics and research of speech, as well as code practice with `paddlespeech`. Please pay attention to our [Calendar](https://www.paddlepaddle.org.cn/live). 2021.12.14: We would like to have an online courses to introduce basics and research of speech, as well as code practice with `paddlespeech`. Please pay attention to our [Calendar](https://www.paddlepaddle.org.cn/live).
---> --->
- 2021.12.21~12.24
4 Days Live Courses: Depth interpretation of PaddleSpeech!
**Courses videos and related materials: https://aistudio.baidu.com/aistudio/education/group/info/25130**
### Recent Update
- 👏🏻 2022.04.28: PaddleSpeech Streaming Server is available for Automatic Speech Recognition and Text-to-Speech.
- 👏🏻 2022.03.28: PaddleSpeech Server is available for Audio Classification, Automatic Speech Recognition and Text-to-Speech. - 👏🏻 2022.03.28: PaddleSpeech Server is available for Audio Classification, Automatic Speech Recognition and Text-to-Speech.
- 👏🏻 2022.03.28: PaddleSpeech CLI is available for Speaker Verification. - 👏🏻 2022.03.28: PaddleSpeech CLI is available for Speaker Verification.
- 🤗 2021.12.14: Our PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available! - 🤗 2021.12.14: Our PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available!
@ -196,6 +192,7 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
We strongly recommend our users to install PaddleSpeech in **Linux** with *python>=3.7*. We strongly recommend our users to install PaddleSpeech in **Linux** with *python>=3.7*.
Up to now, **Linux** supports CLI for the all our tasks, **Mac OSX** and **Windows** only supports PaddleSpeech CLI for Audio Classification, Speech-to-Text and Text-to-Speech. To install `PaddleSpeech`, please see [installation](./docs/source/install.md). Up to now, **Linux** supports CLI for the all our tasks, **Mac OSX** and **Windows** only supports PaddleSpeech CLI for Audio Classification, Speech-to-Text and Text-to-Speech. To install `PaddleSpeech`, please see [installation](./docs/source/install.md).
<a name="quickstart"></a> <a name="quickstart"></a>
## Quick Start ## Quick Start
@ -238,7 +235,7 @@ paddlespeech tts --input "你好,欢迎使用飞桨深度学习框架!" --ou
**Batch Process** **Batch Process**
``` ```
echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts
``` ```
**Shell Pipeline** **Shell Pipeline**
- ASR + Punctuation Restoration - ASR + Punctuation Restoration
@ -257,16 +254,19 @@ If you want to try more functions like training and tuning, please have a look a
Developers can have a try of our speech server with [PaddleSpeech Server Command Line](./paddlespeech/server/README.md). Developers can have a try of our speech server with [PaddleSpeech Server Command Line](./paddlespeech/server/README.md).
**Start server** **Start server**
```shell ```shell
paddlespeech_server start --config_file ./paddlespeech/server/conf/application.yaml paddlespeech_server start --config_file ./paddlespeech/server/conf/application.yaml
``` ```
**Access Speech Recognition Services** **Access Speech Recognition Services**
```shell ```shell
paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input input_16k.wav paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input input_16k.wav
``` ```
**Access Text to Speech Services** **Access Text to Speech Services**
```shell ```shell
paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
``` ```
@ -280,6 +280,37 @@ paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input input.wav
For more information about server command lines, please see: [speech server demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_server) For more information about server command lines, please see: [speech server demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_server)
<a name="quickstartstreamingserver"></a>
## Quick Start Streaming Server
Developers can have a try of [streaming asr](./demos/streaming_asr_server/README.md) and [streaming tts](./demos/streaming_tts_server/README.md) server.
**Start Streaming Speech Recognition Server**
```
paddlespeech_server start --config_file ./demos/streaming_asr_server/conf/application.yaml
```
**Access Streaming Speech Recognition Services**
```
paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input input_16k.wav
```
**Start Streaming Text to Speech Server**
```
paddlespeech_server start --config_file ./demos/streaming_tts_server/conf/tts_online_application.yaml
```
**Access Streaming Text to Speech Services**
```
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
```
For more information please see: [streaming asr](./demos/streaming_asr_server/README.md) and [streaming tts](./demos/streaming_tts_server/README.md)
<a name="ModelList"></a> <a name="ModelList"></a>
## Model List ## Model List
@ -589,15 +620,31 @@ Normally, [Speech SoTA](https://paperswithcode.com/area/speech), [Audio SoTA](ht
The Text-to-Speech module is originally called [Parakeet](https://github.com/PaddlePaddle/Parakeet), and now merged with this repository. If you are interested in academic research about this task, please see [TTS research overview](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/docs/source/tts#overview). Also, [this document](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/tts/models_introduction.md) is a good guideline for the pipeline components. The Text-to-Speech module is originally called [Parakeet](https://github.com/PaddlePaddle/Parakeet), and now merged with this repository. If you are interested in academic research about this task, please see [TTS research overview](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/docs/source/tts#overview). Also, [this document](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/tts/models_introduction.md) is a good guideline for the pipeline components.
## ⭐ Examples
- **[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo): Use PaddleSpeech TTS to generate virtual human voice.**
<div align="center"><a href="https://www.bilibili.com/video/BV1cL411V71o?share_source=copy_web"><img src="https://ai-studio-static-online.cdn.bcebos.com/06fd746ab32042f398fb6f33f873e6869e846fe63c214596ae37860fe8103720" / width="500px"></a></div>
- [PaddleSpeech Demo Video](https://paddlespeech.readthedocs.io/en/latest/demo_video.html)
- **[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk): Use PaddleSpeech TTS and ASR to clone voice from videos.**
<div align="center">
<img src="https://raw.githubusercontent.com/jerryuhoo/VTuberTalk/main/gui/gui.png" width = "500px" />
</div>
## Citation ## Citation
To cite PaddleSpeech for research, please use the following format. To cite PaddleSpeech for research, please use the following format.
```tex ```tex
@misc{ppspeech2021, @inproceedings{zhang2022paddlespeech,
title={PaddleSpeech, a toolkit for audio processing based on PaddlePaddle.}, title = {PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit},
author={PaddlePaddle Authors}, author = {Hui Zhang, Tian Yuan, Junkun Chen, Xintong Li, Renjie Zheng, Yuxin Huang, Xiaojie Chen, Enlei Gong, Zeyu Chen, Xiaoguang Hu, dianhai yu, Yanjun Ma, Liang Huang},
howpublished = {\url{https://github.com/PaddlePaddle/PaddleSpeech}}, booktitle = {Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations},
year={2021} year = {2022},
publisher = {Association for Computational Linguistics},
} }
@inproceedings{zheng2021fused, @inproceedings{zheng2021fused,
@ -654,7 +701,6 @@ You are warmly welcome to submit questions in [discussions](https://github.com/P
## Acknowledgement ## Acknowledgement
- Many thanks to [yeyupiaoling](https://github.com/yeyupiaoling)/[PPASR](https://github.com/yeyupiaoling/PPASR)/[PaddlePaddle-DeepSpeech](https://github.com/yeyupiaoling/PaddlePaddle-DeepSpeech)/[VoiceprintRecognition-PaddlePaddle](https://github.com/yeyupiaoling/VoiceprintRecognition-PaddlePaddle)/[AudioClassification-PaddlePaddle](https://github.com/yeyupiaoling/AudioClassification-PaddlePaddle) for years of attention, constructive advice and great help. - Many thanks to [yeyupiaoling](https://github.com/yeyupiaoling)/[PPASR](https://github.com/yeyupiaoling/PPASR)/[PaddlePaddle-DeepSpeech](https://github.com/yeyupiaoling/PaddlePaddle-DeepSpeech)/[VoiceprintRecognition-PaddlePaddle](https://github.com/yeyupiaoling/VoiceprintRecognition-PaddlePaddle)/[AudioClassification-PaddlePaddle](https://github.com/yeyupiaoling/AudioClassification-PaddlePaddle) for years of attention, constructive advice and great help.
- Many thanks to [mymagicpower](https://github.com/mymagicpower) for the Java implementation of ASR upon [short](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_sdk) and [long](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_long_audio_sdk) audio files. - Many thanks to [mymagicpower](https://github.com/mymagicpower) for the Java implementation of ASR upon [short](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_sdk) and [long](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_long_audio_sdk) audio files.
- Many thanks to [JiehangXie](https://github.com/JiehangXie)/[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo) for developing Virtual Uploader(VUP)/Virtual YouTuber(VTuber) with PaddleSpeech TTS function. - Many thanks to [JiehangXie](https://github.com/JiehangXie)/[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo) for developing Virtual Uploader(VUP)/Virtual YouTuber(VTuber) with PaddleSpeech TTS function.

@ -2,26 +2,45 @@
<p align="center"> <p align="center">
<img src="./docs/images/PaddleSpeech_logo.png" /> <img src="./docs/images/PaddleSpeech_logo.png" />
</p> </p>
<div align="center">
<h3>
<a href="#quick-start"> 快速开始 </a>
| <a href="#quick-start-server"> 快速使用服务 </a>
| <a href="#documents"> 教程文档 </a>
| <a href="#model-list"> 模型列表 </a>
</div>
------------------------------------------------------------------------------------
<p align="center"> <p align="center">
<a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-red.svg"></a> <a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-red.svg"></a>
<a href="support os"><img src="https://img.shields.io/badge/os-linux-yellow.svg"></a> <a href="https://github.com/PaddlePaddle/PaddleSpeech/releases"><img src="https://img.shields.io/github/v/release/PaddlePaddle/PaddleSpeech?color=ffa"></a>
<a href="support os"><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
<a href=""><img src="https://img.shields.io/badge/python-3.7+-aff.svg"></a> <a href=""><img src="https://img.shields.io/badge/python-3.7+-aff.svg"></a>
<a href="https://github.com/PaddlePaddle/PaddleSpeech/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/PaddleSpeech?color=9ea"></a> <a href="https://github.com/PaddlePaddle/PaddleSpeech/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/PaddleSpeech?color=9ea"></a>
<a href="https://github.com/PaddlePaddle/PaddleSpeech/commits"><img src="https://img.shields.io/github/commit-activity/m/PaddlePaddle/PaddleSpeech?color=3af"></a> <a href="https://github.com/PaddlePaddle/PaddleSpeech/commits"><img src="https://img.shields.io/github/commit-activity/m/PaddlePaddle/PaddleSpeech?color=3af"></a>
<a href="https://github.com/PaddlePaddle/PaddleSpeech/issues"><img src="https://img.shields.io/github/issues/PaddlePaddle/PaddleSpeech?color=9cc"></a> <a href="https://github.com/PaddlePaddle/PaddleSpeech/issues"><img src="https://img.shields.io/github/issues/PaddlePaddle/PaddleSpeech?color=9cc"></a>
<a href="https://github.com/PaddlePaddle/PaddleSpeech/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleSpeech?color=ccf"></a> <a href="https://github.com/PaddlePaddle/PaddleSpeech/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/PaddleSpeech?color=ccf"></a>
<a href="=https://pypi.org/project/paddlespeech/"><img src="https://img.shields.io/pypi/dm/PaddleSpeech"></a>
<a href="=https://pypi.org/project/paddlespeech/"><img src="https://static.pepy.tech/badge/paddlespeech"></a>
<a href="https://huggingface.co/spaces"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue"></a> <a href="https://huggingface.co/spaces"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue"></a>
</p> </p>
<div align="center">
<h3>
<a href="#quick-start"> Quick Start </a>
| <a href="#quick-start-server"> Quick Start Server </a>
| <a href="#quick-start-streaming-server"> Quick Start Streaming Server</a>
</br>
<a href="#documents"> Documents </a>
| <a href="#model-list"> Models List </a>
</h3>
</div>
------------------------------------------------------------------------------------
<div align="center">
<h3>
<a href="#quick-start"> 快速开始 </a>
| <a href="#quick-start-server"> 快速使用服务 </a>
| <a href="#quick-start-streaming-server"> 快速使用流式服务 </a>
| <a href="#documents"> 教程文档 </a>
| <a href="#model-list"> 模型列表 </a>
</div>
<!--- <!---
from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readmes-readable.md from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readmes-readable.md
@ -31,6 +50,8 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme
4.What is the goal of this project? 4.What is the goal of this project?
--> -->
**PaddleSpeech** 是基于飞桨 [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) 的语音方向的开源模型库,用于语音和音频中的各种关键任务的开发,包含大量基于深度学习前沿和有影响力的模型,一些典型的应用示例如下: **PaddleSpeech** 是基于飞桨 [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) 的语音方向的开源模型库,用于语音和音频中的各种关键任务的开发,包含大量基于深度学习前沿和有影响力的模型,一些典型的应用示例如下:
##### 语音识别 ##### 语音识别
@ -57,7 +78,6 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme
</td> </td>
<td>我认为跑步最重要的就是给我带来了身体健康。</td> <td>我认为跑步最重要的就是给我带来了身体健康。</td>
</tr> </tr>
</tbody> </tbody>
</table> </table>
@ -143,19 +163,6 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme
</div> </div>
### ⭐ 应用案例
- **[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo): 使用 PaddleSpeech 的语音合成模块生成虚拟人的声音。**
<div align="center"><a href="https://www.bilibili.com/video/BV1cL411V71o?share_source=copy_web"><img src="https://ai-studio-static-online.cdn.bcebos.com/06fd746ab32042f398fb6f33f873e6869e846fe63c214596ae37860fe8103720" / width="500px"></a></div>
- [PaddleSpeech 示例视频](https://paddlespeech.readthedocs.io/en/latest/demo_video.html)
- **[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk): 使用 PaddleSpeech 的语音合成和语音识别从视频中克隆人声。**
<div align="center">
<img src="https://raw.githubusercontent.com/jerryuhoo/VTuberTalk/main/gui/gui.png" width = "500px" />
</div>
### 🔥 热门活动 ### 🔥 热门活动
@ -164,27 +171,32 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme
4 日直播课: 深度解读 PaddleSpeech 语音技术! 4 日直播课: 深度解读 PaddleSpeech 语音技术!
**直播回放与课件资料: https://aistudio.baidu.com/aistudio/education/group/info/25130** **直播回放与课件资料: https://aistudio.baidu.com/aistudio/education/group/info/25130**
### 特性
本项目采用了易用、高效、灵活以及可扩展的实现,旨在为工业应用、学术研究提供更好的支持,实现的功能包含训练、推断以及测试模块,以及部署过程,主要包括
- 📦 **易用性**: 安装门槛低,可使用 [CLI](#quick-start) 快速开始。
- 🏆 **对标 SoTA**: 提供了高速、轻量级模型,且借鉴了最前沿的技术。
- 💯 **基于规则的中文前端**: 我们的前端包含文本正则化和字音转换G2P。此外我们使用自定义语言规则来适应中文语境。
- **多种工业界以及学术界主流功能支持**:
- 🛎️ 典型音频任务: 本工具包提供了音频任务如音频分类、语音翻译、自动语音识别、文本转语音、语音合成等任务的实现。
- 🔬 主流模型及数据集: 本工具包实现了参与整条语音任务流水线的各个模块,并且采用了主流数据集如 LibriSpeech、LJSpeech、AIShell、CSMSC详情请见 [模型列表](#model-list)。
- 🧩 级联模型应用: 作为传统语音任务的扩展,我们结合了自然语言处理、计算机视觉等任务,实现更接近实际需求的产业级应用。
### 近期更新 ### 近期更新
<!--- <!---
2021.12.14: We would like to have an online courses to introduce basics and research of speech, as well as code practice with `paddlespeech`. Please pay attention to our [Calendar](https://www.paddlepaddle.org.cn/live). 2021.12.14: We would like to have an online courses to introduce basics and research of speech, as well as code practice with `paddlespeech`. Please pay attention to our [Calendar](https://www.paddlepaddle.org.cn/live).
---> --->
- 👏🏻 2022.04.28: PaddleSpeech Streaming Server 上线! 覆盖了语音识别和语音合成。
- 👏🏻 2022.03.28: PaddleSpeech Server 上线! 覆盖了声音分类、语音识别、以及语音合成。 - 👏🏻 2022.03.28: PaddleSpeech Server 上线! 覆盖了声音分类、语音识别、以及语音合成。
- 👏🏻 2022.03.28: PaddleSpeech CLI 上线声纹验证。 - 👏🏻 2022.03.28: PaddleSpeech CLI 上线声纹验证。
- 🤗 2021.12.14: Our PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available! - 🤗 2021.12.14: Our PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available!
- 👏🏻 2021.12.10: PaddleSpeech CLI 上线!覆盖了声音分类、语音识别、语音翻译(英译中)以及语音合成。 - 👏🏻 2021.12.10: PaddleSpeech CLI 上线!覆盖了声音分类、语音识别、语音翻译(英译中)以及语音合成。
### 特性
本项目采用了易用、高效、灵活以及可扩展的实现,旨在为工业应用、学术研究提供更好的支持,实现的功能包含训练、推断以及测试模块,以及部署过程,主要包括
- 📦 **易用性**: 安装门槛低,可使用 [CLI](#quick-start) 快速开始。
- 🏆 **对标 SoTA**: 提供了高速、轻量级模型,且借鉴了最前沿的技术。
- 💯 **基于规则的中文前端**: 我们的前端包含文本正则化和字音转换G2P。此外我们使用自定义语言规则来适应中文语境。
- **多种工业界以及学术界主流功能支持**:
- 🛎️ 典型音频任务: 本工具包提供了音频任务如音频分类、语音翻译、自动语音识别、文本转语音、语音合成等任务的实现。
- 🔬 主流模型及数据集: 本工具包实现了参与整条语音任务流水线的各个模块,并且采用了主流数据集如 LibriSpeech、LJSpeech、AIShell、CSMSC详情请见 [模型列表](#model-list)。
- 🧩 级联模型应用: 作为传统语音任务的扩展,我们结合了自然语言处理、计算机视觉等任务,实现更接近实际需求的产业级应用。
### 技术交流群 ### 技术交流群
微信扫描二维码(好友申请通过后回复【语音】)加入官方交流群,获得更高效的问题答疑,与各行各业开发者充分交流,期待您的加入。 微信扫描二维码(好友申请通过后回复【语音】)加入官方交流群,获得更高效的问题答疑,与各行各业开发者充分交流,期待您的加入。
@ -192,11 +204,13 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme
<img src="https://raw.githubusercontent.com/yt605155624/lanceTest/main/images/wechat_4.jpg" width = "300" /> <img src="https://raw.githubusercontent.com/yt605155624/lanceTest/main/images/wechat_4.jpg" width = "300" />
</div> </div>
## 安装 ## 安装
我们强烈建议用户在 **Linux** 环境下,*3.7* 以上版本的 *python* 上安装 PaddleSpeech。 我们强烈建议用户在 **Linux** 环境下,*3.7* 以上版本的 *python* 上安装 PaddleSpeech。
目前为止,**Linux** 支持声音分类、语音识别、语音合成和语音翻译四种功能,**Mac OSX、 Windows** 下暂不支持语音翻译功能。 想了解具体安装细节,可以参考[安装文档](./docs/source/install_cn.md)。 目前为止,**Linux** 支持声音分类、语音识别、语音合成和语音翻译四种功能,**Mac OSX、 Windows** 下暂不支持语音翻译功能。 想了解具体安装细节,可以参考[安装文档](./docs/source/install_cn.md)。
## 快速开始 ## 快速开始
安装完成后,开发者可以通过命令行快速开始,改变 `--input` 可以尝试用自己的音频或文本测试。 安装完成后,开发者可以通过命令行快速开始,改变 `--input` 可以尝试用自己的音频或文本测试。
@ -232,7 +246,7 @@ paddlespeech tts --input "你好,欢迎使用百度飞桨深度学习框架!
**批处理** **批处理**
``` ```
echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts
``` ```
**Shell管道** **Shell管道**
ASR + Punc: ASR + Punc:
@ -269,6 +283,38 @@ paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input input.wav
更多服务相关的命令行使用信息,请参考 [demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_server) 更多服务相关的命令行使用信息,请参考 [demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_server)
<a name="quickstartstreamingserver"></a>
## 快速使用流式服务
开发者可以尝试[流式ASR](./demos/streaming_asr_server/README.md)和 [流式TTS](./demos/streaming_tts_server/README.md)服务.
**启动流式ASR服务**
```
paddlespeech_server start --config_file ./demos/streaming_asr_server/conf/application.yaml
```
**访问流式ASR服务**
```
paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input input_16k.wav
```
**启动流式TTS服务**
```
paddlespeech_server start --config_file ./demos/streaming_tts_server/conf/tts_online_application.yaml
```
**访问流式TTS服务**
```
paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
```
更多信息参看: [流式 ASR](./demos/streaming_asr_server/README.md) 和 [流式 TTS](./demos/streaming_tts_server/README.md)
<a name="modulelist"></a>
## 模型列表 ## 模型列表
PaddleSpeech 支持很多主流的模型,并提供了预训练模型,详情请见[模型列表](./docs/source/released_model.md)。 PaddleSpeech 支持很多主流的模型,并提供了预训练模型,详情请见[模型列表](./docs/source/released_model.md)。
@ -582,15 +628,31 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
语音合成模块最初被称为 [Parakeet](https://github.com/PaddlePaddle/Parakeet),现在与此仓库合并。如果您对该任务的学术研究感兴趣,请参阅 [TTS 研究概述](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/docs/source/tts#overview)。此外,[模型介绍](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/tts/models_introduction.md) 是了解语音合成流程的一个很好的指南。 语音合成模块最初被称为 [Parakeet](https://github.com/PaddlePaddle/Parakeet),现在与此仓库合并。如果您对该任务的学术研究感兴趣,请参阅 [TTS 研究概述](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/docs/source/tts#overview)。此外,[模型介绍](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/tts/models_introduction.md) 是了解语音合成流程的一个很好的指南。
## ⭐ 应用案例
- **[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo): 使用 PaddleSpeech 的语音合成模块生成虚拟人的声音。**
<div align="center"><a href="https://www.bilibili.com/video/BV1cL411V71o?share_source=copy_web"><img src="https://ai-studio-static-online.cdn.bcebos.com/06fd746ab32042f398fb6f33f873e6869e846fe63c214596ae37860fe8103720" / width="500px"></a></div>
- [PaddleSpeech 示例视频](https://paddlespeech.readthedocs.io/en/latest/demo_video.html)
- **[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk): 使用 PaddleSpeech 的语音合成和语音识别从视频中克隆人声。**
<div align="center">
<img src="https://raw.githubusercontent.com/jerryuhoo/VTuberTalk/main/gui/gui.png" width = "500px" />
</div>
## 引用 ## 引用
要引用 PaddleSpeech 进行研究,请使用以下格式进行引用。 要引用 PaddleSpeech 进行研究,请使用以下格式进行引用。
```text ```text
@misc{ppspeech2021, @inproceedings{zhang2022paddlespeech,
title={PaddleSpeech, a toolkit for audio processing based on PaddlePaddle.}, title = {PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit},
author={PaddlePaddle Authors}, author = {Hui Zhang, Tian Yuan, Junkun Chen, Xintong Li, Renjie Zheng, Yuxin Huang, Xiaojie Chen, Enlei Gong, Zeyu Chen, Xiaoguang Hu, dianhai yu, Yanjun Ma, Liang Huang},
howpublished = {\url{https://github.com/PaddlePaddle/PaddleSpeech}}, booktitle = {Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations},
year={2021} year = {2022},
publisher = {Association for Computational Linguistics},
} }
@inproceedings{zheng2021fused, @inproceedings{zheng2021fused,
@ -657,6 +719,7 @@ year={2021}
- 非常感谢 [jerryuhoo](https://github.com/jerryuhoo)/[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk) 基于 PaddleSpeech 的 TTS GUI 界面和基于 ASR 制作数据集的相关代码。 - 非常感谢 [jerryuhoo](https://github.com/jerryuhoo)/[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk) 基于 PaddleSpeech 的 TTS GUI 界面和基于 ASR 制作数据集的相关代码。
此外PaddleSpeech 依赖于许多开源存储库。有关更多信息,请参阅 [references](./docs/source/reference.md)。 此外PaddleSpeech 依赖于许多开源存储库。有关更多信息,请参阅 [references](./docs/source/reference.md)。
## License ## License

Before

Width:  |  Height:  |  Size: 4.9 KiB

After

Width:  |  Height:  |  Size: 4.9 KiB

@ -13,6 +13,7 @@
# limitations under the License. # limitations under the License.
from .esc50 import ESC50 from .esc50 import ESC50
from .gtzan import GTZAN from .gtzan import GTZAN
from .hey_snips import HeySnips
from .rirs_noises import OpenRIRNoise from .rirs_noises import OpenRIRNoise
from .tess import TESS from .tess import TESS
from .urban_sound import UrbanSound8K from .urban_sound import UrbanSound8K

@ -17,6 +17,8 @@ import numpy as np
import paddle import paddle
from ..backends import load as load_audio from ..backends import load as load_audio
from ..compliance.kaldi import fbank as kaldi_fbank
from ..compliance.kaldi import mfcc as kaldi_mfcc
from ..compliance.librosa import melspectrogram from ..compliance.librosa import melspectrogram
from ..compliance.librosa import mfcc from ..compliance.librosa import mfcc
@ -24,6 +26,8 @@ feat_funcs = {
'raw': None, 'raw': None,
'melspectrogram': melspectrogram, 'melspectrogram': melspectrogram,
'mfcc': mfcc, 'mfcc': mfcc,
'kaldi_fbank': kaldi_fbank,
'kaldi_mfcc': kaldi_mfcc,
} }
@ -73,16 +77,24 @@ class AudioClassificationDataset(paddle.io.Dataset):
feat_func = feat_funcs[self.feat_type] feat_func = feat_funcs[self.feat_type]
record = {} record = {}
record['feat'] = feat_func( if self.feat_type in ['kaldi_fbank', 'kaldi_mfcc']:
waveform, sample_rate, waveform = paddle.to_tensor(waveform).unsqueeze(0) # (C, T)
**self.feat_config) if feat_func else waveform record['feat'] = feat_func(
waveform=waveform, sr=self.sample_rate, **self.feat_config)
else:
record['feat'] = feat_func(
waveform, sample_rate,
**self.feat_config) if feat_func else waveform
record['label'] = label record['label'] = label
return record return record
def __getitem__(self, idx): def __getitem__(self, idx):
record = self._convert_to_record(idx) record = self._convert_to_record(idx)
return np.array(record['feat']).transpose(), np.array( if self.feat_type in ['kaldi_fbank', 'kaldi_mfcc']:
record['label'], dtype=np.int64) return self.keys[idx], record['feat'], record['label']
else:
return np.array(record['feat']).transpose(), np.array(
record['label'], dtype=np.int64)
def __len__(self): def __len__(self):
return len(self.files) return len(self.files)

@ -0,0 +1,74 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import collections
import json
import os
from typing import List
from typing import Tuple
from .dataset import AudioClassificationDataset
__all__ = ['HeySnips']
class HeySnips(AudioClassificationDataset):
meta_info = collections.namedtuple('META_INFO',
('key', 'label', 'duration', 'wav'))
def __init__(self,
data_dir: os.PathLike,
mode: str='train',
feat_type: str='kaldi_fbank',
sample_rate: int=16000,
**kwargs):
self.data_dir = data_dir
files, labels = self._get_data(mode)
super(HeySnips, self).__init__(
files=files,
labels=labels,
feat_type=feat_type,
sample_rate=sample_rate,
**kwargs)
def _get_meta_info(self, mode) -> List[collections.namedtuple]:
ret = []
with open(os.path.join(self.data_dir, '{}.json'.format(mode)),
'r') as f:
data = json.load(f)
for item in data:
sample = collections.OrderedDict()
if item['duration'] > 0:
sample['key'] = item['id']
sample['label'] = 0 if item['is_hotword'] == 1 else -1
sample['duration'] = item['duration']
sample['wav'] = os.path.join(self.data_dir,
item['audio_file_path'])
ret.append(self.meta_info(*sample.values()))
return ret
def _get_data(self, mode: str) -> Tuple[List[str], List[int]]:
meta_info = self._get_meta_info(mode)
files = []
labels = []
self.keys = []
self.durations = []
for sample in meta_info:
key, target, duration, wav = sample
files.append(wav)
labels.append(int(target))
self.keys.append(key)
self.durations.append(float(duration))
return files, labels

@ -19,7 +19,7 @@ from setuptools.command.install import install
from setuptools.command.test import test from setuptools.command.test import test
# set the version here # set the version here
VERSION = '0.2.1' VERSION = '1.0.0a'
# Inspired by the example at https://pytest.org/latest/goodpractises.html # Inspired by the example at https://pytest.org/latest/goodpractises.html

@ -16,9 +16,9 @@ import os
import unittest import unittest
import numpy as np import numpy as np
import paddleaudio
import soundfile as sf import soundfile as sf
import paddleaudio
from ..base import BackendTest from ..base import BackendTest

@ -17,11 +17,10 @@ import urllib.request
import librosa import librosa
import numpy as np import numpy as np
import paddle import paddle
import paddleaudio
import torch import torch
import torchaudio import torchaudio
import paddleaudio
wav_url = 'https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav' wav_url = 'https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav'
if not os.path.isfile(os.path.basename(wav_url)): if not os.path.isfile(os.path.basename(wav_url)):
urllib.request.urlretrieve(wav_url, os.path.basename(wav_url)) urllib.request.urlretrieve(wav_url, os.path.basename(wav_url))

@ -17,11 +17,10 @@ import urllib.request
import librosa import librosa
import numpy as np import numpy as np
import paddle import paddle
import paddleaudio
import torch import torch
import torchaudio import torchaudio
import paddleaudio
wav_url = 'https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav' wav_url = 'https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav'
if not os.path.isfile(os.path.basename(wav_url)): if not os.path.isfile(os.path.basename(wav_url)):
urllib.request.urlretrieve(wav_url, os.path.basename(wav_url)) urllib.request.urlretrieve(wav_url, os.path.basename(wav_url))

@ -17,11 +17,10 @@ import urllib.request
import librosa import librosa
import numpy as np import numpy as np
import paddle import paddle
import paddleaudio
import torch import torch
import torchaudio import torchaudio
import paddleaudio
wav_url = 'https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav' wav_url = 'https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav'
if not os.path.isfile(os.path.basename(wav_url)): if not os.path.isfile(os.path.basename(wav_url)):
urllib.request.urlretrieve(wav_url, os.path.basename(wav_url)) urllib.request.urlretrieve(wav_url, os.path.basename(wav_url))

@ -17,7 +17,6 @@ import urllib.request
import numpy as np import numpy as np
import paddle import paddle
from paddleaudio import load from paddleaudio import load
wav_url = 'https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav' wav_url = 'https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav'

@ -15,9 +15,9 @@ import unittest
import numpy as np import numpy as np
import paddle import paddle
from paddleaudio.functional.window import get_window
from .base import FeatTest from .base import FeatTest
from paddleaudio.functional.window import get_window
from paddlespeech.s2t.transform.spectrogram import IStft from paddlespeech.s2t.transform.spectrogram import IStft
from paddlespeech.s2t.transform.spectrogram import Stft from paddlespeech.s2t.transform.spectrogram import Stft

@ -15,10 +15,10 @@ import unittest
import numpy as np import numpy as np
import paddle import paddle
import paddleaudio
import torch import torch
import torchaudio import torchaudio
import paddleaudio
from .base import FeatTest from .base import FeatTest

@ -16,11 +16,11 @@ import unittest
import librosa import librosa
import numpy as np import numpy as np
import paddle import paddle
import paddleaudio import paddleaudio
from .base import FeatTest
from paddleaudio.functional.window import get_window from paddleaudio.functional.window import get_window
from .base import FeatTest
class TestLibrosa(FeatTest): class TestLibrosa(FeatTest):
def initParmas(self): def initParmas(self):

@ -15,8 +15,8 @@ import unittest
import numpy as np import numpy as np
import paddle import paddle
import paddleaudio import paddleaudio
from .base import FeatTest from .base import FeatTest
from paddlespeech.s2t.transform.spectrogram import LogMelSpectrogram from paddlespeech.s2t.transform.spectrogram import LogMelSpectrogram

@ -15,8 +15,8 @@ import unittest
import numpy as np import numpy as np
import paddle import paddle
import paddleaudio import paddleaudio
from .base import FeatTest from .base import FeatTest
from paddlespeech.s2t.transform.spectrogram import Spectrogram from paddlespeech.s2t.transform.spectrogram import Spectrogram

@ -15,9 +15,9 @@ import unittest
import numpy as np import numpy as np
import paddle import paddle
from paddleaudio.functional.window import get_window
from .base import FeatTest from .base import FeatTest
from paddleaudio.functional.window import get_window
from paddlespeech.s2t.transform.spectrogram import Stft from paddlespeech.s2t.transform.spectrogram import Stft

@ -10,6 +10,8 @@ The directory containes many speech applications in multi scenarios.
* metaverse - 2D AR with TTS * metaverse - 2D AR with TTS
* punctuation_restoration - restore punctuation from raw text * punctuation_restoration - restore punctuation from raw text
* speech recogintion - recognize text of an audio file * speech recogintion - recognize text of an audio file
* speech server - Server for Speech Task, e.g. ASR,TTS,CLS
* streaming asr server - receive audio stream from websocket, and recognize to transcript.
* speech translation - end to end speech translation * speech translation - end to end speech translation
* story talker - book reader based on OCR and TTS * story talker - book reader based on OCR and TTS
* style_fs2 - multi style control for FastSpeech2 model * style_fs2 - multi style control for FastSpeech2 model

@ -10,6 +10,8 @@
* 元宇宙 - 基于语音合成的 2D 增强现实。 * 元宇宙 - 基于语音合成的 2D 增强现实。
* 标点恢复 - 通常作为语音识别的文本后处理任务,为一段无标点的纯文本添加相应的标点符号。 * 标点恢复 - 通常作为语音识别的文本后处理任务,为一段无标点的纯文本添加相应的标点符号。
* 语音识别 - 识别一段音频中包含的语音文字。 * 语音识别 - 识别一段音频中包含的语音文字。
* 语音服务 - 离线语音服务包括ASR、TTS、CLS等
* 流式语音识别服务 - 流式输入语音数据流识别音频中的文字
* 语音翻译 - 实时识别音频中的语言,并同时翻译成目标语言。 * 语音翻译 - 实时识别音频中的语言,并同时翻译成目标语言。
* 会说话的故事书 - 基于 OCR 和语音合成的会说话的故事书。 * 会说话的故事书 - 基于 OCR 和语音合成的会说话的故事书。
* 个性化语音合成 - 基于 FastSpeech2 模型的个性化语音合成。 * 个性化语音合成 - 基于 FastSpeech2 模型的个性化语音合成。

@ -19,6 +19,7 @@ from fastapi import FastAPI
from fastapi import File from fastapi import File
from fastapi import Form from fastapi import Form
from fastapi import UploadFile from fastapi import UploadFile
from logs import LOGGER
from mysql_helpers import MySQLHelper from mysql_helpers import MySQLHelper
from operations.count import do_count_vpr from operations.count import do_count_vpr
from operations.count import do_get from operations.count import do_get
@ -31,8 +32,6 @@ from starlette.middleware.cors import CORSMiddleware
from starlette.requests import Request from starlette.requests import Request
from starlette.responses import FileResponse from starlette.responses import FileResponse
from logs import LOGGER
app = FastAPI() app = FastAPI()
app.add_middleware( app.add_middleware(
CORSMiddleware, CORSMiddleware,

@ -1,5 +1,5 @@
([简体中文](./README_cn.md)|English) ([简体中文](./README_cn.md)|English)
# Speech Verification) # Speech Verification
## Introduction ## Introduction

@ -24,13 +24,13 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
- Command Line(Recommended) - Command Line(Recommended)
```bash ```bash
# Chinese # Chinese
paddlespeech asr --input ./zh.wav paddlespeech asr --input ./zh.wav -v
# English # English
paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav -v
# Chinese ASR + Punctuation Restoration # Chinese ASR + Punctuation Restoration
paddlespeech asr --input ./zh.wav | paddlespeech text --task punc paddlespeech asr --input ./zh.wav -v | paddlespeech text --task punc -v
``` ```
(It doesn't matter if package `paddlespeech-ctcdecoders` is not found, this package is optional.) (If you don't want to see the log information, you can remove "-v". Besides, it doesn't matter if package `paddlespeech-ctcdecoders` is not found, this package is optional.)
Usage: Usage:
```bash ```bash
@ -45,6 +45,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
- `ckpt_path`: Model checkpoint. Use pretrained model when it is None. Default: `None`. - `ckpt_path`: Model checkpoint. Use pretrained model when it is None. Default: `None`.
- `yes`: No additional parameters required. Once set this parameter, it means accepting the request of the program by default, which includes transforming the audio sample rate. Default: `False`. - `yes`: No additional parameters required. Once set this parameter, it means accepting the request of the program by default, which includes transforming the audio sample rate. Default: `False`.
- `device`: Choose device to execute model inference. Default: default device of paddlepaddle in current environment. - `device`: Choose device to execute model inference. Default: default device of paddlepaddle in current environment.
- `verbose`: Show the log information.
Output: Output:
```bash ```bash
@ -84,8 +85,12 @@ Here is a list of pretrained models released by PaddleSpeech that can be used by
| Model | Language | Sample Rate | Model | Language | Sample Rate
| :--- | :---: | :---: | | :--- | :---: | :---: |
| conformer_wenetspeech| zh| 16k | conformer_wenetspeech | zh | 16k
| transformer_librispeech| en| 16k | conformer_online_multicn | zh | 16k
| conformer_aishell | zh | 16k
| conformer_online_aishell | zh | 16k
| transformer_librispeech | en | 16k
| deepspeech2online_wenetspeech | zh | 16k
| deepspeech2offline_aishell| zh| 16k | deepspeech2offline_aishell| zh| 16k
| deepspeech2online_aishell | zh | 16k | deepspeech2online_aishell | zh | 16k
|deepspeech2offline_librispeech|en| 16k | deepspeech2offline_librispeech | en | 16k

@ -22,13 +22,13 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
- 命令行 (推荐使用) - 命令行 (推荐使用)
```bash ```bash
# 中文 # 中文
paddlespeech asr --input ./zh.wav paddlespeech asr --input ./zh.wav -v
# 英文 # 英文
paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav -v
# 中文 + 标点恢复 # 中文 + 标点恢复
paddlespeech asr --input ./zh.wav | paddlespeech text --task punc paddlespeech asr --input ./zh.wav -v | paddlespeech text --task punc -v
``` ```
(如果显示 `paddlespeech-ctcdecoders` 这个 python 包没有找到的 Error没有关系这个包是非必须的。) (如果不想显示 log 信息,可以不使用"-v", 另外如果显示 `paddlespeech-ctcdecoders` 这个 python 包没有找到的 Error没有关系这个包是非必须的。)
使用方法: 使用方法:
```bash ```bash
@ -43,6 +43,7 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
- `ckpt_path`:模型参数文件,若不设置则下载预训练模型使用,默认值:`None`。 - `ckpt_path`:模型参数文件,若不设置则下载预训练模型使用,默认值:`None`。
- `yes`;不需要设置额外的参数,一旦设置了该参数,说明你默认同意程序的所有请求,其中包括自动转换输入音频的采样率。默认值:`False`。 - `yes`;不需要设置额外的参数,一旦设置了该参数,说明你默认同意程序的所有请求,其中包括自动转换输入音频的采样率。默认值:`False`。
- `device`:执行预测的设备,默认值:当前系统下 paddlepaddle 的默认 device。 - `device`:执行预测的设备,默认值:当前系统下 paddlepaddle 的默认 device。
- `verbose`: 如果使用,显示 logger 信息。
输出: 输出:
```bash ```bash
@ -82,7 +83,11 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
| 模型 | 语言 | 采样率 | 模型 | 语言 | 采样率
| :--- | :---: | :---: | | :--- | :---: | :---: |
| conformer_wenetspeech | zh | 16k | conformer_wenetspeech | zh | 16k
| conformer_online_multicn | zh | 16k
| conformer_aishell | zh | 16k
| conformer_online_aishell | zh | 16k
| transformer_librispeech | en | 16k | transformer_librispeech | en | 16k
| deepspeech2online_wenetspeech | zh | 16k
| deepspeech2offline_aishell| zh| 16k | deepspeech2offline_aishell| zh| 16k
| deepspeech2online_aishell | zh | 16k | deepspeech2online_aishell | zh | 16k
| deepspeech2offline_librispeech | en | 16k | deepspeech2offline_librispeech | en | 16k

@ -86,9 +86,6 @@ wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespee
``` ```
paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
# 流式ASR
paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8091 --input ./zh.wav
``` ```
使用帮助: 使用帮助:

@ -0,0 +1,358 @@
([简体中文](./README_cn.md)|English)
# Speech Server
## Introduction
This demo is an implementation of starting the streaming speech service and accessing the service. It can be achieved with a single command using `paddlespeech_server` and `paddlespeech_client` or a few lines of code in python.
Streaming ASR server only support `websocket` protocol, and doesn't support `http` protocol.
## Usage
### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
It is recommended to use **paddlepaddle 2.2.1** or above.
You can choose one way from meduim and hard to install paddlespeech.
### 2. Prepare config File
The configuration file can be found in `conf/ws_application.yaml``conf/ws_conformer_application.yaml`.
At present, the speech tasks integrated by the model include: DeepSpeech2 and conformer.
The input of ASR client demo should be a WAV file(`.wav`), and the sample rate must be the same as the model.
Here are sample files for thisASR client demo that can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
```
### 3. Server Usage
- Command Line (Recommended)
```bash
# in PaddleSpeech/demos/streaming_asr_server start the service
paddlespeech_server start --config_file ./conf/ws_conformer_application.yaml
```
Usage:
```bash
paddlespeech_server start --help
```
Arguments:
- `config_file`: yaml file of the app, defalut: `./conf/application.yaml`
- `log_file`: log file. Default: `./log/paddlespeech.log`
Output:
```bash
[2022-04-21 15:52:18,126] [ INFO] - create the online asr engine instance
[2022-04-21 15:52:18,127] [ INFO] - paddlespeech_server set the device: cpu
[2022-04-21 15:52:18,128] [ INFO] - Load the pretrained model, tag = conformer_online_multicn-zh-16k
[2022-04-21 15:52:18,128] [ INFO] - File /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/asr1_chunk_conformer_multi_cn_ckpt_0.2.3.model.tar.gz md5 checking...
[2022-04-21 15:52:18,727] [ INFO] - Use pretrained model stored in: /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/model.yaml
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams
[2022-04-21 15:52:19,446] [ INFO] - start to create the stream conformer asr engine
[2022-04-21 15:52:19,473] [ INFO] - model name: conformer_online
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
[2022-04-21 15:52:21,731] [ INFO] - create the transformer like model success
[2022-04-21 15:52:21,733] [ INFO] - Initialize ASR server engine successfully.
INFO: Started server process [11173]
[2022-04-21 15:52:21] [INFO] [server.py:75] Started server process [11173]
INFO: Waiting for application startup.
[2022-04-21 15:52:21] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete.
[2022-04-21 15:52:21] [INFO] [on.py:59] Application startup complete.
/home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1460: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
infos = await tasks.gather(*fs, loop=self)
/home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1518: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
await tasks.sleep(0, loop=self)
INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
[2022-04-21 15:52:21] [INFO] [server.py:206] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
```
- Python API
```python
# in PaddleSpeech/demos/streaming_asr_server directory
from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
server_executor = ServerExecutor()
server_executor(
config_file="./conf/ws_conformer_application.yaml",
log_file="./log/paddlespeech.log")
```
Output:
```bash
[2022-04-21 15:52:18,126] [ INFO] - create the online asr engine instance
[2022-04-21 15:52:18,127] [ INFO] - paddlespeech_server set the device: cpu
[2022-04-21 15:52:18,128] [ INFO] - Load the pretrained model, tag = conformer_online_multicn-zh-16k
[2022-04-21 15:52:18,128] [ INFO] - File /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/asr1_chunk_conformer_multi_cn_ckpt_0.2.3.model.tar.gz md5 checking...
[2022-04-21 15:52:18,727] [ INFO] - Use pretrained model stored in: /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/model.yaml
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams
[2022-04-21 15:52:19,446] [ INFO] - start to create the stream conformer asr engine
[2022-04-21 15:52:19,473] [ INFO] - model name: conformer_online
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
[2022-04-21 15:52:21,731] [ INFO] - create the transformer like model success
[2022-04-21 15:52:21,733] [ INFO] - Initialize ASR server engine successfully.
INFO: Started server process [11173]
[2022-04-21 15:52:21] [INFO] [server.py:75] Started server process [11173]
INFO: Waiting for application startup.
[2022-04-21 15:52:21] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete.
[2022-04-21 15:52:21] [INFO] [on.py:59] Application startup complete.
/home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1460: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
infos = await tasks.gather(*fs, loop=self)
/home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1518: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
await tasks.sleep(0, loop=self)
INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
[2022-04-21 15:52:21] [INFO] [server.py:206] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
```
### 4. ASR Client Usage
**Note:** The response time will be slightly longer when using the client for the first time
- Command Line (Recommended)
```
paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
```
Usage:
```bash
paddlespeech_client asr_online --help
```
Arguments:
- `server_ip`: server ip. Default: 127.0.0.1
- `port`: server port. Default: 8090
- `input`(required): Audio file to be recognized.
- `sample_rate`: Audio ampling rate, default: 16000.
- `lang`: Language. Default: "zh_cn".
- `audio_format`: Audio format. Default: "wav".
- `punc.server_ip`: punctuation server ip. Default: None.
- `punc.server_port`: punctuation server port. Default: None.
Output:
```bash
[2022-04-21 15:59:03,904] [ INFO] - receive msg={"status": "ok", "signal": "server_ready"}
[2022-04-21 15:59:03,960] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:03,973] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:03,987] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,000] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,012] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,024] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,036] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,047] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,607] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,620] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,633] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,645] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,657] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,669] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,680] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:05,176] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,185] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,192] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,200] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,208] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,216] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,224] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,232] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,724] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
[2022-04-21 15:59:05,732] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
[2022-04-21 15:59:05,740] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
[2022-04-21 15:59:05,747] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
[2022-04-21 15:59:05,755] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
[2022-04-21 15:59:05,763] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
[2022-04-21 15:59:05,770] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
[2022-04-21 15:59:06,271] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,279] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,287] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,294] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,302] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,310] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,318] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,326] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,833] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
[2022-04-21 15:59:06,842] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
[2022-04-21 15:59:06,850] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
[2022-04-21 15:59:06,858] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
[2022-04-21 15:59:06,866] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
[2022-04-21 15:59:06,874] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
[2022-04-21 15:59:06,882] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
[2022-04-21 15:59:07,400] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,408] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,416] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,424] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,432] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,440] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,447] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,455] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,984] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
[2022-04-21 15:59:07,992] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
[2022-04-21 15:59:08,001] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
[2022-04-21 15:59:08,008] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
[2022-04-21 15:59:08,016] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
[2022-04-21 15:59:08,024] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
[2022-04-21 15:59:12,883] [ INFO] - final receive msg={'status': 'ok', 'signal': 'finished', 'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
[2022-04-21 15:59:12,884] [ INFO] - 我认为跑步最重要的就是给我带来了身体健康
[2022-04-21 15:59:12,884] [ INFO] - Response time 9.051567 s.
```
- Python API
```python
from paddlespeech.server.bin.paddlespeech_client import ASROnlineClientExecutor
asrclient_executor = ASROnlineClientExecutor()
res = asrclient_executor(
input="./zh.wav",
server_ip="127.0.0.1",
port=8090,
sample_rate=16000,
lang="zh_cn",
audio_format="wav")
print(res)
```
Output:
```bash
[2022-04-21 15:59:03,904] [ INFO] - receive msg={"status": "ok", "signal": "server_ready"}
[2022-04-21 15:59:03,960] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:03,973] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:03,987] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,000] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,012] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,024] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,036] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,047] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,607] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,620] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,633] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,645] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,657] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,669] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,680] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:05,176] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,185] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,192] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,200] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,208] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,216] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,224] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,232] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,724] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
[2022-04-21 15:59:05,732] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
[2022-04-21 15:59:05,740] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
[2022-04-21 15:59:05,747] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
[2022-04-21 15:59:05,755] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
[2022-04-21 15:59:05,763] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
[2022-04-21 15:59:05,770] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
[2022-04-21 15:59:06,271] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,279] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,287] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,294] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,302] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,310] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,318] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,326] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,833] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
[2022-04-21 15:59:06,842] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
[2022-04-21 15:59:06,850] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
[2022-04-21 15:59:06,858] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
[2022-04-21 15:59:06,866] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
[2022-04-21 15:59:06,874] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
[2022-04-21 15:59:06,882] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
[2022-04-21 15:59:07,400] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,408] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,416] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,424] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,432] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,440] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,447] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,455] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,984] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
[2022-04-21 15:59:07,992] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
[2022-04-21 15:59:08,001] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
[2022-04-21 15:59:08,008] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
[2022-04-21 15:59:08,016] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
[2022-04-21 15:59:08,024] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
[2022-04-21 15:59:12,883] [ INFO] - final receive msg={'status': 'ok', 'signal': 'finished', 'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
```

@ -0,0 +1,365 @@
([English](./README.md)|中文)
# 语音服务
## 介绍
这个demo是一个启动流式语音服务和访问服务的实现。 它可以通过使用`paddlespeech_server` 和 `paddlespeech_client`的单个命令或 python 的几行代码来实现。
**流式语音识别服务只支持 `weboscket` 协议,不支持 `http` 协议。**
## 使用方法
### 1. 安装
安装 PaddleSpeech 的详细过程请看 [安装文档](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md)。
推荐使用 **paddlepaddle 2.2.1** 或以上版本。
你可以从mediumhard 两种方式中选择一种方式安装 PaddleSpeech。
### 2. 准备配置文件
流式ASR的服务启动脚本和服务测试脚本存放在 `PaddleSpeech/demos/streaming_asr_server` 目录。
下载好 `PaddleSpeech` 之后,进入到 `PaddleSpeech/demos/streaming_asr_server` 目录。
配置文件可参见该目录下 `conf/ws_application.yaml``conf/ws_conformer_application.yaml`
目前服务集成的模型有: DeepSpeech2和 conformer模型对应的配置文件如下
* DeepSpeech: `conf/ws_application.yaml`
* conformer: `conf/ws_conformer_application.yaml`
这个 ASR client 的输入应该是一个 WAV 文件(`.wav`),并且采样率必须与模型的采样率相同。
可以下载此 ASR client的示例音频
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
```
### 3. 服务端使用方法
- 命令行 (推荐使用)
```bash
# 在 PaddleSpeech/demos/streaming_asr_server 目录启动服务
paddlespeech_server start --config_file ./conf/ws_conformer_application.yaml
```
使用方法:
```bash
paddlespeech_server start --help
```
参数:
- `config_file`: 服务的配置文件,默认: `./conf/application.yaml`
- `log_file`: log 文件. 默认:`./log/paddlespeech.log`
输出:
```bash
[2022-04-21 15:52:18,126] [ INFO] - create the online asr engine instance
[2022-04-21 15:52:18,127] [ INFO] - paddlespeech_server set the device: cpu
[2022-04-21 15:52:18,128] [ INFO] - Load the pretrained model, tag = conformer_online_multicn-zh-16k
[2022-04-21 15:52:18,128] [ INFO] - File /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/asr1_chunk_conformer_multi_cn_ckpt_0.2.3.model.tar.gz md5 checking...
[2022-04-21 15:52:18,727] [ INFO] - Use pretrained model stored in: /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/model.yaml
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams
[2022-04-21 15:52:19,446] [ INFO] - start to create the stream conformer asr engine
[2022-04-21 15:52:19,473] [ INFO] - model name: conformer_online
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
[2022-04-21 15:52:21,731] [ INFO] - create the transformer like model success
[2022-04-21 15:52:21,733] [ INFO] - Initialize ASR server engine successfully.
INFO: Started server process [11173]
[2022-04-21 15:52:21] [INFO] [server.py:75] Started server process [11173]
INFO: Waiting for application startup.
[2022-04-21 15:52:21] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete.
[2022-04-21 15:52:21] [INFO] [on.py:59] Application startup complete.
/home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1460: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
infos = await tasks.gather(*fs, loop=self)
/home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1518: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
await tasks.sleep(0, loop=self)
INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
[2022-04-21 15:52:21] [INFO] [server.py:206] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
```
- Python API
```python
# 在 PaddleSpeech/demos/streaming_asr_server 目录
from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
server_executor = ServerExecutor()
server_executor(
config_file="./conf/ws_conformer_application.yaml",
log_file="./log/paddlespeech.log")
```
输出:
```bash
[2022-04-21 15:52:18,126] [ INFO] - create the online asr engine instance
[2022-04-21 15:52:18,127] [ INFO] - paddlespeech_server set the device: cpu
[2022-04-21 15:52:18,128] [ INFO] - Load the pretrained model, tag = conformer_online_multicn-zh-16k
[2022-04-21 15:52:18,128] [ INFO] - File /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/asr1_chunk_conformer_multi_cn_ckpt_0.2.3.model.tar.gz md5 checking...
[2022-04-21 15:52:18,727] [ INFO] - Use pretrained model stored in: /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/model.yaml
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams
[2022-04-21 15:52:18,727] [ INFO] - /home/users/xiongxinlei/.paddlespeech/models/conformer_online_multicn-zh-16k/exp/chunk_conformer/checkpoints/multi_cn.pdparams
[2022-04-21 15:52:19,446] [ INFO] - start to create the stream conformer asr engine
[2022-04-21 15:52:19,473] [ INFO] - model name: conformer_online
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
set kaiming_uniform
[2022-04-21 15:52:21,731] [ INFO] - create the transformer like model success
[2022-04-21 15:52:21,733] [ INFO] - Initialize ASR server engine successfully.
INFO: Started server process [11173]
[2022-04-21 15:52:21] [INFO] [server.py:75] Started server process [11173]
INFO: Waiting for application startup.
[2022-04-21 15:52:21] [INFO] [on.py:45] Waiting for application startup.
INFO: Application startup complete.
[2022-04-21 15:52:21] [INFO] [on.py:59] Application startup complete.
/home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1460: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
infos = await tasks.gather(*fs, loop=self)
/home/users/xiongxinlei/.conda/envs/paddlespeech/lib/python3.9/asyncio/base_events.py:1518: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
await tasks.sleep(0, loop=self)
INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
[2022-04-21 15:52:21] [INFO] [server.py:206] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
```
### 4. ASR 客户端使用方法
**注意:** 初次使用客户端时响应时间会略长
- 命令行 (推荐使用)
```
paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
```
使用帮助:
```bash
paddlespeech_client asr_online --help
```
参数:
- `server_ip`: 服务端ip地址默认: 127.0.0.1。
- `port`: 服务端口,默认: 8090。
- `input`(必须输入): 用于识别的音频文件。
- `sample_rate`: 音频采样率默认值16000。
- `lang`: 模型语言默认值zh_cn。
- `audio_format`: 音频格式默认值wav。
- `punc.server_ip` 标点预测服务的ip。默认是None。
- `punc.server_port` 标点预测服务的端口port。默认是None。
输出:
```bash
[2022-04-21 15:59:03,904] [ INFO] - receive msg={"status": "ok", "signal": "server_ready"}
[2022-04-21 15:59:03,960] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:03,973] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:03,987] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,000] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,012] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,024] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,036] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,047] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,607] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,620] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,633] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,645] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,657] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,669] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,680] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:05,176] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,185] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,192] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,200] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,208] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,216] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,224] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,232] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,724] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
[2022-04-21 15:59:05,732] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
[2022-04-21 15:59:05,740] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
[2022-04-21 15:59:05,747] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
[2022-04-21 15:59:05,755] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
[2022-04-21 15:59:05,763] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
[2022-04-21 15:59:05,770] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
[2022-04-21 15:59:06,271] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,279] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,287] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,294] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,302] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,310] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,318] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,326] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,833] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
[2022-04-21 15:59:06,842] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
[2022-04-21 15:59:06,850] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
[2022-04-21 15:59:06,858] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
[2022-04-21 15:59:06,866] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
[2022-04-21 15:59:06,874] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
[2022-04-21 15:59:06,882] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
[2022-04-21 15:59:07,400] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,408] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,416] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,424] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,432] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,440] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,447] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,455] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,984] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
[2022-04-21 15:59:07,992] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
[2022-04-21 15:59:08,001] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
[2022-04-21 15:59:08,008] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
[2022-04-21 15:59:08,016] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
[2022-04-21 15:59:08,024] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
[2022-04-21 15:59:12,883] [ INFO] - final receive msg={'status': 'ok', 'signal': 'finished', 'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
[2022-04-21 15:59:12,884] [ INFO] - 我认为跑步最重要的就是给我带来了身体健康
[2022-04-21 15:59:12,884] [ INFO] - Response time 9.051567 s.
```
- Python API
```python
from paddlespeech.server.bin.paddlespeech_client import ASROnlineClientExecutor
asrclient_executor = ASROnlineClientExecutor()
res = asrclient_executor(
input="./zh.wav",
server_ip="127.0.0.1",
port=8090,
sample_rate=16000,
lang="zh_cn",
audio_format="wav")
print(res)
```
输出:
```bash
[2022-04-21 15:59:03,904] [ INFO] - receive msg={"status": "ok", "signal": "server_ready"}
[2022-04-21 15:59:03,960] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:03,973] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:03,987] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,000] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,012] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,024] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,036] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,047] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,607] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,620] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,633] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,645] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,657] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,669] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:04,680] [ INFO] - receive msg={'asr_results': ''}
[2022-04-21 15:59:05,176] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,185] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,192] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,200] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,208] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,216] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,224] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,232] [ INFO] - receive msg={'asr_results': '我认为跑'}
[2022-04-21 15:59:05,724] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
[2022-04-21 15:59:05,732] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
[2022-04-21 15:59:05,740] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
[2022-04-21 15:59:05,747] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
[2022-04-21 15:59:05,755] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
[2022-04-21 15:59:05,763] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
[2022-04-21 15:59:05,770] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的'}
[2022-04-21 15:59:06,271] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,279] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,287] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,294] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,302] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,310] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,318] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,326] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是'}
[2022-04-21 15:59:06,833] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
[2022-04-21 15:59:06,842] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
[2022-04-21 15:59:06,850] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
[2022-04-21 15:59:06,858] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
[2022-04-21 15:59:06,866] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
[2022-04-21 15:59:06,874] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
[2022-04-21 15:59:06,882] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给'}
[2022-04-21 15:59:07,400] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,408] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,416] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,424] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,432] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,440] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,447] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,455] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了'}
[2022-04-21 15:59:07,984] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
[2022-04-21 15:59:07,992] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
[2022-04-21 15:59:08,001] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
[2022-04-21 15:59:08,008] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
[2022-04-21 15:59:08,016] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
[2022-04-21 15:59:08,024] [ INFO] - receive msg={'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
[2022-04-21 15:59:12,883] [ INFO] - final receive msg={'status': 'ok', 'signal': 'finished', 'asr_results': '我认为跑步最重要的就是给我带来了身体健康'}
```

@ -0,0 +1,45 @@
# This is the parameter configuration file for PaddleSpeech Serving.
#################################################################################
# SERVER SETTING #
#################################################################################
host: 0.0.0.0
port: 8090
# The task format in the engin_list is: <speech task>_<engine type>
# task choices = ['asr_online']
# protocol = ['websocket'] (only one can be selected).
# websocket only support online engine type.
protocol: 'websocket'
engine_list: ['asr_online']
#################################################################################
# ENGINE CONFIG #
#################################################################################
################################### ASR #########################################
################### speech task: asr; engine_type: online #######################
asr_online:
model_type: 'conformer_online_multicn'
am_model: # the pdmodel file of am static model [optional]
am_params: # the pdiparams file of am static model [optional]
lang: 'zh'
sample_rate: 16000
cfg_path:
decode_method:
force_yes: True
device: # cpu or gpu:id
am_predictor_conf:
device: # set 'gpu:id' or 'cpu'
switch_ir_optim: True
glog_info: False # True -> print glog
summary: True # False -> do not show predictor config
chunk_buffer_conf:
window_n: 7 # frame
shift_n: 4 # frame
window_ms: 25 # ms
shift_ms: 10 # ms
sample_rate: 16000
sample_width: 2

@ -0,0 +1,47 @@
# This is the parameter configuration file for PaddleSpeech Serving.
#################################################################################
# SERVER SETTING #
#################################################################################
host: 0.0.0.0
port: 8090
# The task format in the engin_list is: <speech task>_<engine type>
# task choices = ['asr_online']
# protocol = ['websocket'] (only one can be selected).
# websocket only support online engine type.
protocol: 'websocket'
engine_list: ['asr_online']
#################################################################################
# ENGINE CONFIG #
#################################################################################
################################### ASR #########################################
################### speech task: asr; engine_type: online #######################
asr_online:
model_type: 'deepspeech2online_aishell'
am_model: # the pdmodel file of am static model [optional]
am_params: # the pdiparams file of am static model [optional]
lang: 'zh'
sample_rate: 16000
cfg_path:
decode_method:
force_yes: True
am_predictor_conf:
device: # set 'gpu:id' or 'cpu'
switch_ir_optim: True
glog_info: False # True -> print glog
summary: True # False -> do not show predictor config
chunk_buffer_conf:
frame_duration_ms: 80
shift_ms: 40
sample_rate: 16000
sample_width: 2
window_n: 7 # frame
shift_n: 4 # frame
window_ms: 20 # ms
shift_ms: 10 # ms

@ -0,0 +1,45 @@
# This is the parameter configuration file for PaddleSpeech Serving.
#################################################################################
# SERVER SETTING #
#################################################################################
host: 0.0.0.0
port: 8090
# The task format in the engin_list is: <speech task>_<engine type>
# task choices = ['asr_online']
# protocol = ['websocket'] (only one can be selected).
# websocket only support online engine type.
protocol: 'websocket'
engine_list: ['asr_online']
#################################################################################
# ENGINE CONFIG #
#################################################################################
################################### ASR #########################################
################### speech task: asr; engine_type: online #######################
asr_online:
model_type: 'conformer_online_multicn'
am_model: # the pdmodel file of am static model [optional]
am_params: # the pdiparams file of am static model [optional]
lang: 'zh'
sample_rate: 16000
cfg_path:
decode_method:
force_yes: True
device: # cpu or gpu:id
am_predictor_conf:
device: # set 'gpu:id' or 'cpu'
switch_ir_optim: True
glog_info: False # True -> print glog
summary: True # False -> do not show predictor config
chunk_buffer_conf:
window_n: 7 # frame
shift_n: 4 # frame
window_ms: 25 # ms
shift_ms: 10 # ms
sample_rate: 16000
sample_width: 2

@ -0,0 +1,2 @@
# start the streaming asr service
paddlespeech_server start --config_file ./conf/ws_conformer_application.yaml

@ -0,0 +1,5 @@
# download the test wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
# read the wav and pass it to service
python3 websocket_client.py --wavfile ./zh.wav

Some files were not shown because too many files have changed in this diff Show More

Loading…
Cancel
Save