diff --git a/README.md b/README.md index 379550ce..506a7d3f 100644 --- a/README.md +++ b/README.md @@ -1,19 +1,10 @@ ([简体中文](./README_cn.md)|English) + +

-
- -

- Quick Start - | Quick Start Server - | Documents - | Models List -

- ------------------------------------------------------------------------------------- -

@@ -28,6 +19,18 @@

+
+

+ Quick Start + | Quick Start Server + | Quick Start Streaming Server +
+ Documents + | Models List +

+
+ + **PaddleSpeech** is an open-source toolkit on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform for a variety of critical tasks in speech and audio, with the state-of-art and influential models. @@ -142,26 +145,6 @@ For more synthesized audios, please refer to [PaddleSpeech Text-to-Speech sample -### ⭐ Examples -- **[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo): Use PaddleSpeech TTS to generate virtual human voice.** - -
- -- [PaddleSpeech Demo Video](https://paddlespeech.readthedocs.io/en/latest/demo_video.html) - -- **[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk): Use PaddleSpeech TTS and ASR to clone voice from videos.** - -
- -
- -### 🔥 Hot Activities - -- 2021.12.21~12.24 - - 4 Days Live Courses: Depth interpretation of PaddleSpeech! - - **Courses videos and related materials: https://aistudio.baidu.com/aistudio/education/group/info/25130** ### Features @@ -174,11 +157,22 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision - 🔬 *Integration of mainstream models and datasets*: the toolkit implements modules that participate in the whole pipeline of the speech tasks, and uses mainstream datasets like LibriSpeech, LJSpeech, AIShell, CSMSC, etc. See also [model list](#model-list) for more details. - 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV). -### Recent Update +### 🔥 Hot Activities + +- 2021.12.21~12.24 + + 4 Days Live Courses: Depth interpretation of PaddleSpeech! + + **Courses videos and related materials: https://aistudio.baidu.com/aistudio/education/group/info/25130** + + +### Recent Update + +- 👏🏻 2022.04.28: PaddleSpeech Streaming Server is available for Automatic Speech Recognition and Text-to-Speech. - 👏🏻 2022.03.28: PaddleSpeech Server is available for Audio Classification, Automatic Speech Recognition and Text-to-Speech. - 👏🏻 2022.03.28: PaddleSpeech CLI is available for Speaker Verification. - 🤗 2021.12.14: Our PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available! @@ -196,6 +190,7 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision We strongly recommend our users to install PaddleSpeech in **Linux** with *python>=3.7*. Up to now, **Linux** supports CLI for the all our tasks, **Mac OSX** and **Windows** only supports PaddleSpeech CLI for Audio Classification, Speech-to-Text and Text-to-Speech. To install `PaddleSpeech`, please see [installation](./docs/source/install.md). + ## Quick Start @@ -238,7 +233,7 @@ paddlespeech tts --input "你好,欢迎使用飞桨深度学习框架!" --ou **Batch Process** ``` echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts -``` +``` **Shell Pipeline** - ASR + Punctuation Restoration @@ -257,16 +252,19 @@ If you want to try more functions like training and tuning, please have a look a Developers can have a try of our speech server with [PaddleSpeech Server Command Line](./paddlespeech/server/README.md). **Start server** + ```shell paddlespeech_server start --config_file ./paddlespeech/server/conf/application.yaml ``` **Access Speech Recognition Services** + ```shell paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input input_16k.wav ``` **Access Text to Speech Services** + ```shell paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav ``` @@ -280,6 +278,37 @@ paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input input.wav For more information about server command lines, please see: [speech server demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_server) + +## Quick Start Streaming Server + +Developers can have a try of [streaming asr](./demos/streaming_asr_server/README.md) and [streaming tts](./demos/streaming_tts_server/README.md) server. + +**Start Streaming Speech Recognition Server** + +``` +paddlespeech_server start --config_file ./demos/streaming_asr_server/conf/application.yaml +``` + +**Access Streaming Speech Recognition Services** + +``` +paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input input_16k.wav +``` + +**Start Streaming Text to Speech Server** + +``` +paddlespeech_server start --config_file ./demos/streaming_tts_server/conf/tts_online_application.yaml +``` + +**Access Streaming Text to Speech Services** + +``` +paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav +``` + +For more information please see: [streaming asr](./demos/streaming_asr_server/README.md) and [streaming tts](./demos/streaming_tts_server/README.md) + ## Model List @@ -589,6 +618,21 @@ Normally, [Speech SoTA](https://paperswithcode.com/area/speech), [Audio SoTA](ht The Text-to-Speech module is originally called [Parakeet](https://github.com/PaddlePaddle/Parakeet), and now merged with this repository. If you are interested in academic research about this task, please see [TTS research overview](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/docs/source/tts#overview). Also, [this document](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/tts/models_introduction.md) is a good guideline for the pipeline components. + +## ⭐ Examples +- **[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo): Use PaddleSpeech TTS to generate virtual human voice.** + +
+ +- [PaddleSpeech Demo Video](https://paddlespeech.readthedocs.io/en/latest/demo_video.html) + +- **[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk): Use PaddleSpeech TTS and ASR to clone voice from videos.** + +
+ +
+ + ## Citation To cite PaddleSpeech for research, please use the following format. @@ -655,7 +699,6 @@ You are warmly welcome to submit questions in [discussions](https://github.com/P ## Acknowledgement - - Many thanks to [yeyupiaoling](https://github.com/yeyupiaoling)/[PPASR](https://github.com/yeyupiaoling/PPASR)/[PaddlePaddle-DeepSpeech](https://github.com/yeyupiaoling/PaddlePaddle-DeepSpeech)/[VoiceprintRecognition-PaddlePaddle](https://github.com/yeyupiaoling/VoiceprintRecognition-PaddlePaddle)/[AudioClassification-PaddlePaddle](https://github.com/yeyupiaoling/AudioClassification-PaddlePaddle) for years of attention, constructive advice and great help. - Many thanks to [mymagicpower](https://github.com/mymagicpower) for the Java implementation of ASR upon [short](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_sdk) and [long](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_long_audio_sdk) audio files. - Many thanks to [JiehangXie](https://github.com/JiehangXie)/[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo) for developing Virtual Uploader(VUP)/Virtual YouTuber(VTuber) with PaddleSpeech TTS function. diff --git a/README_cn.md b/README_cn.md index 228d5d78..497863db 100644 --- a/README_cn.md +++ b/README_cn.md @@ -2,26 +2,45 @@

-
-

- 快速开始 - | 快速使用服务 - | 教程文档 - | 模型列表 -

-------------------------------------------------------------------------------------

- + + + +

+
+

+ Quick Start + | Quick Start Server + | Quick Start Streaming Server +
+ Documents + | Models List +

+
+ + +------------------------------------------------------------------------------------ + +
+

+ 快速开始 + | 快速使用服务 + | 快速使用流式服务 + | 教程文档 + | 模型列表 +

+ + + + **PaddleSpeech** 是基于飞桨 [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) 的语音方向的开源模型库,用于语音和音频中的各种关键任务的开发,包含大量基于深度学习前沿和有影响力的模型,一些典型的应用示例如下: ##### 语音识别 @@ -57,7 +78,6 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme 我认为跑步最重要的就是给我带来了身体健康。 - @@ -143,19 +163,6 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme -### ⭐ 应用案例 -- **[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo): 使用 PaddleSpeech 的语音合成模块生成虚拟人的声音。** - -
- -- [PaddleSpeech 示例视频](https://paddlespeech.readthedocs.io/en/latest/demo_video.html) - - -- **[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk): 使用 PaddleSpeech 的语音合成和语音识别从视频中克隆人声。** - -
- -
### 🔥 热门活动 @@ -164,27 +171,32 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme 4 日直播课: 深度解读 PaddleSpeech 语音技术! **直播回放与课件资料: https://aistudio.baidu.com/aistudio/education/group/info/25130** -### 特性 -本项目采用了易用、高效、灵活以及可扩展的实现,旨在为工业应用、学术研究提供更好的支持,实现的功能包含训练、推断以及测试模块,以及部署过程,主要包括 -- 📦 **易用性**: 安装门槛低,可使用 [CLI](#quick-start) 快速开始。 -- 🏆 **对标 SoTA**: 提供了高速、轻量级模型,且借鉴了最前沿的技术。 -- 💯 **基于规则的中文前端**: 我们的前端包含文本正则化和字音转换(G2P)。此外,我们使用自定义语言规则来适应中文语境。 -- **多种工业界以及学术界主流功能支持**: - - 🛎️ 典型音频任务: 本工具包提供了音频任务如音频分类、语音翻译、自动语音识别、文本转语音、语音合成等任务的实现。 - - 🔬 主流模型及数据集: 本工具包实现了参与整条语音任务流水线的各个模块,并且采用了主流数据集如 LibriSpeech、LJSpeech、AIShell、CSMSC,详情请见 [模型列表](#model-list)。 - - 🧩 级联模型应用: 作为传统语音任务的扩展,我们结合了自然语言处理、计算机视觉等任务,实现更接近实际需求的产业级应用。 ### 近期更新 +- 👏🏻 2022.04.28: PaddleSpeech Streaming Server 上线! 覆盖了语音识别和语音合成。 - 👏🏻 2022.03.28: PaddleSpeech Server 上线! 覆盖了声音分类、语音识别、以及语音合成。 - 👏🏻 2022.03.28: PaddleSpeech CLI 上线声纹验证。 - 🤗 2021.12.14: Our PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS) Demos on Hugging Face Spaces are available! - 👏🏻 2021.12.10: PaddleSpeech CLI 上线!覆盖了声音分类、语音识别、语音翻译(英译中)以及语音合成。 + +### 特性 + +本项目采用了易用、高效、灵活以及可扩展的实现,旨在为工业应用、学术研究提供更好的支持,实现的功能包含训练、推断以及测试模块,以及部署过程,主要包括 +- 📦 **易用性**: 安装门槛低,可使用 [CLI](#quick-start) 快速开始。 +- 🏆 **对标 SoTA**: 提供了高速、轻量级模型,且借鉴了最前沿的技术。 +- 💯 **基于规则的中文前端**: 我们的前端包含文本正则化和字音转换(G2P)。此外,我们使用自定义语言规则来适应中文语境。 +- **多种工业界以及学术界主流功能支持**: + - 🛎️ 典型音频任务: 本工具包提供了音频任务如音频分类、语音翻译、自动语音识别、文本转语音、语音合成等任务的实现。 + - 🔬 主流模型及数据集: 本工具包实现了参与整条语音任务流水线的各个模块,并且采用了主流数据集如 LibriSpeech、LJSpeech、AIShell、CSMSC,详情请见 [模型列表](#model-list)。 + - 🧩 级联模型应用: 作为传统语音任务的扩展,我们结合了自然语言处理、计算机视觉等任务,实现更接近实际需求的产业级应用。 + + ### 技术交流群 微信扫描二维码(好友申请通过后回复【语音】)加入官方交流群,获得更高效的问题答疑,与各行各业开发者充分交流,期待您的加入。 @@ -192,11 +204,13 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme + ## 安装 我们强烈建议用户在 **Linux** 环境下,*3.7* 以上版本的 *python* 上安装 PaddleSpeech。 目前为止,**Linux** 支持声音分类、语音识别、语音合成和语音翻译四种功能,**Mac OSX、 Windows** 下暂不支持语音翻译功能。 想了解具体安装细节,可以参考[安装文档](./docs/source/install_cn.md)。 + ## 快速开始 安装完成后,开发者可以通过命令行快速开始,改变 `--input` 可以尝试用自己的音频或文本测试。 @@ -232,7 +246,7 @@ paddlespeech tts --input "你好,欢迎使用百度飞桨深度学习框架! **批处理** ``` echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts -``` +``` **Shell管道** ASR + Punc: @@ -269,6 +283,38 @@ paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input input.wav 更多服务相关的命令行使用信息,请参考 [demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/speech_server) + +## 快速使用流式服务 + +开发者可以尝试[流式ASR](./demos/streaming_asr_server/README.md)和 [流式TTS](./demos/streaming_tts_server/README.md)服务. + +**启动流式ASR服务** + +``` +paddlespeech_server start --config_file ./demos/streaming_asr_server/conf/application.yaml +``` + +**访问流式ASR服务** + +``` +paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input input_16k.wav +``` + +**启动流式TTS服务** + +``` +paddlespeech_server start --config_file ./demos/streaming_tts_server/conf/tts_online_application.yaml +``` + +**访问流式TTS服务** + +``` +paddlespeech_client tts_online --server_ip 127.0.0.1 --port 8092 --protocol http --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav +``` + +更多信息参看: [流式 ASR](./demos/streaming_asr_server/README.md) 和 [流式 TTS](./demos/streaming_tts_server/README.md) + + ## 模型列表 PaddleSpeech 支持很多主流的模型,并提供了预训练模型,详情请见[模型列表](./docs/source/released_model.md)。 @@ -582,6 +628,21 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声 语音合成模块最初被称为 [Parakeet](https://github.com/PaddlePaddle/Parakeet),现在与此仓库合并。如果您对该任务的学术研究感兴趣,请参阅 [TTS 研究概述](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/docs/source/tts#overview)。此外,[模型介绍](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/tts/models_introduction.md) 是了解语音合成流程的一个很好的指南。 +## ⭐ 应用案例 +- **[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo): 使用 PaddleSpeech 的语音合成模块生成虚拟人的声音。** + +
+ +- [PaddleSpeech 示例视频](https://paddlespeech.readthedocs.io/en/latest/demo_video.html) + + +- **[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk): 使用 PaddleSpeech 的语音合成和语音识别从视频中克隆人声。** + +
+ +
+ + ## 引用 要引用 PaddleSpeech 进行研究,请使用以下格式进行引用。 @@ -658,6 +719,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声 - 非常感谢 [jerryuhoo](https://github.com/jerryuhoo)/[VTuberTalk](https://github.com/jerryuhoo/VTuberTalk) 基于 PaddleSpeech 的 TTS GUI 界面和基于 ASR 制作数据集的相关代码。 + 此外,PaddleSpeech 依赖于许多开源存储库。有关更多信息,请参阅 [references](./docs/source/reference.md)。 ## License