update readme and add aistudio demo, test=doc (#2270)

pull/2284/head
Ming 2 years ago committed by GitHub
parent 112a0a40e1
commit c3865f2ab7
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -180,62 +180,191 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
## Installation
We strongly recommend our users to install PaddleSpeech in **Linux** with *python>=3.7* and *paddlepaddle>=2.3.1*.
Up to now, **Linux** supports CLI for the all our tasks, **Mac OSX** and **Windows** only supports PaddleSpeech CLI for Audio Classification, Speech-to-Text and Text-to-Speech. To install `PaddleSpeech`, please see [installation](./docs/source/install.md).
### **Dependency Introduction**
+ gcc >= 4.8.5
+ paddlepaddle >= 2.3.1
+ python >= 3.7
+ OS support: Linux(recommend), Windows, Mac OSX
PaddleSpeech depends on paddlepaddle. For installation, please refer to the official website of [paddlepaddle](https://www.paddlepaddle.org.cn/en) and choose according to your own machine. Here is an example of the cpu version.
```bash
pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
```
There are two quick installation methods for PaddleSpeech, one is pip installation, and the other is source code compilation (recommended).
### pip install
```shell
pip install pytest-runner
pip install paddlespeech
```
### source code compilation
```shell
git clone https://github.com/PaddlePaddle/PaddleSpeech.git
cd PaddleSpeech
pip install pytest-runner
pip install .
```
For more installation problems, such as conda environment, librosa-dependent, gcc problems, kaldi installation, etc., you can refer to this [installation document](./docs/source/install.md). If you encounter problems during installation, you can leave a message on [#2150](https://github.com/PaddlePaddle/PaddleSpeech/issues/2150) and find related problems
<a name="quickstart"></a>
## Quick Start
Developers can have a try of our models with [PaddleSpeech Command Line](./paddlespeech/cli/README.md). Change `--input` to test your own audio/text.
Developers can have a try of our models with [PaddleSpeech Command Line](./paddlespeech/cli/README.md) or Python. Change `--input` to test your own audio/text and support 16k wav format audio.
**You can also quickly experience it in AI Studio 👉🏻 [PaddleSpeech API Demo](https://aistudio.baidu.com/aistudio/projectdetail/4353348?sUid=2470186&shared=1&ts=1660876445786)**
Test audio sample download
**Audio Classification**
```shell
paddlespeech cls --input input.wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
```
**Speaker Verification**
### Automatic Speech Recognition
<details><summary>&emsp;Click to expandOpen Source Speech Recognition</summary>
**command line experience**
```shell
paddlespeech asr --lang zh --input zh.wav
```
paddlespeech vector --task spk --input input_16k.wav
**Python API experience**
```python
>>> from paddlespeech.cli.asr.infer import ASRExecutor
>>> asr = ASRExecutor()
>>> result = asr(audio_file="zh.wav")
>>> print(result)
我认为跑步最重要的就是给我带来了身体健康
```
</details>
### Text-to-Speech
<details><summary>&emsp;Open Source Speech Synthesis</summary>
Output 24k sample rate wav format audio
**command line experience**
**Automatic Speech Recognition**
```shell
paddlespeech asr --lang zh --input input_16k.wav
paddlespeech tts --input "你好,欢迎使用百度飞桨深度学习框架!" --output output.wav
```
- web demo for Automatic Speech Recognition is integrated to [Huggingface Spaces](https://huggingface.co/spaces) with [Gradio](https://github.com/gradio-app/gradio). See Demo: [ASR Demo](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR)
**Speech Translation** (English to Chinese)
(not support for Mac and Windows now)
**Python API experience**
```python
>>> from paddlespeech.cli.tts.infer import TTSExecutor
>>> tts = TTSExecutor()
>>> tts(text="今天天气十分不错。", output="output.wav")
```
- You can experience in [Huggingface Spaces](https://huggingface.co/spaces) [TTS Demo](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS)
</details>
### Audio Classification
<details><summary>&emsp;An open-domain sound classification tool</summary>
Sound classification model based on 527 categories of AudioSet dataset
**command line experience**
```shell
paddlespeech st --input input_16k.wav
paddlespeech cls --input zh.wav
```
**Text-to-Speech**
**Python API experience**
```python
>>> from paddlespeech.cli.cls.infer import CLSExecutor
>>> cls = CLSExecutor()
>>> result = cls(audio_file="zh.wav")
>>> print(result)
Speech 0.9027186632156372
```
</details>
### Voiceprint Extraction
<details><summary>&emsp;Industrial-grade voiceprint extraction tool</summary>
**command line experience**
```shell
paddlespeech tts --input "你好,欢迎使用飞桨深度学习框架!" --output output.wav
paddlespeech vector --task spk --input zh.wav
```
- web demo for Text to Speech is integrated to [Huggingface Spaces](https://huggingface.co/spaces) with [Gradio](https://github.com/gradio-app/gradio). See Demo: [TTS Demo](https://huggingface.co/spaces/KPatrick/PaddleSpeechTTS)
**Text Postprocessing**
- Punctuation Restoration
```bash
paddlespeech text --task punc --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭
```
**Python API experience**
**Batch Process**
```python
>>> from paddlespeech.cli.vector import VectorExecutor
>>> vec = VectorExecutor()
>>> result = vec(audio_file="zh.wav")
>>> print(result) # 187维向量
[ -0.19083306 9.474295 -14.122263 -2.0916545 0.04848729
4.9295826 1.4780062 0.3733844 10.695862 3.2697146
-4.48199 -0.6617882 -9.170393 -11.1568775 -1.2358263 ...]
```
echo -e "1 欢迎光临。\n2 谢谢惠顾。" | paddlespeech tts
</details>
### Punctuation Restoration
<details><summary>&emsp;Quick recovery of text punctuation, works with ASR models</summary>
**command line experience**
```shell
paddlespeech text --task punc --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭
```
**Shell Pipeline**
- ASR + Punctuation Restoration
**Python API experience**
```python
>>> from paddlespeech.cli.text.infer import TextExecutor
>>> text_punc = TextExecutor()
>>> result = text_punc(text="今天的天气真不错啊你下午有空吗我想约你一起去吃饭")
今天的天气真不错啊!你下午有空吗?我想约你一起去吃饭。
```
paddlespeech asr --input ./zh.wav | paddlespeech text --task punc
</details>
### Speech Translation
<details><summary>&emsp;End-to-end English to Chinese Speech Translation Tool</summary>
Use pre-compiled kaldi related tools, only support experience in Ubuntu system
**command line experience**
```shell
paddlespeech st --input en.wav
```
For more command lines, please see: [demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos)
**Python API experience**
If you want to try more functions like training and tuning, please have a look at [Speech-to-Text Quick Start](./docs/source/asr/quick_start.md) and [Text-to-Speech Quick Start](./docs/source/tts/quick_start.md).
```python
>>> from paddlespeech.cli.st.infer import STExecutor
>>> st = STExecutor()
>>> result = st(audio_file="en.wav")
['我 在 这栋 建筑 的 古老 门上 敲门 。']
```
</details>
<a name="quickstartserver"></a>
@ -243,6 +372,8 @@ If you want to try more functions like training and tuning, please have a look a
Developers can have a try of our speech server with [PaddleSpeech Server Command Line](./paddlespeech/server/README.md).
**You can try it quickly in AI Studio (recommend): [SpeechServer](https://aistudio.baidu.com/aistudio/projectdetail/4354592?sUid=2470186&shared=1&ts=1660877827034)**
**Start server**
```shell

@ -225,7 +225,7 @@ pip install .
安装完成后开发者可以通过命令行或者Python快速开始命令行模式下改变 `--input` 可以尝试用自己的音频或文本测试支持16k wav格式音频。
你也可以在`aistudio`中快速体验 👉🏻[PaddleSpeech API Demo ](https://aistudio.baidu.com/aistudio/projectdetail/4281335?shared=1)。
你也可以在`aistudio`中快速体验 👉🏻[一键预测快速上手Speech开发任务](https://aistudio.baidu.com/aistudio/projectdetail/4353348?sUid=2470186&shared=1&ts=1660878142250)。
测试音频示例下载
```shell
@ -373,7 +373,9 @@ python API 一键预测
<a name="快速使用服务"></a>
## 快速使用服务
安装完成后,开发者可以通过命令行一键启动语音识别,语音合成,音频分类三种服务。
安装完成后,开发者可以通过命令行一键启动语音识别,语音合成,音频分类等多种服务。
你可以在 AI Studio 中快速体验:[SpeechServer一键部署](https://aistudio.baidu.com/aistudio/projectdetail/4354592?sUid=2470186&shared=1&ts=1660878208266)
**启动服务**
```shell

Loading…
Cancel
Save