English | [简体中文](README_ch.md)
![python version](https://img.shields.io/badge/python-3.7+-orange.svg)
![support os](https://img.shields.io/badge/os-linux-yellow.svg)
**PaddleSpeech** is an open-source toolkit on the [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform for two critical tasks in speech - **Automatic Speech Recognition (ASR)** and **Text-To-Speech Synthesis (TTS)** - with modules involving state-of-the-art and influential models.
Via an easy-to-use, efficient, flexible, and scalable implementation, our vision is to empower both industrial application and academic research, including training, inference & testing modules, and deployment. Besides, this toolkit also features:
- **Fast and Light-weight**: we provide a high-speed and ultra-lightweight model that is convenient for industrial deployment.
- **Rule-based Chinese frontend**: we utilize plenty of Chinese datasets and corpora to enhance user experience, including CSMSC and Baidu Internal Corpus.
- **Varieties of Functions that Vitalize Research**:
- *Integration of mainstream models and datasets*: the toolkit implements modules that participate in the whole pipeline of both ASR and TTS, and uses datasets like LibriSpeech, LJSpeech, AIShell, etc. See also [model lists](#models-list) for more details.
- *Support of ASR streaming and non-streaming data*: This toolkit contains non-streaming/streaming models like [DeepSpeech2](http://proceedings.mlr.press/v48/amodei16.pdf), [Transformer](https://arxiv.org/abs/1706.03762), [Conformer](https://arxiv.org/abs/2005.08100) and [U2](https://arxiv.org/pdf/2012.05481.pdf).
Let's install PaddleSpeech with only a few lines of code!
```shell
# 1. Install prerequisite system libraries and paddlepaddle first.
sudo apt-get install -y sox pkg-config libflac-dev libogg-dev libvorbis-dev libboost-dev swig python3-dev libsndfile1
# Use `pip install paddlepaddle-gpu` instead if you are using a GPU.
pip install paddlepaddle
# 2. Then install PaddleSpeech.
git clone https://github.com/PaddlePaddle/DeepSpeech.git
cd DeepSpeech
pip install -e .
```
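After the steps above, a quick sanity check can save debugging time. The following is a minimal sketch: the first command only verifies the `python>=3.7` requirement stated below, and the commented-out line shows PaddlePaddle's built-in self-test (`paddle.utils.run_check()`), which is available once `paddlepaddle` is installed.

```shell
# Confirm the interpreter satisfies the python>=3.7 requirement.
python3 -c 'import sys; assert sys.version_info >= (3, 7), "Python 3.7+ is required"'
echo "python version OK"

# After `pip install paddlepaddle`, PaddlePaddle ships a built-in self-test:
# python3 -c 'import paddle; paddle.utils.run_check()'
```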
## Table of Contents
The contents of this README are as follows:
- [Table of Contents](#table-of-contents)
- [Alternative Installation](#alternative-installation)
- [Quick Start](#quick-start)
- [Models List](#models-list)
- [Tutorials](#tutorials)
- [License](#license)
- [Acknowledgement](#acknowledgement)
## Alternative Installation
The base environment for this page is:
- Ubuntu 16.04
- python>=3.7
- paddlepaddle==2.1.2
If you want to set up PaddleSpeech in another environment, please see the [ASR installation](docs/source/asr/install.md) and [TTS installation](docs/source/tts/install.md) documents for all the alternatives.
## Quick Start
Try training a tiny DeepSpeech2 ASR model on a toy subset of LibriSpeech:
```shell
cd examples/tiny/s0/
# prepare, train, infer, evaluate and export model
bash local/data.sh
bash local/train.sh
bash local/infer.sh
bash local/test.sh
bash local/export.sh ckpt_path saved_jit_model_path
```
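The five scripts above always run in the same order, and recipe directories under `examples/` generally follow the same `local/*.sh` layout. The following is a tiny helper sketch that prints the stage commands in order; it echoes rather than executes them, so it is safe to run outside a recipe directory.

```shell
# Print the recipe stages in their expected order without executing them.
for stage in data train infer test export; do
    echo "bash local/${stage}.sh"
done
```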
For more examples, please see [ASR getting started](docs/source/asr/getting_started.md) and [TTS Basic Use](/docs/source/tts/basic_usage.md).
## Models List
PaddleSpeech ASR supports many mainstream models, which are summarized as follows. For more information, please refer to [ASR Models](./docs/source/asr/released_model.md).
<!---
The current hyperlinks redirect to [Previous Parakeet](https://github.com/PaddlePaddle/Parakeet/tree/develop/examples).
PaddleSpeech TTS mainly contains three modules: *Text Frontend*, *Acoustic Model* and *Vocoder*.
<td >Tacotron2</td>
<td rowspan="2" >LJSpeech</td>
<td>
<a href = "https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/ljspeech/tts0">tacotron2-ljspeech</a>
</td>
</tr>
<tr>
<td>TransformerTTS</td>
<td>
<a href = "https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/ljspeech/tts1">transformer-ljspeech</a>
</td>
</tr>
<tr>
<td>SpeedySpeech</td>
<td>CSMSC</td>
<td >
<a href = "https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/csmsc/tts2">speedyspeech-csmsc</a>
</td>
</tr>
<tr>
<td rowspan="3">FastSpeech2</td>
<td>AISHELL-3</td>
<td>
<a href = "https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/aishell3/tts3">fastspeech2-aishell3</a>
</td>
</tr>
<tr>
<td>VCTK</td>
<td> <a href = "https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/vctk/tts3">fastspeech2-vctk</a> </td>
</tr>
<tr>
<td>CSMSC</td>
<td>
<a href = "https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/csmsc/tts3">fastspeech2-csmsc</a>
</td>
</tr>
<tr>
<td >WaveFlow</td>
<td >LJSpeech</td>
<td>
<a href = "https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/ljspeech/voc0">waveflow-ljspeech</a>
</td>
</tr>
<tr>
<td rowspan="2">Parallel WaveGAN</td>
<td >LJSpeech</td>
<td>
<a href = "https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/ljspeech/voc1">pwGAN-ljspeech</a>
</td>
</tr>
<tr>
<td >CSMSC</td>
<td>
<a href = "https://github.com/PaddlePaddle/DeepSpeech/tree/develop/examples/csmsc/voc1">pwGAN-csmsc</a>
</td>
</tr>
</tbody>
## Tutorials
Normally, [Speech SoTA](https://paperswithcode.com/area/speech) gives you an overview of the hot academic topics in speech. If you want to focus on the two tasks in PaddleSpeech, you will find the following guidelines helpful for grasping the core ideas.
The original ASR module is based on [Baidu's DeepSpeech](https://arxiv.org/abs/1412.5567), which was an independent product named [DeepSpeech](https://deepspeech.readthedocs.io). However, the toolkit now covers almost all the SoTA modules in the pipeline. Specifically, these modules are:
* [Data Preparation](docs/source/asr/data_preparation.md)
* [Data Augmentation](docs/source/asr/augmentation.md)
* [Benchmark](docs/source/asr/benchmark.md)
* [Released Models](docs/source/asr/released_model.md)
The TTS module was originally called [Parakeet](./docs/source/tts/README.md) and is now merged into DeepSpeech. If you are interested in the academic research behind it, please see the [TTS research overview](https://github.com/PaddlePaddle/DeepSpeech/tree/develop/docs/source/tts#overview). Also, [this document](https://paddleparakeet.readthedocs.io/en/latest/released_models.html) is a good guideline for the pipeline components.
## FAQ and Contributing
You are warmly welcome to submit questions in [discussions](https://github.com/PaddlePaddle/DeepSpeech/discussions) and bug reports in [issues](https://github.com/PaddlePaddle/DeepSpeech/issues)! Also, we would highly appreciate it if you contributed to this project!
## License
DeepSpeech is provided under the [Apache-2.0 License](./LICENSE).
## Acknowledgement
PaddleSpeech depends on a lot of open source repos. See [references](docs/source/asr/reference.md) for more information.
