diff --git a/README.md b/README.md
index 809ffe6df..fe797dd2a 100644
--- a/README.md
+++ b/README.md
@@ -1,31 +1,81 @@
-# PaddlePaddle Speech toolkit
+# PaddleSpeech
 ![License](https://img.shields.io/badge/license-Apache%202-red.svg)
 ![python version](https://img.shields.io/badge/python-3.7+-orange.svg)
 ![support os](https://img.shields.io/badge/os-linux-yellow.svg)
-*DeepSpeech* is an open-source implementation of end-to-end Automatic Speech Recognition engine, with [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform. Our vision is to empower both industrial application and academic research on speech recognition, via an easy-to-use, efficient, samller and scalable implementation, including training, inference & testing module, and deployment.
-
+
+
+
+
+**PaddleSpeech** is an open-source toolkit on the [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform for two critical tasks in speech: Automatic Speech Recognition (ASR) and Text-To-Speech Synthesis (TTS), with modules covering state-of-the-art and influential models.
+
+## Table of Contents
+- [Table of Contents](#table-of-contents)
+- [Features](#features)
+- [Installation](#installation)
+- [Getting Started](#getting-started)
+- [Guidelines of Pipeline](#guidelines-of-pipeline)
+- [FAQ and Contributing](#faq-and-contributing)
+- [Acknowledgement](#acknowledgement)
+- [License](#license)
 ## Features
- See [feature list](docs/source/asr/feature_list.md) for more information.
+Via an easy-to-use, efficient, flexible and scalable implementation, our vision is to empower both industrial application and academic research, including training, inference & testing modules, and deployment.
+
+
+
+The features of **ASR** are summarized as follows:
+- **Used datasets**
+  - Aishell, THCHS30, TIMIT and Librispeech
+- **Model support of streaming and non-streaming data**
+  - Non-streaming: [Baidu's DeepSpeech2](http://proceedings.mlr.press/v48/amodei16.pdf), [Transformer](https://arxiv.org/abs/1706.03762) and [Conformer](https://arxiv.org/abs/2005.08100)
+  - Streaming: [Baidu's DeepSpeech2](http://proceedings.mlr.press/v48/amodei16.pdf) and [U2](https://arxiv.org/pdf/2012.05481.pdf)
+- **Language Model**: Ngram
+- **Decoder**: ctc greedy, ctc prefix beam search, greedy, beam search, attention rescore
+- **Alignment**: MFA, CTC Alignment
+- **Speech Frontend**
+  - Audio: Auto Gain
+  - Feature: kaldi fbank, kaldi mfcc, linear, delta delta
+- **Speech Augmentation**
+  - Audio: Auto Gain
+  - Feature: Volume Perturbation, Speed Perturbation, Shifting Perturbation, Online Bayesian normalization, Noise Perturbation, Impulse Response, Spectrum, SpecAugment, Adaptive SpecAugment
+- **Tokenizer**: Chinese/English Character, English Word, Sentence Piece
+
+- **Word Segmentation**: [mmseg](http://technology.chtsai.org/mmseg/)
-## Setup
+The features of **TTS** are summarized as follows:
+
+- **Blabla**
+  - Blabla ...
+
+## Installation
 All tested under:
 * Ubuntu 16.04
 * python>=3.7
 * paddlepaddle==2.1.2
-Please see [install](docs/source/asr/install.md).
+Please see the [installation](docs/source/asr/install.md) doc for all the alternatives.
 ## Getting Started
 Please see [Getting Started](docs/source/asr/getting_started.md) and [tiny egs](examples/tiny/s0/README.md).
-## More Information
+## Guidelines of Pipeline
 * [Data Prepration](docs/source/asr/data_preparation.md)
 * [Data Augmentation](docs/source/asr/augmentation.md)
@@ -34,10 +84,11 @@ Please see [Getting Started](docs/source/asr/getting_started.md) and [tiny egs](
 * [Relased Model](docs/source/asr/released_model.md)
-## Questions and Help
+## FAQ and Contributing
-You are welcome to submit questions in [Github Discussions](https://github.com/PaddlePaddle/DeepSpeech/discussions) and bug reports in [Github Issues](https://github.com/PaddlePaddle/DeepSpeech/issues). You are also welcome to contribute to this project.
+You are warmly welcome to submit questions in [Discussions](https://github.com/PaddlePaddle/DeepSpeech/discussions) and bug reports in [Issues](https://github.com/PaddlePaddle/DeepSpeech/issues)!
+Also, we highly appreciate it if you would like to contribute to this project!
 ## License
@@ -45,4 +96,6 @@ DeepSpeech is provided under the [Apache-2.0 License](./LICENSE).
 ## Acknowledgement
-We depends on many open source repos. See [References](docs/source/asr/reference.md) for more information.
+DeepSpeech depends on many open source repos. See [References](docs/source/asr/reference.md) for more information.
+
+**Updates on 2021/10/20**: This [README.md](README.md) outline is not complete, especially the TTS module in the **Features** section.