diff --git a/README.md b/README.md
index 5c064c18c..b91c74899 100644
--- a/README.md
+++ b/README.md
@@ -29,12 +29,20 @@ how they can install it,
 how they can use it
 -->
-**PaddleSpeech** is an open-source toolkit on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform for two critical tasks in Speech - Automatic Speech Recognition (ASR) and Text-To-Speech Synthesis (TTS), with modules involving state-of-art and influential models.
+**PaddleSpeech** is an open-source toolkit on the [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform for two critical tasks in Speech - Automatic Speech Recognition (ASR) and Text-To-Speech Synthesis (TTS) - with modules involving state-of-the-art and influential models.
+
+With an easy-to-use, efficient, flexible and scalable implementation, our vision is to empower both industrial applications and academic research, spanning training, inference, testing and deployment. The toolkit also features:
+- **Rule-based Chinese frontend**: we utilize plenty of Chinese datasets and corpora, including CSMSC and the Baidu Internal Corpus, to enhance the user experience.
+- **Support for both streaming and non-streaming ASR**: the toolkit contains non-streaming models such as [Baidu's DeepSpeech2](http://proceedings.mlr.press/v48/amodei16.pdf), [Transformer](https://arxiv.org/abs/1706.03762) and [Conformer](https://arxiv.org/abs/2005.08100), as well as streaming models such as [Baidu's DeepSpeech2](http://proceedings.mlr.press/v48/amodei16.pdf) and [U2](https://arxiv.org/pdf/2012.05481.pdf).
+- **Varieties of mainstream models**: the toolkit integrates modules covering the whole pipeline of both ASR and TTS. [See also the models list](#models-list).
+
+> Note: It is better to add a brief getting-started section.
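To picture what a rule-based Chinese frontend does, here is a minimal grapheme-to-phoneme sketch. The `TOY_LEXICON` table and the `g2p` helper are illustrative assumptions for this note only, not PaddleSpeech's actual frontend API:

```python
# Minimal, illustrative sketch of a rule-based Chinese text frontend:
# map each character to a pinyin syllable via a lookup table.
# TOY_LEXICON is a toy example, not the toolkit's real lexicon.

TOY_LEXICON = {
    "你": "ni3",
    "好": "hao3",
    "世": "shi4",
    "界": "jie4",
}

def g2p(text: str) -> list:
    """Convert a string of Chinese characters to pinyin syllables.

    Unknown characters are kept verbatim so a downstream module can
    decide how to handle them.
    """
    return [TOY_LEXICON.get(ch, ch) for ch in text]

print(g2p("你好"))  # ['ni3', 'hao3']
```

A production frontend additionally handles text normalization (numbers, dates), polyphonic characters and tone sandhi, which a plain lookup table cannot express.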
 ## Table of Contents
+The contents of this README are as follows:
+
 - [Table of Contents](#table-of-contents)
-- [Features](#features)
 - [Installation](#installation)
 - [Quick Start](#quick-start)
 - [Models List](#models-list)
@@ -43,42 +51,6 @@
 - [License](#license)
 - [Acknowledgement](#acknowledgement)
 
-## Features
-
-Via the easy-to-use, efficient, flexible and scalable implementation, our vision is to empower both industrial application and academic research, including training, inference & testing module, and deployment.
-
-> Note: 1.Better add hyperlinks for code path; 2.The current `Features` is a bit long. Is there any idea to shorten this section?
-
-
-The features of **ASR** are summarized as follows:
-- **Used datasets**
-  - Aishell, THCHS30, TIMIT and Librispeech
-- **Model support of streaming and non-streaming data**
-  - Non-streaming: [Baidu's DeepSpeech2](http://proceedings.mlr.press/v48/amodei16.pdf), [Transformer](https://arxiv.org/abs/1706.03762) and [Conformer](https://arxiv.org/abs/2005.08100)
-  - Streaming: [Baidu's DeepSpeech2](http://proceedings.mlr.press/v48/amodei16.pdf) and [U2](https://arxiv.org/pdf/2012.05481.pdf)
-- **Language Model**: Ngram
-- **Decoder**: ctc greedy, ctc prefix beam search, greedy, beam search, attention rescore
-- **Aligment**: MFA, CTC Aligment
-- **Speech Frontend**
-  - Audio: Auto Gain
-  - Feature: kaldi fbank, kaldi mfcc, linear, delta detla
-- **Speech Augmentation**
-  - Audio: Auto Gain
-  - Feature: Volume Perturbation, Speed Perturbation, Shifting Perturbation, Online Bayesian normalization, Noise Perturbation, Impulse Response,Spectrum, SpecAugment, Adaptive SpecAugment
-- **Tokenizer**: Chinese/English Character, English Word, Sentence Piece
-
-- **Word Segmentation**: [mmseg](http://technology.chtsai.org/mmseg/)
-
-The features of **TTS** are summarized as follows:
-
-- **Text FrontEnd**: Rule based *Chinese* frontend.
-- **Acoustic Models**: FastSpeech2, SpeedySpeech, TransformerTTS, Tacotron2
-- **Vocoders**: Parallel WaveGAN, WaveFlow
-- **Voice Cloning**: Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis, GE2E
-
 ## Installation
 
 > Note: The installation guidance of TTS and ASR is now separated.
 
@@ -92,16 +64,90 @@ Please see the [ASR installation](docs/source/asr/install.md) and [TTS installat
 ## Quick Start
 
+> Note: It is better to use code blocks rather than hyperlinks.
+
 Please see [ASR getting started](docs/source/asr/getting_started.md) ([tiny test](examples/tiny/s0/README.md)) and [TTS Basic Use](/docs/source/tts/basic_usage.md).
 
 ## Models List
 
-PaddleSpeech ASR supports a lot of mainstream models. For more information, please refer to [ASRModels](./docs/source/asr/released_model.md).
+PaddleSpeech ASR supports many mainstream models, which are summarized as follows. For more information, please refer to [ASRModels](./docs/source/asr/released_model.md).
+
+| ASR Module Type | Model Type | Dataset | Link |
+| --- | --- | --- | --- |
+| Acoustic Model | 2 Conv + 5 LSTM layers with only forward direction | Aishell | Ds2 Online Aishell Model |
+| Acoustic Model | 2 Conv + 3 bidirectional GRU layers | Aishell | Ds2 Offline Aishell Model |
+| Acoustic Model | Encoder: Conformer, Decoder: Transformer, Decoding method: Attention + CTC | Aishell | Conformer Offline Aishell Model |
+| Acoustic Model | Encoder: Conformer, Decoder: Transformer, Decoding method: Attention | Aishell | Conformer Librispeech Model |
+| Acoustic Model | Encoder: Conformer, Decoder: Transformer, Decoding method: Attention | Librispeech | Conformer Librispeech Model |
+| Acoustic Model | Encoder: Conformer, Decoder: Transformer, Decoding method: Attention | Librispeech | Transformer Librispeech Model |
+| Language Model | English LM | CommonCrawl(en.00) | English LM |
+| Language Model | Mandarin LM Small | Baidu Internal Corpus | Mandarin LM Small |
+| Language Model | Mandarin LM Large | Baidu Internal Corpus | Mandarin LM Large |
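To picture what the "CTC" part of the decoding methods listed above does, here is a minimal CTC greedy-decoding sketch in pure Python. The blank id of 0 and the toy vocabulary are assumptions for illustration, not the released models' actual configuration:

```python
from itertools import groupby

BLANK_ID = 0  # assumption: id 0 is the CTC blank token

def ctc_greedy_decode(frame_ids, id_to_token):
    """Collapse repeated frame labels, then drop blanks (CTC greedy search)."""
    collapsed = [key for key, _ in groupby(frame_ids)]  # merge consecutive repeats
    return [id_to_token[i] for i in collapsed if i != BLANK_ID]

# Toy example: per-frame argmax ids from an acoustic model.
vocab = {1: "h", 2: "i"}
frames = [1, 1, 0, 0, 2, 2, 2, 0]
print(ctc_greedy_decode(frames, vocab))  # ['h', 'i']
```

CTC prefix beam search and attention rescoring refine this idea by tracking several candidate prefixes instead of following a single per-frame argmax path.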