From eeadee1e7f83d8397b578665143fcd022bffe5df Mon Sep 17 00:00:00 2001
From: Mingxue-Xu <92848346+Mingxue-Xu@users.noreply.github.com>
Date: Mon, 13 Dec 2021 14:46:34 +0800
Subject: [PATCH] [README] Update ST and AC info in README.md

---
 README.md                     | 102 +++++++++++++++++++++++++++-------
 docs/source/released_model.md |  15 +++++
 2 files changed, 97 insertions(+), 20 deletions(-)

diff --git a/README.md b/README.md
index 6c7aa30b..5004df5d 100644
--- a/README.md
+++ b/README.md
@@ -23,7 +23,7 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme
 4.What is the goal of this project?
 -->
 
-**PaddleSpeech** is an open-source toolkit on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform for a variety of critical tasks in speech, with the state-of-art and influential models.
+**PaddleSpeech** is an open-source toolkit on the [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform for a variety of critical tasks in speech and audio, with state-of-the-art and influential models.
 
 ##### Speech-to-Text
 
@@ -86,26 +86,49 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme
 For more synthesized audios, please refer to [PaddleSpeech Text-to-Speech samples](https://paddlespeech.readthedocs.io/en/latest/tts/demo.html).
 
+##### Speech Translation
+
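[Speech Translation demo table; HTML markup lost in extraction. Columns: Input Audio | Translations Result. Row: (input audio clip) | “我 在 这栋 建筑 的 古老 门上 敲门。” (“I knock on the old door of this building.”)]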
+
 Via the easy-to-use, efficient, flexible and scalable implementation, our vision is to empower both industrial application and academic research, including training, inference & testing modules, and deployment process. To be more specific, this toolkit features at:
-- **Fast and Light-weight**: we provide high-speed and ultra-lightweight models that are convenient for industrial deployment.
+- **Ease of Use**: low barriers to installation, and a [CLI](#quick-start) is available to quick-start your journey.
+- **Align to the State-of-the-Art**: we provide high-speed and ultra-lightweight models, as well as cutting-edge technology.
 - **Rule-based Chinese frontend**: our frontend contains Text Normalization and Grapheme-to-Phoneme (G2P, including Polyphone and Tone Sandhi). Moreover, we use self-defined linguistic rules to adapt Chinese context.
 - **Varieties of Functions that Vitalize both Industrial and Academia**:
-  - *Implementation of critical audio tasks*: this toolkit contains audio functions like Speech Translation, Automatic Speech Recognition, Text-to-Speech Synthesis, Voice Cloning, etc.
+  - *Implementation of critical audio tasks*: this toolkit contains audio functions like Audio Classification, Speech Translation, Automatic Speech Recognition, Text-to-Speech Synthesis, etc.
   - *Integration of mainstream models and datasets*: the toolkit implements modules that participate in the whole pipeline of the speech tasks, and uses mainstream datasets like LibriSpeech, LJSpeech, AIShell, CSMSC, etc. See also [model list](#model-list) for more details.
-  - *Cascaded models application*: as an extension of the application of traditional audio tasks, we combine the workflows of aforementioned tasks with other fields like Natural language processing (NLP), like Punctuation Restoration.
+  - *Cascaded models application*: as an extension of typical audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural Language Processing (NLP) and Computer Vision (CV).
 
 ## Installation
 
-The base environment in this page is
-- Ubuntu 16.04
-- python>=3.7
-- paddlepaddle>=2.2.0
-
-If you want to set up PaddleSpeech in other environment, please see the [installation](./docs/source/install.md) documents for all the alternatives.
+We strongly recommend that users install PaddleSpeech on *Linux* with *python>=3.7* and *paddlepaddle>=2.2.0*. In that environment, `paddlespeech` can be installed with `pip`:
+```shell
+pip install paddlespeech
+```
+If you want to set up PaddleSpeech in another environment, please see the [installation](./docs/source/install.md) documents for all the alternatives.
 
 ## Quick Start
 
-Developers can have a try of our models with [PaddleSpeech Command Line](./paddlespeech/cli/README.md). Change `--input` to test your own audio/text file.
+Developers can try our models with the [PaddleSpeech Command Line](./paddlespeech/cli/README.md). Change `--input` to test your own audio/text.
 
 **Audio Classification**
 ```shell
@@ -124,13 +147,13 @@ paddlespeech st --input input_16k.wav
 paddlespeech tts --input "你好,欢迎使用百度飞桨深度学习框架!" --output output.wav
 ```
 
-If you want to try more functions like training and tuning, please see [Speech-to-Text Quick Start](./docs/source/asr/quick_start.md) and [Text-to-Speech Quick Start](./docs/source/tts/quick_start.md).
+If you want to try more functions like training and tuning, please have a look at [Speech-to-Text Quick Start](./docs/source/asr/quick_start.md) and [Text-to-Speech Quick Start](./docs/source/tts/quick_start.md).
 
 ## Model List
 
-PaddleSpeech supports a series of most popular models, summarized in [released models](./docs/source/released_model.md) with available pretrained models.
+PaddleSpeech supports a series of the most popular models. They are summarized in [released models](./docs/source/released_model.md) together with the available pretrained models.
 
-Speech-to-Text module contains *Acoustic Model* and *Language Model*, with the following details:
+**Speech-to-Text** contains *Acoustic Model* and *Language Model*, with the following details: