From 48700c847d91c989fe43ab81f5da7e4fbf5d38e9 Mon Sep 17 00:00:00 2001
From: TianYuan
Date: Mon, 6 Dec 2021 09:20:56 +0000
Subject: [PATCH] update demos readme

---
 demos/story_talker/README.md |  5 +++++
 demos/style_fs2/README.md    | 15 ++++++++++++++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/demos/story_talker/README.md b/demos/story_talker/README.md
index 14c068c3..62faa7fb 100644
--- a/demos/story_talker/README.md
+++ b/demos/story_talker/README.md
@@ -1,7 +1,12 @@
 # Story Talker
+## Introduction
+Storybooks are important for children's early education, but parents often don't have enough time to read them to their children. Very young children may not yet understand the Chinese characters in storybooks, and sometimes children simply want to "listen" rather than "read".
+
 You can use `PaddleOCR` to get the text of a storybook, and read it by the `TTS` mudule of `PaddleSpeech`.
 
+## Usage
 Run the following command line to get started:
 ```
 ./run.sh
 ```
+The result is shown in our [notebook](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/tutorial/tts/tts_tutorial.ipynb).
diff --git a/demos/style_fs2/README.md b/demos/style_fs2/README.md
index ca8c9812..c80b5731 100644
--- a/demos/style_fs2/README.md
+++ b/demos/style_fs2/README.md
@@ -1,6 +1,19 @@
 # Style FastSpeech2
-You can change the `pitch`、`duration` and `energy` of `FastSpeech2`, then get some interesting results.
+## Introduction
+[FastSpeech2](https://arxiv.org/abs/2006.04558) is a classical acoustic model for Text-to-Speech synthesis that introduces controllable speech inputs, including `phoneme duration`, `energy`, and `pitch`.
+In the prediction phase, you can change these controllable variables to get some interesting results.
+
+For example:
+
+1. The `duration` control in `FastSpeech2` can change the speed of the audio while keeping the `pitch` unchanged. (In some speech tools, increasing the speed also raises the pitch, and vice versa.)
+
+2. When we set the `pitch` of a sentence to a mean value and set the `tones` of the phones to `1`, we get a `robot-style` timbre.
+
+3. When we raise the `pitch` of an adult female voice (with a fixed scale ratio), we get a `child-style` timbre.
+
+The `duration` and `pitch` of different phonemes in a sentence can have different scale ratios. You can set different scale ratios to emphasize or weaken the pronunciation of some phonemes.
+## Usage
 Run the following command line to get started:
 ```
 ./run.sh