|
|
@ -0,0 +1,958 @@
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cells": [
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"# Story Talker"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"## 用 PaddleOCR 识别图片中的文字"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
|
|
|
"execution_count": null,
|
|
|
|
|
|
|
|
"metadata": {
|
|
|
|
|
|
|
|
"scrolled": true
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
"outputs": [],
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"from PIL import Image\n",
|
|
|
|
|
|
|
|
"img_path = 'source/frog_prince.jpg'\n",
|
|
|
|
|
|
|
|
"im = Image.open(img_path)\n",
|
|
|
|
|
|
|
|
"im.show()"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"## 使用 TTS 合成的音频"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
|
|
|
"execution_count": null,
|
|
|
|
|
|
|
|
"metadata": {
|
|
|
|
|
|
|
|
"scrolled": true
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
"outputs": [],
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"import IPython.display as dp\n",
|
|
|
|
|
|
|
|
"dp.Audio(\"source/ocr.wav\")"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"<font size=4>具体实现代码请参考: https://github.com/DeepSpeech/demos/story_talker/run.sh<font>"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"# 元宇宙来袭,构造你的虚拟人!"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"## 使用 PaddleGAN 合成的唇形视频"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
|
|
|
"execution_count": null,
|
|
|
|
|
|
|
|
"metadata": {
|
|
|
|
|
|
|
|
"scrolled": true
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
"outputs": [],
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"from IPython.display import HTML\n",
|
|
|
|
|
|
|
|
"html_str = '''\n",
|
|
|
|
|
|
|
|
"<video controls width=\"650\" height=\"365\" src=\"{}\">animation</video>\n",
|
|
|
|
|
|
|
|
"'''.format(\"output/tts_lips.mp4\")\n",
|
|
|
|
|
|
|
|
"dp.display(HTML(html_str))"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"<font size=4>具体实现代码请参考: https://github.com/DeepSpeech/demos/metaverse/run.sh<font>"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"# 前言\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"近年来,随着深度学习算法上的进步以及不断丰厚的硬件资源条件,**文本转语音(Text-To-Speech, TTS)** 技术在智能语音助手、虚拟娱乐等领域得到了广泛的应用。本教程将结合背景知识,让用户能够使用PaddlePaddle完成文本转语音任务,并结合光学字符识别(Optical Character Recognition,OCR)、自然语言处理(Natural Language Processing,NLP)等技术“听”书、让名人开口说话。\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"## 背景知识\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"为了更好地了解文本转语音任务的要素,我们先简要地回顾一下文本转语音的发展历史。如果你对此已经有所了解,或希望能尽快使用代码实现,请跳至第二章。\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"### 定义\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"<!----\n",
|
|
|
|
|
|
|
|
"Note: \n",
|
|
|
|
|
|
|
|
"1.此句抄自 [李沐Dive into Dive Learning](https://zh-v2.d2l.ai/chapter_introduction/index.html)\n",
|
|
|
|
|
|
|
|
"2.修改参考A survey on Neural Speech Sysnthesis.\n",
|
|
|
|
|
|
|
|
"---> \n",
|
|
|
|
|
|
|
|
"<font size=4> 文本转语音,又称语音合成(Speech Sysnthesis),指的是将一段文本按照一定需求转化成对应的音频,这种特性决定了的输出数据比输入输入长得多。文本转语音是一项包含了语义学、声学、数字信号处理以及机器学习的等多项学科的交叉任务。虽然辨识低质量音频文件的内容对人类来说很容易,但这对计算机来说并非易事。\n",
|
|
|
|
|
|
|
|
"</font>\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
 Note: 这里可以提供">
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"按照不同的应用需求,更广义的语音合成研究包括:\n",
|
|
|
|
|
|
|
|
"- <font size=4>语音转换(Voice Transformation/Conversion)</font>\n",
|
|
|
|
|
|
|
|
" - 说话人转换\n",
|
|
|
|
|
|
|
|
" - 语音到歌唱转换(Speech to Singing)\n",
|
|
|
|
|
|
|
|
" - 语音情感转换\n",
|
|
|
|
|
|
|
|
" - 口音转换\n",
|
|
|
|
|
|
|
|
"- <font size=4>歌唱合成 (Singing Synthesis)</font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>歌词到歌唱转换(Text/Lyric to Singing)</font>\n",
|
|
|
|
|
|
|
|
"- <font size=4>可视语音合成(Visual Speech Synthesis)</font>\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"### 发展历史\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"<!--\n",
|
|
|
|
|
|
|
|
"以下摘自维基百科 https://en.wikipedia.org/wiki/Speech_synthesis\n",
|
|
|
|
|
|
|
|
"--->\n",
|
|
|
|
|
|
|
|
"#### 机械式语音合成(19世纪及以前)\n",
|
|
|
|
|
|
|
|
"在第二次工业革命之前,语音的合成主要以机械式的音素合成为主。1779年,德裔丹麦科学家 Christian Gottlieb Kratzenstein 建造了人类的声道模型,使其可以产生五个长元音。1791年, Wolfgang von Kempelen 添加了唇和舌的模型,使其能够发出辅音和元音。\n",
|
|
|
|
|
|
|
|
"#### 电子语音合成(20世纪30年代)\n",
|
|
|
|
|
|
|
|
"贝尔实验室于20世纪30年代发明了声码器(Vocoder),将语音自动分解为音调和共振,此项技术由 Homer Dudley 改进为键盘式合成器并于 1939年纽约世界博览会展出。\n",
|
|
|
|
|
|
|
|
"#### 电子语音合成\n",
|
|
|
|
|
|
|
|
"第一台基于计算机的语音合成系统起源于 20 世纪 50 年代。1961 年,IBM 的 John Larry Kelly,以及 Louis Gerstman 使用 IBM 704 计算机合成语音,成为贝尔实验室最著名的成就之一。 1975年,第一代语音合成系统之一 —— MUSA(MUltichannel Speaking Automation)问世,其由一个独立的硬件和配套的软件组成。1978年发行的第二个版本也可以进行无伴奏演唱。90 年代的主流是采用 MIT 和贝尔实验室的系统,并结合自然语言处理模型。\n",
|
|
|
|
|
|
|
|
 Note: 这里插一张">
|
|
|
|
|
|
|
|
"#### 当前的主流方法\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"- <font size=4>基于统计参数的语音合成</font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>隐马尔可夫模型(Hidden Markov Model,HMM)</font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>深度学习网络(Deep Neural Network,DNN)</font>\n",
|
|
|
|
|
|
|
|
"- <font size=4>波形拼接语音合成</font>\n",
|
|
|
|
|
|
|
|
" \n",
|
|
|
|
|
|
|
|
"- <font size=4>混合方法</font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>参数轨迹指导的波形拼接</font>\n",
|
|
|
|
|
|
|
|
"- <font size=4>端到端神经网络语音合成</font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>声学模型 + 声码器</font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>“完全”端到端方法</font>\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"## 基于深度学习的语音合成技术\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"### 语音合成基本知识\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"![信号处理流水线](source/signal_pipeline.png)\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"<font size=4>语音合成流水线包含 <font color=\"#ff0000\">**文本前端(Text Frontend)**</font> 、<font color=\"#ff0000\">**声学模型(Acoustic Model)**</font> 和 <font color=\"#ff0000\">**声码器(Vocoder)**</font> 三个主要模块:</font>\n",
|
|
|
|
|
|
|
|
"- <font size=4>通过文本前端模块将原始文本转换为字符/音素。</font>\n",
|
|
|
|
|
|
|
|
"- <font size=4>通过声学模型将字符/音素转换为声学特征,如线性频谱图、mel 频谱图、LPC 特征等。</font>\n",
|
|
|
|
|
|
|
|
"- <font size=4>通过声码器将声学特征转换为波形。</font>\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"<img style=\"float: center;\" src=\"source/tts_pipeline.png\" width=\"85%\"/>"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"# 实践\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"<font size=4>环境安装请参考: https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md</font>\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"<font size=4>使用 **PaddleSpeech** 提供的预训练模型合成一句中文。</font>\n"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"## step 0 准备"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"### 获取预训练模型"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
|
|
|
"execution_count": null,
|
|
|
|
|
|
|
|
"metadata": {
|
|
|
|
|
|
|
|
"scrolled": true
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
"outputs": [],
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"!mkdir download\n",
|
|
|
|
|
|
|
|
"!wget -P download https://paddlespeech.bj.bcebos.com/Parakeet/pwg_baker_ckpt_0.4.zip\n",
|
|
|
|
|
|
|
|
"!unzip -d download download/pwg_baker_ckpt_0.4.zip\n",
|
|
|
|
|
|
|
|
"!wget -P download https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_baker_ckpt_0.4.zip\n",
|
|
|
|
|
|
|
|
"!unzip -d download download/fastspeech2_nosil_baker_ckpt_0.4.zip"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"### 查看预训练模型的结构"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
|
|
|
"execution_count": null,
|
|
|
|
|
|
|
|
"metadata": {
|
|
|
|
|
|
|
|
"scrolled": true
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
"outputs": [],
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"!tree download/pwg_baker_ckpt_0.4\n",
|
|
|
|
|
|
|
|
"!tree download/fastspeech2_nosil_baker_ckpt_0.4"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"### 导入 Python 包"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
|
|
|
"execution_count": null,
|
|
|
|
|
|
|
|
"metadata": {
|
|
|
|
|
|
|
|
"scrolled": true
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
"outputs": [],
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"%load_ext autoreload\n",
|
|
|
|
|
|
|
|
"%autoreload 2\n",
|
|
|
|
|
|
|
|
"import logging\n",
|
|
|
|
|
|
|
|
"import sys\n",
|
|
|
|
|
|
|
|
"import warnings\n",
|
|
|
|
|
|
|
|
"warnings.filterwarnings('ignore')\n",
|
|
|
|
|
|
|
|
"# PaddleSpeech 项目根目录放到 python 路径中\n",
|
|
|
|
|
|
|
|
"sys.path.insert(0,\"../../../\")"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
|
|
|
"execution_count": null,
|
|
|
|
|
|
|
|
"metadata": {
|
|
|
|
|
|
|
|
"scrolled": true
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
"outputs": [],
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"import argparse\n",
|
|
|
|
|
|
|
|
"import os\n",
|
|
|
|
|
|
|
|
"from pathlib import Path\n",
|
|
|
|
|
|
|
|
"import IPython.display as dp\n",
|
|
|
|
|
|
|
|
"import matplotlib.pyplot as plt\n",
|
|
|
|
|
|
|
|
"import numpy as np\n",
|
|
|
|
|
|
|
|
"import paddle\n",
|
|
|
|
|
|
|
|
"import soundfile as sf\n",
|
|
|
|
|
|
|
|
"import yaml\n",
|
|
|
|
|
|
|
|
"from paddlespeech.t2s.frontend.zh_frontend import Frontend\n",
|
|
|
|
|
|
|
|
"from paddlespeech.t2s.models.fastspeech2 import FastSpeech2\n",
|
|
|
|
|
|
|
|
"from paddlespeech.t2s.models.fastspeech2 import FastSpeech2Inference\n",
|
|
|
|
|
|
|
|
"from paddlespeech.t2s.models.parallel_wavegan import PWGGenerator\n",
|
|
|
|
|
|
|
|
"from paddlespeech.t2s.models.parallel_wavegan import PWGInference\n",
|
|
|
|
|
|
|
|
"from paddlespeech.t2s.modules.normalizer import ZScore\n",
|
|
|
|
|
|
|
|
"from yacs.config import CfgNode"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"### 设置预训练模型的路径"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
|
|
|
"execution_count": null,
|
|
|
|
|
|
|
|
"metadata": {
|
|
|
|
|
|
|
|
"scrolled": true
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
"outputs": [],
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"fastspeech2_config = \"download/fastspeech2_nosil_baker_ckpt_0.4/default.yaml\"\n",
|
|
|
|
|
|
|
|
"fastspeech2_checkpoint = \"download/fastspeech2_nosil_baker_ckpt_0.4/snapshot_iter_76000.pdz\"\n",
|
|
|
|
|
|
|
|
"fastspeech2_stat = \"download/fastspeech2_nosil_baker_ckpt_0.4/speech_stats.npy\"\n",
|
|
|
|
|
|
|
|
"pwg_config = \"download/pwg_baker_ckpt_0.4/pwg_default.yaml\"\n",
|
|
|
|
|
|
|
|
"pwg_checkpoint = \"download/pwg_baker_ckpt_0.4/pwg_snapshot_iter_400000.pdz\"\n",
|
|
|
|
|
|
|
|
"pwg_stat = \"download/pwg_baker_ckpt_0.4/pwg_stats.npy\"\n",
|
|
|
|
|
|
|
|
"phones_dict = \"download/fastspeech2_nosil_baker_ckpt_0.4/phone_id_map.txt\"\n",
|
|
|
|
|
|
|
|
"# 读取 conf 文件并结构化\n",
|
|
|
|
|
|
|
|
"with open(fastspeech2_config) as f:\n",
|
|
|
|
|
|
|
|
" fastspeech2_config = CfgNode(yaml.safe_load(f))\n",
|
|
|
|
|
|
|
|
"with open(pwg_config) as f:\n",
|
|
|
|
|
|
|
|
" pwg_config = CfgNode(yaml.safe_load(f))\n",
|
|
|
|
|
|
|
|
"print(\"========Config========\")\n",
|
|
|
|
|
|
|
|
"print(fastspeech2_config)\n",
|
|
|
|
|
|
|
|
"print(\"---------------------\")\n",
|
|
|
|
|
|
|
|
"print(pwg_config)"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"## step1 文本前端\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"<font size=4>一个文本前端模块主要包含</font>:\n",
|
|
|
|
|
|
|
|
"- <font size=4>分段(Text Segmentation)</font>\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"- <font size=4>文本正则化(Text Normalization, TN)</font>\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"- <font size=4>分词(Word Segmentation, 主要是在中文中)</font>\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"- <font size=4>词性标注(Part-of-Speech, PoS)</font>\n",
|
|
|
|
|
|
|
|
"- <font size=4>韵律预测(Prosody)</font>\n",
|
|
|
|
|
|
|
|
"- <font size=4>字音转换(Grapheme-to-Phoneme,G2P)</font>\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"<font size=2>(Grapheme: **语言**书写系统的最小有意义单位; Phoneme: 区分单词的最小**语音**单位)</font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>多音字(Polyphone)</font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>变调(Tone Sandhi)</font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>“一”、“不”变调</font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>三声变调</font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>轻声变调</font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>儿化音</font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>方言</font>\n",
|
|
|
|
|
|
|
|
"- ...\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"<font size=4>(输入给声学模型之前,还需要把音素序列转换为 id)</font>\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"<font size=4>其中最重要的模块是<font color=\"#ff0000\"> 文本正则化 </font>模块和<font color=\"#ff0000\"> 字音转换(TTS 中更常用 G2P代指) </font>模块。</font>\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"<font size=4>各模块输出示例:</font>\n",
|
|
|
|
|
|
|
|
"```text\n",
|
|
|
|
|
|
|
|
"• Text: 全国一共有112所211高校\n",
|
|
|
|
|
|
|
|
"• Text Normalization: 全国一共有一百一十二所二一一高校\n",
|
|
|
|
|
|
|
|
"• Word Segmentation: 全国/一共/有/一百一十二/所/二一一/高校/\n",
|
|
|
|
|
|
|
|
"• G2P(注意此句中“一”的读音):\n",
|
|
|
|
|
|
|
|
" quan2 guo2 yi2 gong4 you3 yi4 bai3 yi1 shi2 er4 suo3 er4 yao1 yao1 gao1 xiao4\n",
|
|
|
|
|
|
|
|
" (可以进一步把声母和韵母分开)\n",
|
|
|
|
|
|
|
|
" q uan2 g uo2 y i2 g ong4 y ou3 y i4 b ai3 y i1 sh i2 er4 s uo3 er4 y ao1 y ao1 g ao1 x iao4\n",
|
|
|
|
|
|
|
|
" (把音调和声韵母分开)\n",
|
|
|
|
|
|
|
|
" q uan g uo y i g ong y ou y i b ai y i sh i er s uo er y ao y ao g ao x iao\n",
|
|
|
|
|
|
|
|
" 0 2 0 2 0 2 0 4 0 3 ...\n",
|
|
|
|
|
|
|
|
"• Prosody (prosodic words #1, prosodic phrases #2, intonation phrases #3, sentence #4):\n",
|
|
|
|
|
|
|
|
" 全国#2一共有#2一百#1一十二所#2二一一#1高校#4\n",
|
|
|
|
|
|
|
|
" (分词的结果一般是固定的,但是不同人习惯不同,可能有不同的韵律)\n",
|
|
|
|
|
|
|
|
"```\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"<font size=4>文本前端模块的设计需要融入很多专业的或经验性的知识,人类在读文本的时候可以自然而然地读出正确的发音,但是这些计算机都是不知道的!</font>\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"<font size=4>分词:</font>\n",
|
|
|
|
|
|
|
|
"```text\n",
|
|
|
|
|
|
|
|
"我也想过过过儿过过的生活\n",
|
|
|
|
|
|
|
|
"我也想/过过/过儿/过过的/生活\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"货拉拉拉不拉拉布拉多\n",
|
|
|
|
|
|
|
|
"货拉拉/拉不拉/拉布拉多\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"南京市长江大桥\n",
|
|
|
|
|
|
|
|
"南京市长/江大桥\n",
|
|
|
|
|
|
|
|
"南京市/长江大桥\n",
|
|
|
|
|
|
|
|
"```\n",
|
|
|
|
|
|
|
|
"<font size=4>变调和儿化音:</font>\n",
|
|
|
|
|
|
|
|
"```\n",
|
|
|
|
|
|
|
|
"你要不要和我们一起出去玩?\n",
|
|
|
|
|
|
|
|
"你要不(2声)要和我们一(4声)起出去玩(儿)?\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"不好,我要一个人出去。\n",
|
|
|
|
|
|
|
|
"不(4声)好,我要一(2声)个人出去。\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"(以下每个词的所有字都是三声的,请你读一读,体会一下在读的时候,是否每个字都被读成了三声?)\n",
|
|
|
|
|
|
|
|
"纸老虎、虎骨酒、展览馆、岂有此理、手表厂有五种好产品\n",
|
|
|
|
|
|
|
|
"```\n",
|
|
|
|
|
|
|
|
"<font size=4>多音字(通常需要先正确分词):</font>\n",
|
|
|
|
|
|
|
|
"```text\n",
|
|
|
|
|
|
|
|
"人要行,干一行行一行,一行行行行行;\n",
|
|
|
|
|
|
|
|
"人要是不行,干一行不行一行,一行不行行行不行。\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"佟大为妻子产下一女\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"海水朝朝朝朝朝朝朝落\n",
|
|
|
|
|
|
|
|
"浮云长长长长长长长消\n",
|
|
|
|
|
|
|
|
"```\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"<font size=4>PaddleSpeech TTS 文本前端解决方案:</font>\n",
|
|
|
|
|
|
|
|
"- <font size=4>文本正则: 规则</font>\n",
|
|
|
|
|
|
|
|
"- <font size=4>G2P:</font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>多音字模块: pypinyin/g2pM</font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>变调模块: 用分词 + 规则</font>\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"<font size=4>相关 examples:\n",
|
|
|
|
|
|
|
|
" \n",
|
|
|
|
|
|
|
|
"https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/tn\n",
|
|
|
|
|
|
|
|
"https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/g2p</font>\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"<font size=4>(未来计划推出基于深度学习的文本前端模块)</font>"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"### 构造文本前端对象"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
|
|
|
"execution_count": null,
|
|
|
|
|
|
|
|
"metadata": {
|
|
|
|
|
|
|
|
"scrolled": true
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
"outputs": [],
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"# 传入 phones_dict 会把相应的 phones 转换成 phone_ids\n",
|
|
|
|
|
|
|
|
"frontend = Frontend(phone_vocab_path=phones_dict)\n",
|
|
|
|
|
|
|
|
"print(\"Frontend done!\")"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"### 调用文本前端"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
|
|
|
"execution_count": null,
|
|
|
|
|
|
|
|
"metadata": {
|
|
|
|
|
|
|
|
"scrolled": true
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
"outputs": [],
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"input = \"你好,欢迎使用百度飞桨框架进行深度学习研究!\"\n",
|
|
|
|
|
|
|
|
"# input = \"我每天中午12:00起床\"\n",
|
|
|
|
|
|
|
|
"# input = \"我出生于2005/11/08,那天的最低气温达到-10°C\"\n",
|
|
|
|
|
|
|
|
"# text norm 时会进行分句,merge_sentences 表示把分句的结果合成一条\n",
|
|
|
|
|
|
|
|
"# 可以把 merge_sentences 设置为 False, 多个子句并行调用声学模型和声码器提升合成速度\n",
|
|
|
|
|
|
|
|
"input_ids = frontend.get_input_ids(input, merge_sentences=True, print_info=True)\n",
|
|
|
|
|
|
|
|
"# 由于 merge_sentences=True, input_ids[\"phone_ids\"][0] 即表示整句的 phone_ids\n",
|
|
|
|
|
|
|
|
"phone_ids = input_ids[\"phone_ids\"][0]\n",
|
|
|
|
|
|
|
|
"print(\"phone_ids:\")\n",
|
|
|
|
|
|
|
|
"print(phone_ids)"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"## step1+ 文本前端深度学习化\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"<img style=\"float: center;\" src=\"source/text_frontend_struct.png\" width=\"100%\"/>"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"## step2 声学模型\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"<font size=4>声学模型将字符/音素转换为声学特征,如线性频谱图、mel 频谱图、LPC 特征等,声学特征以 “帧” 为单位,一般一帧是 10ms 左右,一个音素一般对应 5~20 帧左右, 声学模型需要解决的是 <font color=\"#ff0000\">“不等长序列间的映射问题”</font>,“不等长”是指,同一个人发不同音素的持续时间不同,同一个人在不同时刻说同一句话的语速可能不同,对应各个音素的持续时间不同,不同人说话的特色不同,对应各个音素的持续时间不同。这是一个困难的“一对多”问题。</font>\n",
|
|
|
|
|
|
|
|
"```\n",
|
|
|
|
|
|
|
|
"# 卡尔普陪外孙玩滑梯\n",
|
|
|
|
|
|
|
|
"000001|baker_corpus|sil 20 k 12 a2 4 er2 10 p 12 u3 12 p 9 ei2 9 uai4 15 s 11 uen1 12 uan2 14 h 10 ua2 11 t 15 i1 16 sil 20\n",
|
|
|
|
|
|
|
|
"```\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"<font size=4>声学模型主要分为自回归模型和非自回归模型,其中自回归模型在 `t` 时刻的预测需要依赖 `t-1` 时刻的输出作为输入,预测时间长,但是音质相对较好,非自回归模型不存在预测上的依赖关系,预测时间快,音质相对较差。</font>\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"<font size=4>主流声学模型发展的脉络:</font>\n",
|
|
|
|
|
|
|
|
"- <font size=4>自回归模型:</font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>Tacotron</font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>Tacotron2</font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>Transformer TTS</font>\n",
|
|
|
|
|
|
|
|
"- <font size=4>非自回归模型:</font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>FastSpeech</font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>SpeedySpeech</font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>FastPitch</font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>FastSpeech2</font>\n",
|
|
|
|
|
|
|
|
" - ...\n",
|
|
|
|
|
|
|
|
" \n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"<font size=4>在本教程中,我们使用 `FastSpeech2` 作为声学模型。<font>\n",
|
|
|
|
|
|
|
|
"![FastSpeech2](source/fastspeech2.png)\n",
|
|
|
|
|
|
|
|
"<font size=4>PaddleSpeech TTS 实现的 FastSpeech2 与论文不同的地方在于,我们使用的的是 phone 级别的 `pitch` 和 `energy`(与 FastPitch 类似)。<font>\n",
|
|
|
|
|
|
|
|
"![FastPitch](source/fastpitch.png)\n",
|
|
|
|
|
|
|
|
"<font size=4>更多关于声学模型的发展及改进的介绍: https://paddlespeech.readthedocs.io/en/latest/tts/models_introduction.html<font>"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"### 初始化声学模型 FastSpeech2"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
|
|
|
"execution_count": null,
|
|
|
|
|
|
|
|
"metadata": {
|
|
|
|
|
|
|
|
"scrolled": true
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
"outputs": [],
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"with open(phones_dict, \"r\") as f:\n",
|
|
|
|
|
|
|
|
" phn_id = [line.strip().split() for line in f.readlines()]\n",
|
|
|
|
|
|
|
|
"vocab_size = len(phn_id)\n",
|
|
|
|
|
|
|
|
"print(\"vocab_size:\", vocab_size)\n",
|
|
|
|
|
|
|
|
"odim = fastspeech2_config.n_mels\n",
|
|
|
|
|
|
|
|
"model = FastSpeech2(\n",
|
|
|
|
|
|
|
|
" idim=vocab_size, odim=odim, **fastspeech2_config[\"model\"])\n",
|
|
|
|
|
|
|
|
"# 预训练好的参数赋值给模型\n",
|
|
|
|
|
|
|
|
"model.set_state_dict(paddle.load(fastspeech2_checkpoint)[\"main_params\"])\n",
|
|
|
|
|
|
|
|
"# 推理阶段不启用 batch norm 和 dropout\n",
|
|
|
|
|
|
|
|
"model.eval()\n",
|
|
|
|
|
|
|
|
"# 读取数据预处理阶段数据集的均值和标准差\n",
|
|
|
|
|
|
|
|
"stat = np.load(fastspeech2_stat)\n",
|
|
|
|
|
|
|
|
"mu, std = stat\n",
|
|
|
|
|
|
|
|
"mu = paddle.to_tensor(mu)\n",
|
|
|
|
|
|
|
|
"std = paddle.to_tensor(std)\n",
|
|
|
|
|
|
|
|
"fastspeech2_normalizer = ZScore(mu, std)\n",
|
|
|
|
|
|
|
|
"# 构造包含 normalize 的新模型\n",
|
|
|
|
|
|
|
|
"fastspeech2_inference = FastSpeech2Inference(fastspeech2_normalizer, model)\n",
|
|
|
|
|
|
|
|
"fastspeech2_inference.eval()\n",
|
|
|
|
|
|
|
|
"print(\"FastSpeech2 done!\")"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
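{
"cell_type": "markdown",
"metadata": {},
"source": [
"<font size=4>`ZScore` above denormalizes the model's output with the training-set statistics. A minimal numpy sketch of the idea behind z-score (de)normalization (this mirrors the concept, not PaddleSpeech's exact implementation):</font>\n",
"```python\n",
"import numpy as np\n",
"mu, std = 0.5, 2.0                # per-feature statistics saved at preprocessing time\n",
"x = np.array([0.1, 0.9])\n",
"normalized = (x - mu) / std       # applied to the targets during training\n",
"restored = normalized * std + mu  # applied to the model outputs at inference\n",
"assert np.allclose(restored, x)\n",
"```"
]
},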
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"### 调用声学模型"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
|
|
|
"execution_count": null,
|
|
|
|
|
|
|
|
"metadata": {
|
|
|
|
|
|
|
|
"scrolled": true
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
"outputs": [],
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"with paddle.no_grad():\n",
|
|
|
|
|
|
|
|
" mel = fastspeech2_inference(phone_ids)\n",
|
|
|
|
|
|
|
|
"print(\"shepe of mel (n_frames x n_mels):\")\n",
|
|
|
|
|
|
|
|
"print(mel.shape)\n",
|
|
|
|
|
|
|
|
"# 绘制声学模型输出的 mel 频谱\n",
|
|
|
|
|
|
|
|
"fig, ax = plt.subplots(figsize=(9, 6))\n",
|
|
|
|
|
|
|
|
"im = ax.imshow(mel.T, aspect='auto',origin='lower')\n",
|
|
|
|
|
|
|
|
"fig.colorbar(im, ax=ax)\n",
|
|
|
|
|
|
|
|
"plt.title('Mel Spectrogram')\n",
|
|
|
|
|
|
|
|
"plt.xlabel('Time')\n",
|
|
|
|
|
|
|
|
"plt.ylabel('Frequency')\n",
|
|
|
|
|
|
|
|
"plt.tight_layout()"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"## step3 声码器\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"<font size=4>声码器将声学特征转换为波形。声码器需要解决的是 <font color=\"#ff0000\">“信息缺失的补全问题”</font>。信息缺失是指,在音频波形转换为频谱图的时候,存在**相位信息**的缺失,在频谱图转换为 mel 频谱图的时候,存在**频域压缩**导致的信息缺失;假设音频的采样率是16kHZ, 一帧的音频有 10ms,也就是说,1s 的音频有 16000 个采样点,而 1s 中包含 100 帧,每一帧有 160 个采样点,声码器的作用就是将一个频谱帧变成音频波形的 160 个采样点,所以声码器中一般会包含**上采样**模块。<font>\n",
|
|
|
|
|
|
|
|
" \n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"<font size=4>与声学模型类似,声码器也分为自回归模型和非自回归模型, 更细致的分类如下:<font>\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"- <font size=4>Autoregression<font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>WaveNet<font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>WaveRNN<font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>LPCNet<font>\n",
|
|
|
|
|
|
|
|
"- <font size=4>Flow<font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>WaveFlow<font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>WaveGlow<font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>FloWaveNet<font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>Parallel WaveNet<font>\n",
|
|
|
|
|
|
|
|
"- <font size=4>GAN<font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>WaveGAN<font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>arallel WaveGAN<font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>MelGAN<font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>HiFi-GAN<font>\n",
|
|
|
|
|
|
|
|
"- <font size=4>VAE\n",
|
|
|
|
|
|
|
|
" - <font size=4>Wave-VAE<font>\n",
|
|
|
|
|
|
|
|
"- <font size=4>Diffusion<font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>WaveGrad<font>\n",
|
|
|
|
|
|
|
|
" - <font size=4>DiffWave<font>\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"<font size=4>PaddleSpeech TTS 主要实现了百度的 `WaveFlow` 和一些主流的 GAN Vocoder, 在本教程中,我们使用 `Parallel WaveGAN` 作为声码器。<font>\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"<br></br> \n",
|
|
|
|
|
|
|
|
"<img style=\"float: center;\" src=\"source/pwgan.png\" width=\"75%\"/> \n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"<font size=4>各 GAN Vocoder 的生成器和判别器的 Loss 的区别如下表格所示:<font>\n",
|
|
|
|
|
|
|
|
" \n",
|
|
|
|
|
|
|
|
"Model | Generator Loss |Discriminator Loss\n",
|
|
|
|
|
|
|
|
":-------------:| :------------:| :-----\n",
|
|
|
|
|
|
|
|
"Parallel Wave GAN| adversial loss <br> Feature Matching | Multi-Scale Discriminator |\n",
|
|
|
|
|
|
|
|
"Mel GAN |adversial loss <br> Multi-resolution STFT loss | adversial loss|\n",
|
|
|
|
|
|
|
|
"Multi-Band Mel GAN | adversial loss <br> full band Multi-resolution STFT loss <br> sub band Multi-resolution STFT loss |Multi-Scale Discriminator|\n",
|
|
|
|
|
|
|
|
"HiFi GAN |adversial loss <br> Feature Matching <br> Mel-Spectrogram Loss | Multi-Scale Discriminator <br> Multi-Period Discriminator|\n"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"### 初始化声码器 Parallel WaveGAN"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
|
|
|
"execution_count": null,
|
|
|
|
|
|
|
|
"metadata": {
|
|
|
|
|
|
|
|
"scrolled": true
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
"outputs": [],
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"vocoder = PWGGenerator(**pwg_config[\"generator_params\"])\n",
|
|
|
|
|
|
|
|
"# 预训练好的参数赋值给模型\n",
|
|
|
|
|
|
|
|
"vocoder.set_state_dict(paddle.load(pwg_checkpoint)[\"generator_params\"])\n",
|
|
|
|
|
|
|
|
"vocoder.remove_weight_norm()\n",
|
|
|
|
|
|
|
|
"# 推理阶段不启用 batch norm 和 dropout\n",
|
|
|
|
|
|
|
|
"vocoder.eval()\n",
|
|
|
|
|
|
|
|
"# 读取数据预处理阶段数据集的均值和标准差\n",
|
|
|
|
|
|
|
|
"stat = np.load(pwg_stat)\n",
|
|
|
|
|
|
|
|
"mu, std = stat\n",
|
|
|
|
|
|
|
|
"mu = paddle.to_tensor(mu)\n",
|
|
|
|
|
|
|
|
"std = paddle.to_tensor(std)\n",
|
|
|
|
|
|
|
|
"pwg_normalizer = ZScore(mu, std)\n",
|
|
|
|
|
|
|
|
"# 构造包含 normalize 的新模型\n",
|
|
|
|
|
|
|
|
"pwg_inference = PWGInference(pwg_normalizer, vocoder)\n",
|
|
|
|
|
|
|
|
"pwg_inference.eval()\n",
|
|
|
|
|
|
|
|
"print(\"Parallel WaveGAN done!\")"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"### 调用声码器"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
|
|
|
"execution_count": null,
|
|
|
|
|
|
|
|
"metadata": {
|
|
|
|
|
|
|
|
"scrolled": false
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
"outputs": [],
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"with paddle.no_grad():\n",
|
|
|
|
|
|
|
|
" wav = pwg_inference(mel)\n",
|
|
|
|
|
|
|
|
"print(\"shepe of wav (time x n_channels):\")\n",
|
|
|
|
|
|
|
|
"print(wav.shape)\n",
|
|
|
|
|
|
|
|
"# 绘制声码器输出的波形图\n",
|
|
|
|
|
|
|
|
"wave_data = wav.numpy().T\n",
|
|
|
|
|
|
|
|
"time = np.arange(0, wave_data.shape[1]) * (1.0 / fastspeech2_config.fs)\n",
|
|
|
|
|
|
|
|
"fig, ax = plt.subplots(figsize=(9, 6))\n",
|
|
|
|
|
|
|
|
"plt.plot(time, wave_data[0])\n",
|
|
|
|
|
|
|
|
"plt.title('Waveform')\n",
|
|
|
|
|
|
|
|
"plt.xlabel('Time (seconds)')\n",
|
|
|
|
|
|
|
|
"plt.ylabel('Amplitude (normed)')\n",
|
|
|
|
|
|
|
|
"plt.tight_layout()"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"### 播放音频"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
|
|
|
"execution_count": null,
|
|
|
|
|
|
|
|
"metadata": {
|
|
|
|
|
|
|
|
"scrolled": true
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
"outputs": [],
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"dp.Audio(wav.numpy().T, rate=fastspeech2_config.fs)"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"### 保存音频"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
|
|
|
"execution_count": null,
|
|
|
|
|
|
|
|
"metadata": {
|
|
|
|
|
|
|
|
"scrolled": true
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
"outputs": [],
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"!mkdir output\n",
|
|
|
|
|
|
|
|
"sf.write(\n",
|
|
|
|
|
|
|
|
" \"output/output.wav\",\n",
|
|
|
|
|
|
|
|
" wav.numpy(),\n",
|
|
|
|
|
|
|
|
" samplerate=fastspeech2_config.fs)"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"## step4 FastSpeech2 进阶 —— 个性化调节\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"<font size=3>FastSpeech2 模型可以个性化地调节音素时长、音调和能量,通过一些简单的调节就可以获得一些有意思的效果<font>"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
|
|
|
"execution_count": null,
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"outputs": [],
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"# 不要听信别人的谗言,我不是什么克隆人。\n",
|
|
|
|
|
|
|
|
"print(\"原始音频\")\n",
|
|
|
|
|
|
|
|
"dp.display(dp.Audio(url=\"https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/speed/x1_001.wav\"))\n",
|
|
|
|
|
|
|
|
"print(\"speed x 1.2\")\n",
|
|
|
|
|
|
|
|
"dp.display(dp.Audio(url=\"https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/speed/x1.2_001.wav\"))\n",
|
|
|
|
|
|
|
|
"print(\"speed x 0.8\")\n",
|
|
|
|
|
|
|
|
"dp.display(dp.Audio(url=\"https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/speed/x0.8_001.wav\"))\n",
|
|
|
|
|
|
|
|
"print(\"pitch x 1.3(童声)\")\n",
|
|
|
|
|
|
|
|
"dp.display(dp.Audio(url=\"https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/child_voice/001.wav\"))\n",
|
|
|
|
|
|
|
|
"print(\"robot\")\n",
|
|
|
|
|
|
|
|
"dp.display(dp.Audio(url=\"https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/robot/001.wav\"))"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"<font size=4>具体实现代码请参考: https://github.com/DeepSpeech/demos/style_fs2/run.sh<font>"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"# 用 PaddleSpeech 训练 TTS 模型\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"<font size=3>PaddleSpeech 的 examples 是按照 数据集/模型 的结构安排的:<font>\n",
|
|
|
|
|
|
|
|
"```text\n",
|
|
|
|
|
|
|
|
"examples \n",
|
|
|
|
|
|
|
|
"|-- aishell3\n",
|
|
|
|
|
|
|
|
"| |-- README.md\n",
|
|
|
|
|
|
|
|
"| |-- tts3\n",
|
|
|
|
|
|
|
|
"| `-- vc0\n",
|
|
|
|
|
|
|
|
"|-- csmsc\n",
|
|
|
|
|
|
|
|
"| |-- README.md\n",
|
|
|
|
|
|
|
|
"| |-- tts2\n",
|
|
|
|
|
|
|
|
"| |-- tts3\n",
|
|
|
|
|
|
|
|
"| |-- voc1\n",
|
|
|
|
|
|
|
|
"| `-- voc3\n",
|
|
|
|
|
|
|
|
"```\n",
|
|
|
|
|
|
|
|
"<font size=3>我们在每个数据集的 README.md 介绍了子目录和模型的对应关系, 在 TTS 中有如下对应关系:<font>\n",
|
|
|
|
|
|
|
|
"```text\n",
|
|
|
|
|
|
|
|
"tts0 - Tactron2\n",
|
|
|
|
|
|
|
|
"tts1 - TransformerTTS\n",
|
|
|
|
|
|
|
|
"tts2 - SpeedySpeech\n",
|
|
|
|
|
|
|
|
"tts3 - FastSpeech2\n",
|
|
|
|
|
|
|
|
"voc0 - WaveFlow\n",
|
|
|
|
|
|
|
|
"voc1 - Parallel WaveGAN\n",
|
|
|
|
|
|
|
|
"voc2 - MelGAN\n",
|
|
|
|
|
|
|
|
"voc3 - MultiBand MelGAN\n",
|
|
|
|
|
|
|
|
"```\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"## 基于 CSMCS 数据集训练 FastSpeech2 模型\n",
|
|
|
|
|
|
|
|
"```bash\n",
|
|
|
|
|
|
|
|
"git clone https://github.com/PaddlePaddle/PaddleSpeech.git\n",
|
|
|
|
|
|
|
|
"cd examples/csmsc/tts\n",
|
|
|
|
|
|
|
|
"```\n",
|
|
|
|
|
|
|
|
"<font size=3>根据 README.md, 下载 CSMCS 数据集和其对应的强制对齐文件, 并放置在对应的位置<font>\n",
|
|
|
|
|
|
|
|
"```bash\n",
|
|
|
|
|
|
|
|
"./run.sh\n",
|
|
|
|
|
|
|
|
"```\n",
|
|
|
|
|
|
|
|
"<font size=3>`run.sh` 中包含预处理、训练、合成、静态图推理等步骤:</font>\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"```bash\n",
|
|
|
|
|
|
|
|
"#!/bin/bash\n",
|
|
|
|
|
|
|
|
"set -e\n",
|
|
|
|
|
|
|
|
"source path.sh\n",
|
|
|
|
|
|
|
|
"gpus=0,1\n",
|
|
|
|
|
|
|
|
"stage=0\n",
|
|
|
|
|
|
|
|
"stop_stage=100\n",
|
|
|
|
|
|
|
|
"conf_path=conf/default.yaml\n",
|
|
|
|
|
|
|
|
"train_output_path=exp/default\n",
|
|
|
|
|
|
|
|
"ckpt_name=snapshot_iter_153.pdz\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"# with the following command, you can choice the stage range you want to run\n",
|
|
|
|
|
|
|
|
"# such as `./run.sh --stage 0 --stop-stage 0`\n",
|
|
|
|
|
|
|
|
"# this can not be mixed use with `$1`, `$2` ...\n",
|
|
|
|
|
|
|
|
"source ${MAIN_ROOT}/utils/parse_options.sh || exit 1\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then\n",
|
|
|
|
|
|
|
|
" # prepare data\n",
|
|
|
|
|
|
|
|
" bash ./local/preprocess.sh ${conf_path} || exit -1\n",
|
|
|
|
|
|
|
|
"fi\n",
|
|
|
|
|
|
|
|
"if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then\n",
|
|
|
|
|
|
|
|
" # train model, all `ckpt` under `train_output_path/checkpoints/` dir\n",
|
|
|
|
|
|
|
|
" CUDA_VISIBLE_DEVICES=${gpus} ./local/train.sh ${conf_path} ${train_output_path} || exit -1\n",
|
|
|
|
|
|
|
|
"fi\n",
|
|
|
|
|
|
|
|
"if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then\n",
|
|
|
|
|
|
|
|
" # synthesize, vocoder is pwgan\n",
|
|
|
|
|
|
|
|
" CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1\n",
|
|
|
|
|
|
|
|
"fi\n",
|
|
|
|
|
|
|
|
"if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then\n",
|
|
|
|
|
|
|
|
" # synthesize_e2e, vocoder is pwgan\n",
|
|
|
|
|
|
|
|
" CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1\n",
|
|
|
|
|
|
|
|
"fi\n",
|
|
|
|
|
|
|
|
"if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then\n",
|
|
|
|
|
|
|
|
" # inference with static model\n",
|
|
|
|
|
|
|
|
" CUDA_VISIBLE_DEVICES=${gpus} ./local/inference.sh ${train_output_path} || exit -1\n",
|
|
|
|
|
|
|
|
"fi\n",
|
|
|
|
|
|
|
|
"```\n",
|
|
|
|
|
|
|
|
"<br></br>\n",
|
|
|
|
|
|
|
|
"## 基于 CSMCS 数据集训练 Parallel WaveGAN 模型\n",
|
|
|
|
|
|
|
|
"```bash\n",
|
|
|
|
|
|
|
|
"git clone https://github.com/PaddlePaddle/PaddleSpeech.git\n",
|
|
|
|
|
|
|
|
"cd examples/csmsc/voc1\n",
|
|
|
|
|
|
|
|
"```\n",
|
|
|
|
|
|
|
|
"<font size=3>根据 README.md, 下载 CSMCS 数据集和其对应的强制对齐文件, 并放置在对应的位置<font>\n",
|
|
|
|
|
|
|
|
"```bash\n",
|
|
|
|
|
|
|
|
"./run.sh\n",
|
|
|
|
|
|
|
|
"```\n",
|
|
|
|
|
|
|
|
"<font size=3>`run.sh` 中包含预处理、训练、合成等步骤:</font>\n",
|
|
|
|
|
|
|
|
"```bash\n",
|
|
|
|
|
|
|
|
"#!/bin/bash\n",
|
|
|
|
|
|
|
|
"set -e\n",
|
|
|
|
|
|
|
|
"source path.sh\n",
|
|
|
|
|
|
|
|
"gpus=0,1\n",
|
|
|
|
|
|
|
|
"stage=0\n",
|
|
|
|
|
|
|
|
"stop_stage=100\n",
|
|
|
|
|
|
|
|
"conf_path=conf/default.yaml\n",
|
|
|
|
|
|
|
|
"train_output_path=exp/default\n",
|
|
|
|
|
|
|
|
"ckpt_name=snapshot_iter_5000.pdz\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"# with the following command, you can choice the stage range you want to run\n",
|
|
|
|
|
|
|
|
"# such as `./run.sh --stage 0 --stop-stage 0`\n",
|
|
|
|
|
|
|
|
"# this can not be mixed use with `$1`, `$2` ...\n",
|
|
|
|
|
|
|
|
"source ${MAIN_ROOT}/utils/parse_options.sh || exit 1\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then\n",
|
|
|
|
|
|
|
|
" # prepare data\n",
|
|
|
|
|
|
|
|
" ./local/preprocess.sh ${conf_path} || exit -1\n",
|
|
|
|
|
|
|
|
"fi\n",
|
|
|
|
|
|
|
|
"if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then\n",
|
|
|
|
|
|
|
|
" # train model, all `ckpt` under `train_output_path/checkpoints/` dir\n",
|
|
|
|
|
|
|
|
" CUDA_VISIBLE_DEVICES=${gpus} ./local/train.sh ${conf_path} ${train_output_path} || exit -1\n",
|
|
|
|
|
|
|
|
"fi\n",
|
|
|
|
|
|
|
|
"if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then\n",
|
|
|
|
|
|
|
|
" # synthesize\n",
|
|
|
|
|
|
|
|
" CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1\n",
|
|
|
|
|
|
|
|
"fi\n",
|
|
|
|
|
|
|
|
"```"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"# FAQ\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"- <font size=3>需要注意的问题<font>\n",
|
|
|
|
|
|
|
|
"- <font size=3>经验与分享<font>\n",
|
|
|
|
|
|
|
|
"- <font size=3>用户的其他问题<font>"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"# 作业\n",
|
|
|
|
|
|
|
|
"<font size=4>在 CSMSC 数据集上利用 FastSpeech2 和 Parallel WaveGAN 实现一个中文 TTS 系统<font>"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
{
|
|
|
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
|
|
|
"metadata": {},
|
|
|
|
|
|
|
|
"source": [
|
|
|
|
|
|
|
|
"# 关注 PaddleSpeech\n",
|
|
|
|
|
|
|
|
"<font size=3>https://github.com/PaddlePaddle/PaddleSpeech/<font>"
|
|
|
|
|
|
|
|
]
|
|
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
],
|
|
|
|
|
|
|
|
"metadata": {
|
|
|
|
|
|
|
|
"kernelspec": {
|
|
|
|
|
|
|
|
"display_name": "Python 3.7.0 64-bit ('yt_py37_develop': venv)",
|
|
|
|
|
|
|
|
"language": "python",
|
|
|
|
|
|
|
|
"name": "python37064bitytpy37developvenv88cd689abeac41d886f9210a708a170b"
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
"language_info": {
|
|
|
|
|
|
|
|
"codemirror_mode": {
|
|
|
|
|
|
|
|
"name": "ipython",
|
|
|
|
|
|
|
|
"version": 3
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
"file_extension": ".py",
|
|
|
|
|
|
|
|
"mimetype": "text/x-python",
|
|
|
|
|
|
|
|
"name": "python",
|
|
|
|
|
|
|
|
"nbconvert_exporter": "python",
|
|
|
|
|
|
|
|
"pygments_lexer": "ipython3",
|
|
|
|
|
|
|
|
"version": "3.7.0"
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
"toc": {
|
|
|
|
|
|
|
|
"base_numbering": 1,
|
|
|
|
|
|
|
|
"nav_menu": {},
|
|
|
|
|
|
|
|
"number_sections": true,
|
|
|
|
|
|
|
|
"sideBar": true,
|
|
|
|
|
|
|
|
"skip_h1_title": false,
|
|
|
|
|
|
|
|
"title_cell": "Table of Contents",
|
|
|
|
|
|
|
|
"title_sidebar": "Contents",
|
|
|
|
|
|
|
|
"toc_cell": false,
|
|
|
|
|
|
|
|
"toc_position": {
|
|
|
|
|
|
|
|
"height": "calc(100% - 180px)",
|
|
|
|
|
|
|
|
"left": "10px",
|
|
|
|
|
|
|
|
"top": "150px",
|
|
|
|
|
|
|
|
"width": "263.594px"
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
"toc_section_display": true,
|
|
|
|
|
|
|
|
"toc_window_display": true
|
|
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
},
|
|
|
|
|
|
|
|
"nbformat": 4,
|
|
|
|
|
|
|
|
"nbformat_minor": 4
|
|
|
|
|
|
|
|
}
|