From 601330cd0afc623f109e95aa893c19dce7202406 Mon Sep 17 00:00:00 2001
From: enkilee
Date: Fri, 29 Nov 2024 16:54:51 +0800
Subject: [PATCH] fix

---
 examples/csmsc/voc1/README.md | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/examples/csmsc/voc1/README.md b/examples/csmsc/voc1/README.md
index b9ea2e906..a1f63fc46 100644
--- a/examples/csmsc/voc1/README.md
+++ b/examples/csmsc/voc1/README.md
@@ -3,12 +3,15 @@ This example contains code used to train a [parallel wavegan](http://arxiv.org/a
 ## Dataset
 ### Download and Extract
 Download CSMSC from it's [official website](https://test.data-baker.com/data/index/TNtts/) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/BZNSYP`.
-The structure of the folder is listed below.
-```text
-datasets/BZNSYP
-└── Wave
- └──XXXX.wav files
-```
+datasets/BZNSYP should have three folders:
+
+└─ Wave
+ └─ .wav files (audio speech)
+ └─ PhoneLabeling
+ └─ .interval files (alignment between phoneme and duration)
+ └─ ProsodyLabeling
+ └─ 000001-010000.txt (text with prosodic by pinyin)
+Still we only use .wav files in training.
 
 ### Get MFA Result and Extract
 We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) results to cut silence at the edge of audio.