Merge pull request #2087 from yt605155624/add_blank

[TTS]install CPython version monotonic_align before training
pull/2096/head
TianYuan 3 years ago committed by GitHub
commit 60c1a1e575
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -1,3 +1,4 @@
([简体中文](./README_cn.md)|English) ([简体中文](./README_cn.md)|English)
<p align="center"> <p align="center">
<img src="./docs/images/PaddleSpeech_logo.png" /> <img src="./docs/images/PaddleSpeech_logo.png" />
@ -494,6 +495,14 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
<a href = "./examples/aishell3/vc1">ge2e-fastspeech2-aishell3</a> <a href = "./examples/aishell3/vc1">ge2e-fastspeech2-aishell3</a>
</td> </td>
</tr> </tr>
<tr>
<td rowspan="3">End-to-End</td>
<td>VITS</td>
<td >CSMSC</td>
<td>
<a href = "./examples/csmsc/vits">VITS-csmsc</a>
</td>
</tr>
</tbody> </tbody>
</table> </table>

@ -1,3 +1,4 @@
(简体中文|[English](./README.md)) (简体中文|[English](./README.md))
<p align="center"> <p align="center">
<img src="./docs/images/PaddleSpeech_logo.png" /> <img src="./docs/images/PaddleSpeech_logo.png" />
@ -481,6 +482,15 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
<a href = "./examples/aishell3/vc1">ge2e-fastspeech2-aishell3</a> <a href = "./examples/aishell3/vc1">ge2e-fastspeech2-aishell3</a>
</td> </td>
</tr> </tr>
</tr>
<tr>
<td rowspan="3">端到端</td>
<td>VITS</td>
<td >CSMSC</td>
<td>
<a href = "./examples/csmsc/vits">VITS-csmsc</a>
</td>
</tr>
</tbody> </tbody>
</table> </table>

@ -144,3 +144,34 @@ optional arguments:
6. `--ngpu` is the number of gpus to use, if ngpu == 0, use cpu. 6. `--ngpu` is the number of gpus to use, if ngpu == 0, use cpu.
## Pretrained Model ## Pretrained Model
The pretrained model can be downloaded here:
- [vits_csmsc_ckpt_1.1.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/vits/vits_csmsc_ckpt_1.1.0.zip) (add_blank=true)
VITS checkpoint contains files listed below.
```text
vits_csmsc_ckpt_1.1.0
├── default.yaml # default config used to train vitx
├── phone_id_map.txt # phone vocabulary file when training vits
└── snapshot_iter_350000.pdz # model parameters and optimizer states
```
ps: This ckpt is not good enough, a better result is training
You can use the following scripts to synthesize for `${BIN_DIR}/../sentences.txt` using pretrained VITS.
```bash
source path.sh
add_blank=true
FLAGS_allocator_strategy=naive_best_fit \
FLAGS_fraction_of_gpu_memory_to_use=0.01 \
python3 ${BIN_DIR}/synthesize_e2e.py \
--config=vits_csmsc_ckpt_1.1.0/default.yaml \
--ckpt=vits_csmsc_ckpt_1.1.0/snapshot_iter_350000.pdz \
--phones_dict=vits_csmsc_ckpt_1.1.0/phone_id_map.txt \
--output_dir=exp/default/test_e2e \
--text=${BIN_DIR}/../sentences.txt \
--add-blank=${add_blank}
```

@ -15,4 +15,4 @@ if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
--phones_dict=dump/phone_id_map.txt \ --phones_dict=dump/phone_id_map.txt \
--test_metadata=dump/test/norm/metadata.jsonl \ --test_metadata=dump/test/norm/metadata.jsonl \
--output_dir=${train_output_path}/test --output_dir=${train_output_path}/test
fi fi

@ -3,6 +3,11 @@
config_path=$1 config_path=$1
train_output_path=$2 train_output_path=$2
# install monotonic_align
cd ${MAIN_ROOT}/paddlespeech/t2s/models/vits/monotonic_align
python3 setup.py build_ext --inplace
cd -
python3 ${BIN_DIR}/train.py \ python3 ${BIN_DIR}/train.py \
--train-metadata=dump/train/norm/metadata.jsonl \ --train-metadata=dump/train/norm/metadata.jsonl \
--dev-metadata=dump/dev/norm/metadata.jsonl \ --dev-metadata=dump/dev/norm/metadata.jsonl \

Loading…
Cancel
Save