From 2ce64f5de9e6b5e1f495bef779b56e0d49072622 Mon Sep 17 00:00:00 2001
From: TianYuan
Date: Fri, 16 Sep 2022 06:44:12 +0000
Subject: [PATCH] fix ERNIE-SAT README, test=doc

---
 examples/aishell3/ernie_sat/README.md      | 13 ++++++-------
 examples/aishell3_vctk/ernie_sat/README.md | 13 ++++++-------
 examples/vctk/ernie_sat/README.md          | 11 +++++------
 3 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/examples/aishell3/ernie_sat/README.md b/examples/aishell3/ernie_sat/README.md
index 707ee1381..eb867ab75 100644
--- a/examples/aishell3/ernie_sat/README.md
+++ b/examples/aishell3/ernie_sat/README.md
@@ -1,11 +1,10 @@
-# ERNIE-SAT with AISHELL3 dataset
+# ERNIE-SAT with VCTK dataset
+ERNIE-SAT speech-text joint pretraining framework, which achieves SOTA results in cross-lingual multi-speaker speech synthesis and cross-lingual speech editing tasks, It can be applied to a series of scenarios such as Speech Editing, personalized Speech Synthesis, and Voice Cloning.
 
-ERNIE-SAT 是可以同时处理中英文的跨语言的语音-语言跨模态大模型,其在语音编辑、个性化语音合成以及跨语言的语音合成等多个任务取得了领先效果。可以应用于语音编辑、个性化合成、语音克隆、同传翻译等一系列场景,该项目供研究使用。
-
-## 模型框架
-ERNIE-SAT 中我们提出了两项创新:
-- 在预训练过程中将中英双语对应的音素作为输入,实现了跨语言、个性化的软音素映射
-- 采用语言和语音的联合掩码学习实现了语言和语音的对齐
+## Model Framework
+In ERNIE-SAT, we propose two innovations:
+- In the pretraining process, the phonemes corresponding to Chinese and English are used as input to achieve cross-language and personalized soft phoneme mapping
+- The joint mask learning of speech and text is used to realize the alignment of speech and text

diff --git a/examples/aishell3_vctk/ernie_sat/README.md b/examples/aishell3_vctk/ernie_sat/README.md
index a849488d5..d55af6756 100644
--- a/examples/aishell3_vctk/ernie_sat/README.md
+++ b/examples/aishell3_vctk/ernie_sat/README.md
@@ -1,11 +1,10 @@
-# ERNIE-SAT with AISHELL3 and VCTK dataset
+# ERNIE-SAT with VCTK dataset
+ERNIE-SAT speech-text joint pretraining framework, which achieves SOTA results in cross-lingual multi-speaker speech synthesis and cross-lingual speech editing tasks, It can be applied to a series of scenarios such as Speech Editing, personalized Speech Synthesis, and Voice Cloning.
 
-ERNIE-SAT 是可以同时处理中英文的跨语言的语音-语言跨模态大模型,其在语音编辑、个性化语音合成以及跨语言的语音合成等多个任务取得了领先效果。可以应用于语音编辑、个性化合成、语音克隆、同传翻译等一系列场景,该项目供研究使用。
-
-## 模型框架
-ERNIE-SAT 中我们提出了两项创新:
-- 在预训练过程中将中英双语对应的音素作为输入,实现了跨语言、个性化的软音素映射
-- 采用语言和语音的联合掩码学习实现了语言和语音的对齐
+## Model Framework
+In ERNIE-SAT, we propose two innovations:
+- In the pretraining process, the phonemes corresponding to Chinese and English are used as input to achieve cross-language and personalized soft phoneme mapping
+- The joint mask learning of speech and text is used to realize the alignment of speech and text

diff --git a/examples/vctk/ernie_sat/README.md b/examples/vctk/ernie_sat/README.md
index 0a2f9359e..94c7ae25d 100644
--- a/examples/vctk/ernie_sat/README.md
+++ b/examples/vctk/ernie_sat/README.md
@@ -1,11 +1,10 @@
 # ERNIE-SAT with VCTK dataset
+ERNIE-SAT speech-text joint pretraining framework, which achieves SOTA results in cross-lingual multi-speaker speech synthesis and cross-lingual speech editing tasks, It can be applied to a series of scenarios such as Speech Editing, personalized Speech Synthesis, and Voice Cloning.
 
-ERNIE-SAT 是可以同时处理中英文的跨语言的语音-语言跨模态大模型,其在语音编辑、个性化语音合成以及跨语言的语音合成等多个任务取得了领先效果。可以应用于语音编辑、个性化合成、语音克隆、同传翻译等一系列场景,该项目供研究使用。
-
-## 模型框架
-ERNIE-SAT 中我们提出了两项创新:
-- 在预训练过程中将中英双语对应的音素作为输入,实现了跨语言、个性化的软音素映射
-- 采用语言和语音的联合掩码学习实现了语言和语音的对齐
+## Model Framework
+In ERNIE-SAT, we propose two innovations:
+- In the pretraining process, the phonemes corresponding to Chinese and English are used as input to achieve cross-language and personalized soft phoneme mapping
+- The joint mask learning of speech and text is used to realize the alignment of speech and text