From 7ec0ed4aafea4f01c874c2df91266b1ffa425fe6 Mon Sep 17 00:00:00 2001
From: Hui Zhang
Date: Tue, 23 Nov 2021 07:20:44 +0000
Subject: [PATCH 1/3] kaldi feat: apply dither only in train mode

---
 docs/source/released_model.md                  | 16 ++++++++--------
 .../s2t/frontend/featurizer/text_featurizer.py |  2 ++
 paddlespeech/s2t/transform/spectrogram.py      |  8 +++++---
 3 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/docs/source/released_model.md b/docs/source/released_model.md
index 78f5c92f..df9c3c5e 100644
--- a/docs/source/released_model.md
+++ b/docs/source/released_model.md
@@ -5,13 +5,13 @@
 ### Acoustic Model Released in paddle 2.X
 Acoustic Model | Training Data | Token-based | Size | Descriptions | CER | WER | Hours of speech | example link
 :-------------:| :------------:| :-----: | -----: | :----------------- |:--------- | :---------- | :--------- | :-----------
-[Ds2 Online Aishell S0 Model](https://deepspeech.bj.bcebos.com/release2.2/aishell/s0/ds2_online_aishll_CER8.02_release.tar.gz) | Aishell Dataset | Char-based | 345 MB | 2 Conv + 5 LSTM layers with only forward direction | 0.080218 |-| 151 h | [D2 Online Aishell S0 Example](../../examples/aishell/s0)
-[Ds2 Offline Aishell S0 Model](https://deepspeech.bj.bcebos.com/release2.1/aishell/s0/aishell.s0.ds2.offline.cer6p65.release.tar.gz)| Aishell Dataset | Char-based | 306 MB | 2 Conv + 3 bidirectional GRU layers| 0.065 |-| 151 h | [Ds2 Offline Aishell S0 Example](../../examples/aishell/s0)
-[Conformer Online Aishell S1 Model](https://deepspeech.bj.bcebos.com/release2.1/aishell/s1/aishell.chunk.release.tar.gz) | Aishell Dataset | Char-based | 283 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring | 0.0594 |-| 151 h | [Conformer Online Aishell S1 Example](../../examples/aishell/s1)
-[Conformer Offline Aishell S1 Model](https://deepspeech.bj.bcebos.com/release2.1/aishell/s1/aishell.release.tar.gz) | Aishell Dataset | Char-based | 284 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring | 0.0547 |-| 151 h | [Conformer Offline Aishell S1 Example](../../examples/aishell/s1)
-[Conformer Librispeech S1 Model](https://deepspeech.bj.bcebos.com/release2.1/librispeech/s1/conformer.release.tar.gz) | Librispeech Dataset | subword-based | 287 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring |-| 0.0325 | 960 h | [Conformer Librispeech S1 example](../../example/librispeech/s1)
-[Transformer Librispeech S1 Model](https://deepspeech.bj.bcebos.com/release2.2/librispeech/s1/librispeech.s1.transformer.all.wer5p62.release.tar.gz) | Librispeech Dataset | subword-based | 131 MB | Encoder:Transformer, Decoder:Transformer, Decoding method: Attention rescoring |-| 0.0456 | 960 h | [Transformer Librispeech S1 example](../../example/librispeech/s1)
-[Transformer Librispeech S2 Model](https://deepspeech.bj.bcebos.com/release2.2/librispeech/s2/libri_transformer_espnet_wer3p84.release.tar.gz) | Librispeech Dataset | subword-based | 131 MB | Encoder:Transformer, Decoder:Transformer, Decoding method: Attention |-| 0.0384 | 960 h | [Transformer Librispeech S2 example](../../example/librispeech/s2)
+[Ds2 Online Aishell ASR0 Model](https://deepspeech.bj.bcebos.com/release2.2/aishell/s0/ds2_online_aishll_CER8.02_release.tar.gz) | Aishell Dataset | Char-based | 345 MB | 2 Conv + 5 LSTM layers with only forward direction | 0.080218 |-| 151 h | [Ds2 Online Aishell S0 Example](../../examples/aishell/asr0)
+[Ds2 Offline Aishell ASR0 Model](https://deepspeech.bj.bcebos.com/release2.1/aishell/s0/aishell.s0.ds2.offline.cer6p65.release.tar.gz)| Aishell Dataset | Char-based | 306 MB | 2 Conv + 3 bidirectional GRU layers| 0.065 |-| 151 h | [Ds2 Offline Aishell S0 Example](../../examples/aishell/asr0)
+[Conformer Online Aishell ASR1 Model](https://deepspeech.bj.bcebos.com/release2.1/aishell/s1/aishell.chunk.release.tar.gz) | Aishell Dataset | Char-based | 283 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring | 0.0594 |-| 151 h | [Conformer Online Aishell S1 Example](../../examples/aishell/s1)
+[Conformer Offline Aishell ASR1 Model](https://deepspeech.bj.bcebos.com/release2.1/aishell/s1/aishell.release.tar.gz) | Aishell Dataset | Char-based | 284 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring | 0.0547 |-| 151 h | [Conformer Offline Aishell S1 Example](../../examples/aishell/s1)
+[Conformer Librispeech ASR1 Model](https://deepspeech.bj.bcebos.com/release2.1/librispeech/s1/conformer.release.tar.gz) | Librispeech Dataset | subword-based | 287 MB | Encoder:Conformer, Decoder:Transformer, Decoding method: Attention rescoring |-| 0.0325 | 960 h | [Conformer Librispeech S1 example](../../example/librispeech/s1)
+[Transformer Librispeech ASR1 Model](https://deepspeech.bj.bcebos.com/release2.2/librispeech/s1/librispeech.s1.transformer.all.wer5p62.release.tar.gz) | Librispeech Dataset | subword-based | 131 MB | Encoder:Transformer, Decoder:Transformer, Decoding method: Attention rescoring |-| 0.0456 | 960 h | [Transformer Librispeech S1 example](../../example/librispeech/s1)
+[Transformer Librispeech ASR2 Model](https://deepspeech.bj.bcebos.com/release2.2/librispeech/s2/libri_transformer_espnet_wer3p84.release.tar.gz) | Librispeech Dataset | subword-based | 131 MB | Encoder:Transformer, Decoder:Transformer, Decoding method: Attention |-| 0.0384 | 960 h | [Transformer Librispeech S2 example](../../example/librispeech/s2)
 
 ### Acoustic Model Transformed from paddle 1.8
 Acoustic Model | Training Data | Token-based | Size | Descriptions | CER | WER | Hours of speech
@@ -32,7 +32,7 @@ Language Model | Training Data | Token-based | Size | Descriptions
 ### Acoustic Models
 Model Type | Dataset| Example Link | Pretrained Models|Static Models|Siize(static)
 :-------------:| :------------:| :-----: | :-----:| :-----:| :-----:
-Tacotron2|LJSpeech|[tacotron2-vctk](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/tts0)|[tacotron2_ljspeech_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_ljspeech_ckpt_0.3.zip)|||
+Tacotron2|LJSpeech|[tacotron2-vctk](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/ttasr0)|[tacotron2_ljspeech_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_ljspeech_ckpt_0.3.zip)|||
 TransformerTTS| LJSpeech| [transformer-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/tts1)|[transformer_tts_ljspeech_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_ckpt_0.4.zip)|||
 SpeedySpeech| CSMSC | [speedyspeech-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts2) |[speedyspeech_nosil_baker_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/speedyspeech_nosil_baker_ckpt_0.5.zip)|[speedyspeech_nosil_baker_static_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/speedyspeech_nosil_baker_static_0.5.zip)|12MB|
 FastSpeech2| CSMSC |[fastspeech2-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3)|[fastspeech2_nosil_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_baker_ckpt_0.4.zip)|[fastspeech2_nosil_baker_static_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_baker_static_0.4.zip)|157MB|
diff --git a/paddlespeech/s2t/frontend/featurizer/text_featurizer.py b/paddlespeech/s2t/frontend/featurizer/text_featurizer.py
index 21f512e9..dab3d48d 100644
--- a/paddlespeech/s2t/frontend/featurizer/text_featurizer.py
+++ b/paddlespeech/s2t/frontend/featurizer/text_featurizer.py
@@ -56,6 +56,8 @@ class TextFeaturizer():
             self.vocab_dict, self._id2token, self.vocab_list, self.unk_id, self.eos_id, self.blank_id = self._load_vocabulary_from_file(
                 vocab_filepath, maskctc)
             self.vocab_size = len(self.vocab_list)
+        else:
+            logger.warning(f"TextFeaturizer: not have vocab file.")
 
         if unit_type == 'spm':
             spm_model = spm_model_prefix + '.model'
diff --git a/paddlespeech/s2t/transform/spectrogram.py b/paddlespeech/s2t/transform/spectrogram.py
index da91ef92..ea39a6f6 100644
--- a/paddlespeech/s2t/transform/spectrogram.py
+++ b/paddlespeech/s2t/transform/spectrogram.py
@@ -341,7 +341,7 @@ class LogMelSpectrogramKaldi():
         self.eps = eps
         self.remove_dc_offset = True
         self.preemph = 0.97
-        self.dither = dither
+        self.dither = dither # only work in train mode
 
     def __repr__(self):
         return (
@@ -361,11 +361,12 @@
                 eps=self.eps,
                 dither=self.dither, ))
 
-    def __call__(self, x):
+    def __call__(self, x, train):
         """
         Args:
             x (np.ndarray): shape (Ti,)
+            train (bool): True for train mode.
 
         Raises:
             ValueError: not support (Ti, C)
@@ -373,6 +374,7 @@
         Returns:
             np.ndarray: (T, D)
         """
+        dither = self.dither if train else False
         if x.ndim != 1:
             raise ValueError("Not support x: [Time, Channel]")
 
@@ -391,7 +393,7 @@
             nfft=self.n_fft,
             lowfreq=self.fmin,
             highfreq=self.fmax,
-            dither=self.dither,
+            dither=dither,
             remove_dc_offset=self.remove_dc_offset,
             preemph=self.preemph,
             wintype=self.window)

From 56480e10336df0d89d11d9cb2ac057d8a347a981 Mon Sep 17 00:00:00 2001
From: Hui Zhang
Date: Tue, 23 Nov 2021 07:26:37 +0000
Subject: [PATCH 2/3] fix format

---
 examples/aishell/asr1/README.md                |  2 +-
 examples/dataset/aishell/aishell.py            |  2 +-
 examples/dataset/ted_en_zh/ted_en_zh.py        |  3 +--
 examples/dataset/thchs30/thchs30.py            |  2 +-
 .../timit/timit_kaldi_standard_split.py        |  2 +-
 examples/librispeech/asr1/README.md            |  2 +-
 examples/timit/README.md                       |  2 +-
 examples/wenetspeech/README.md                 |  2 +-
 examples/wenetspeech/asr1/RESULTS.md           |  2 +-
 .../wenetspeech/asr1/local/extract_meta.py     | 25 +++++++++++++------
 .../wenetspeech/asr1/local/process_opus.py     | 22 +++++++++++-----
 paddlespeech/s2t/exps/u2/model.py              |  7 +-----
 .../frontend/featurizer/text_featurizer.py     |  2 +-
 paddlespeech/s2t/transform/spectrogram.py      |  2 +-
 14 files changed, 46 insertions(+), 31 deletions(-)

diff --git a/examples/aishell/asr1/README.md b/examples/aishell/asr1/README.md
index 8c53f95f..da753634 100644
--- a/examples/aishell/asr1/README.md
+++ b/examples/aishell/asr1/README.md
@@ -28,4 +28,4 @@ Need set `decoding.decoding_chunk_size=16` when decoding.
 | transformer | 31.95M | conf/transformer.yaml | spec_aug | test | attention | 3.858648955821991 | 0.057293 |
 | transformer | 31.95M | conf/transformer.yaml | spec_aug | test | ctc_greedy_search | 3.858648955821991 | 0.061837 |
 | transformer | 31.95M | conf/transformer.yaml | spec_aug | test | ctc_prefix_beam_search | 3.858648955821991 | 0.061685 |
-| transformer | 31.95M | conf/transformer.yaml | spec_aug | test | attention_rescoring | 3.858648955821991 | 0.053844 |
\ No newline at end of file
+| transformer | 31.95M | conf/transformer.yaml | spec_aug | test | attention_rescoring | 3.858648955821991 | 0.053844 |
diff --git a/examples/dataset/aishell/aishell.py b/examples/dataset/aishell/aishell.py
index 95ed0408..7431fc08 100644
--- a/examples/dataset/aishell/aishell.py
+++ b/examples/dataset/aishell/aishell.py
@@ -82,7 +82,7 @@ def create_manifest(data_dir, manifest_path_prefix):
             # if no transcription for audio then skipped
             if audio_id not in transcript_dict:
                 continue
-            
+
             utt2spk = Path(audio_path).parent.name
             audio_data, samplerate = soundfile.read(audio_path)
             duration = float(len(audio_data) / samplerate)
diff --git a/examples/dataset/ted_en_zh/ted_en_zh.py b/examples/dataset/ted_en_zh/ted_en_zh.py
index a8cbb837..9a3ba3b3 100644
--- a/examples/dataset/ted_en_zh/ted_en_zh.py
+++ b/examples/dataset/ted_en_zh/ted_en_zh.py
@@ -73,7 +73,6 @@ def create_manifest(data_dir, manifest_path_prefix):
             audio_data, samplerate = soundfile.read(audio_path)
             duration = float(len(audio_data) / samplerate)
 
-
             translation_str = " ".join(translation.split())
             trancription_str = " ".join(trancription.split())
             json_lines.append(
@@ -82,7 +81,7 @@ def create_manifest(data_dir, manifest_path_prefix):
                 {
                     'utt': utt,
                     'feat': audio_path,
                     'feat_shape': (duration, ),  # second
-                    'text': [translation_str, trancription_str], 
+                    'text': [translation_str, trancription_str],
                 },
                 ensure_ascii=False))
diff --git a/examples/dataset/thchs30/thchs30.py b/examples/dataset/thchs30/thchs30.py
index 2ec4ddab..cdfc0a75 100644
--- a/examples/dataset/thchs30/thchs30.py
+++ b/examples/dataset/thchs30/thchs30.py
@@ -124,7 +124,7 @@ def create_manifest(data_dir, manifest_path_prefix):
             json.dumps(
                 {
                     'utt': audio_id,
-                    'utt2spk', spk,
+                    'utt2spk': spk,
                     'feat': audio_path,
                     'feat_shape': (duration, ),  # second
                     'text': word_text,  # charactor
diff --git a/examples/dataset/timit/timit_kaldi_standard_split.py b/examples/dataset/timit/timit_kaldi_standard_split.py
index 26aa76c7..473fc856 100644
--- a/examples/dataset/timit/timit_kaldi_standard_split.py
+++ b/examples/dataset/timit/timit_kaldi_standard_split.py
@@ -22,9 +22,9 @@ import argparse
 import codecs
 import json
 import os
+from pathlib import Path
 
 import soundfile
-from pathlib import Path
 
 parser = argparse.ArgumentParser(description=__doc__)
 parser.add_argument(
diff --git a/examples/librispeech/asr1/README.md b/examples/librispeech/asr1/README.md
index 20255db8..73f0863e 100644
--- a/examples/librispeech/asr1/README.md
+++ b/examples/librispeech/asr1/README.md
@@ -24,4 +24,4 @@
 | transformer | 32.52 M | conf/transformer.yaml | spec_aug | test-clean | attention | 6.805267604192098, | 0.049795 |
 | transformer | 32.52 M | conf/transformer.yaml | spec_aug | test-clean | ctc_greedy_search | 6.805267604192098, | 0.054892 |
 | transformer | 32.52 M | conf/transformer.yaml | spec_aug | test-clean | ctc_prefix_beam_search | 6.805267604192098, | 0.054531 |
-| transformer | 32.52 M | conf/transformer.yaml | spec_aug | test-clean | attention_rescoring | 6.805267604192098, | 0.042244 |
\ No newline at end of file
+| transformer | 32.52 M | conf/transformer.yaml | spec_aug | test-clean | attention_rescoring | 6.805267604192098, | 0.042244 |
diff --git a/examples/timit/README.md b/examples/timit/README.md
index 77839874..51fcfd57 100644
--- a/examples/timit/README.md
+++ b/examples/timit/README.md
@@ -4,4 +4,4 @@ asr model with phone unit
 
 * asr0 - deepspeech2 Streaming/Non-Streaming
 * asr1 - transformer/conformer Streaming/Non-Streaming
-* asr2 - transformer/conformer Streaming/Non-Streaming with Kaldi feature
\ No newline at end of file
+* asr2 - transformer/conformer Streaming/Non-Streaming with Kaldi feature
diff --git a/examples/wenetspeech/README.md b/examples/wenetspeech/README.md
index 0cb0f354..cbd01eb8 100644
--- a/examples/wenetspeech/README.md
+++ b/examples/wenetspeech/README.md
@@ -55,4 +55,4 @@ As shown in the following table, we provide 3 training subsets, namely `S`, `M`
 |-----------------|-------|--------------|-----------------------------------------------------------------------------------------|
 | DEV | 20 | Internet | Specially designed for some speech tools which require cross-validation set in training |
 | TEST\_NET | 23 | Internet | Match test |
-| TEST\_MEETING | 15 | Real meeting | Mismatch test which is a far-field, conversational, spontaneous, and meeting dataset |
\ No newline at end of file
+| TEST\_MEETING | 15 | Real meeting | Mismatch test which is a far-field, conversational, spontaneous, and meeting dataset |
diff --git a/examples/wenetspeech/asr1/RESULTS.md b/examples/wenetspeech/asr1/RESULTS.md
index 5aff041f..5c2b8143 100644
--- a/examples/wenetspeech/asr1/RESULTS.md
+++ b/examples/wenetspeech/asr1/RESULTS.md
@@ -21,4 +21,4 @@ Pretrain model from http://mobvoi-speech-public.ufile.ucloud.cn/public/wenet/wen
 | conformer | 32.52 M | conf/conformer.yaml | spec_aug | aishell1 | attention | - | 0.048456 |
 | conformer | 32.52 M | conf/conformer.yaml | spec_aug | aishell1 | ctc_greedy_search | - | 0.052534 |
 | conformer | 32.52 M | conf/conformer.yaml | spec_aug | aishell1 | ctc_prefix_beam_search | - | 0.052915 |
-| conformer | 32.52 M | conf/conformer.yaml | spec_aug | aishell1 | attention_rescoring | - | 0.047904 |
\ No newline at end of file
+| conformer | 32.52 M | conf/conformer.yaml | spec_aug | aishell1 | attention_rescoring | - | 0.047904 |
diff --git a/examples/wenetspeech/asr1/local/extract_meta.py b/examples/wenetspeech/asr1/local/extract_meta.py
index 4de0b7d4..0e1b2727 100644
--- a/examples/wenetspeech/asr1/local/extract_meta.py
+++ b/examples/wenetspeech/asr1/local/extract_meta.py
@@ -1,6 +1,18 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
 # Copyright 2021 Xiaomi Corporation (Author: Yongqing Wang)
 # Mobvoi Inc(Author: Di Wu, Binbin Zhang)
-
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
@@ -12,11 +24,10 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-
-import sys
-import os
 import argparse
 import json
+import os
+import sys
 
 
 def get_args():
@@ -85,13 +96,13 @@ def meta_analysis(input_json, output_dir):
                     else:
                         utt2text.write(f'{sid}\t{text}\n')
                         segments.write(
-                            f'{sid}\t{aid}\t{start_time}\t{end_time}\n'
-                        )
+                            f'{sid}\t{aid}\t{start_time}\t{end_time}\n')
                         utt2dur.write(f'{sid}\t{dur}\n')
                         segment_sub_names = " ".join(segment_subsets)
                         utt2subsets.write(
                             f'{sid}\t{segment_sub_names}\n')
+
 
 def main():
     args = get_args()
@@ -99,4 +110,4 @@ def main():
 
 
 if __name__ == '__main__':
-    main()
\ No newline at end of file
+    main()
diff --git a/examples/wenetspeech/asr1/local/process_opus.py b/examples/wenetspeech/asr1/local/process_opus.py
index 603e0082..f1b9287e 100644
--- a/examples/wenetspeech/asr1/local/process_opus.py
+++ b/examples/wenetspeech/asr1/local/process_opus.py
@@ -1,5 +1,17 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
 # Copyright 2021 NPU, ASLP Group (Author: Qijie Shao)
-
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
@@ -11,14 +23,12 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-
 # process_opus.py: segmentation and downsampling of opus audio
-
 # usage: python3 process_opus.py wav.scp segments output_wav.scp
+import os
+import sys
 
 from pydub import AudioSegment
-import sys
-import os
 
 
 def read_file(wav_scp, segments):
@@ -86,4 +96,4 @@
 
 
 if __name__ == '__main__':
-    main()
\ No newline at end of file
+    main()
diff --git a/paddlespeech/s2t/exps/u2/model.py b/paddlespeech/s2t/exps/u2/model.py
index 9f5448cc..27bc47d2 100644
--- a/paddlespeech/s2t/exps/u2/model.py
+++ b/paddlespeech/s2t/exps/u2/model.py
@@ -24,15 +24,10 @@ import jsonlines
 import numpy as np
 import paddle
 from paddle import distributed as dist
-from paddle.io import DataLoader
 from yacs.config import CfgNode
 
 from paddlespeech.s2t.frontend.featurizer import TextFeaturizer
-from paddlespeech.s2t.io.collator import SpeechCollator
 from paddlespeech.s2t.io.dataloader import BatchDataLoader
-from paddlespeech.s2t.io.dataset import ManifestDataset
-from paddlespeech.s2t.io.sampler import SortagradBatchSampler
-from paddlespeech.s2t.io.sampler import SortagradDistributedBatchSampler
 from paddlespeech.s2t.models.u2 import U2Model
 from paddlespeech.s2t.training.optimizer import OptimizerFactory
 from paddlespeech.s2t.training.reporter import ObsScope
@@ -215,7 +210,7 @@ class U2Trainer(Trainer):
                 msg += f"{v:>.8f}" if isinstance(v, float) else f"{v}"
                 msg += f" {k.split(',')[1]}" if len(
-                    k.split(',')) == 2 else f""
+                    k.split(',')) == 2 else ""
                 msg += ","
             msg = msg[:-1]  # remove the last ","
             if (batch_index + 1
diff --git a/paddlespeech/s2t/frontend/featurizer/text_featurizer.py b/paddlespeech/s2t/frontend/featurizer/text_featurizer.py
index dab3d48d..812be6e4 100644
--- a/paddlespeech/s2t/frontend/featurizer/text_featurizer.py
+++ b/paddlespeech/s2t/frontend/featurizer/text_featurizer.py
@@ -57,7 +57,7 @@ class TextFeaturizer():
                 vocab_filepath, maskctc)
             self.vocab_size = len(self.vocab_list)
         else:
-            logger.warning(f"TextFeaturizer: not have vocab file.")
+            logger.warning("TextFeaturizer: not have vocab file.")
 
         if unit_type == 'spm':
             spm_model = spm_model_prefix + '.model'
diff --git a/paddlespeech/s2t/transform/spectrogram.py b/paddlespeech/s2t/transform/spectrogram.py
index ea39a6f6..f35adef0 100644
--- a/paddlespeech/s2t/transform/spectrogram.py
+++ b/paddlespeech/s2t/transform/spectrogram.py
@@ -341,7 +341,7 @@ class LogMelSpectrogramKaldi():
         self.eps = eps
         self.remove_dc_offset = True
         self.preemph = 0.97
-        self.dither = dither # only work in train mode
+        self.dither = dither  # only work in train mode
 
     def __repr__(self):
         return (

From 7a25ee26d970f0185ddf351fcc62b749805a5595 Mon Sep 17 00:00:00 2001
From: Hui Zhang
Date: Tue, 23 Nov 2021 07:28:54 +0000
Subject: [PATCH 3/3] fix release model egs name

---
 docs/source/released_model.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/released_model.md b/docs/source/released_model.md
index df9c3c5e..5abaf46d 100644
--- a/docs/source/released_model.md
+++ b/docs/source/released_model.md
@@ -32,7 +32,7 @@ Language Model | Training Data | Token-based | Size | Descriptions
 ### Acoustic Models
 Model Type | Dataset| Example Link | Pretrained Models|Static Models|Siize(static)
 :-------------:| :------------:| :-----: | :-----:| :-----:| :-----:
-Tacotron2|LJSpeech|[tacotron2-vctk](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/ttasr0)|[tacotron2_ljspeech_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_ljspeech_ckpt_0.3.zip)|||
+Tacotron2|LJSpeech|[tacotron2-vctk](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/tts0)|[tacotron2_ljspeech_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/tacotron2_ljspeech_ckpt_0.3.zip)|||
 TransformerTTS| LJSpeech| [transformer-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/tts1)|[transformer_tts_ljspeech_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/transformer_tts_ljspeech_ckpt_0.4.zip)|||
 SpeedySpeech| CSMSC | [speedyspeech-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts2) |[speedyspeech_nosil_baker_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/speedyspeech_nosil_baker_ckpt_0.5.zip)|[speedyspeech_nosil_baker_static_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/speedyspeech_nosil_baker_static_0.5.zip)|12MB|
 FastSpeech2| CSMSC |[fastspeech2-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3)|[fastspeech2_nosil_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_baker_ckpt_0.4.zip)|[fastspeech2_nosil_baker_static_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/fastspeech2_nosil_baker_static_0.4.zip)|157MB|
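
Reviewer note on the substance of this series: PATCH 1/3 threads a `train` flag into `LogMelSpectrogramKaldi.__call__` and gates dithering on it, so the random noise that regularizes training never perturbs evaluation features. Below is a minimal, self-contained sketch of that gating idea; the class and names are hypothetical stand-ins, not the PaddleSpeech API, and only the `dither = self.dither if train else False` gate is taken from the patch.

```python
import numpy as np


class DitherGate:
    """Sketch of the PATCH 1/3 idea: dither strength is configured once,
    but applied only when the caller signals train mode."""

    def __init__(self, dither=0.1):
        self.dither = dither  # only takes effect in train mode

    def __call__(self, x, train):
        # The gate from the patch: outside training, dither is disabled.
        dither = self.dither if train else 0.0
        if dither:
            # Kaldi-style dithering adds tiny random noise per sample, so a
            # downstream log() never sees exact zeros in silent regions.
            x = x + dither * np.random.standard_normal(x.shape).astype(x.dtype)
        return x


gate = DitherGate(dither=0.1)
wav = np.zeros(1600, dtype=np.float32)
assert np.array_equal(gate(wav, train=False), wav)     # eval: bit-identical
assert not np.array_equal(gate(wav, train=True), wav)  # train: perturbed
```

Passing the flag per call, rather than storing a mode on the transform, keeps evaluation reproducible: two decoding runs over the same wav yield identical features no matter how the object was constructed.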
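A second note: the thchs30.py hunk in PATCH 2/3 is a real bug fix rather than formatting. `'utt2spk', spk,` inside a dict literal is a SyntaxError in Python (a dict display cannot mix `key: value` items with bare expressions), so the manifest builder could not even be imported until it became `'utt2spk': spk,`. A short sketch of the corrected manifest line, with hypothetical placeholder values; only the keys mirror the hunk:

```python
import json

# Placeholder values; the keys come from the thchs30.py hunk above.
manifest_line = json.dumps(
    {
        'utt': 'A2_0',
        'utt2spk': 'A2',  # the fixed pair; a bare comma here was a SyntaxError
        'feat': 'data/A2_0.wav',
        'feat_shape': (7.96, ),  # duration in seconds
        'text': '...',  # character transcript
    },
    ensure_ascii=False)
print(manifest_line)
```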
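Finally, some context for the extract_meta.py hunk: the tab-separated lines it writes appear to follow the Kaldi data-directory convention (an assumption about intent, though `segments` and `utt2dur` are standard Kaldi file names), where `segments` maps a segment id to its source recording plus a time span and `utt2dur` records per-utterance durations. A sketch reusing the format strings from the hunk, with made-up ids, times, and subset labels:

```python
sid = 'X0000000000_S00000'    # hypothetical segment (utterance) id
aid = 'X0000000000'           # hypothetical source recording id
start_time, end_time = 0.0, 3.2
dur = end_time - start_time
segment_subsets = ['L', 'M']  # made-up subset tags for this segment

with open('segments', 'w') as segments, \
        open('utt2dur', 'w') as utt2dur, \
        open('utt2subsets', 'w') as utt2subsets:
    # segments: <segment-id> <recording-id> <start-seconds> <end-seconds>
    segments.write(f'{sid}\t{aid}\t{start_time}\t{end_time}\n')
    # utt2dur: <segment-id> <duration-seconds>
    utt2dur.write(f'{sid}\t{dur}\n')
    # utt2subsets: <segment-id> <space-joined subset names>
    segment_sub_names = " ".join(segment_subsets)
    utt2subsets.write(f'{sid}\t{segment_sub_names}\n')
```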