# Changelog

Date: 2022-1-10, Author: Jackwaterveg.

Added features to CLI:
- Support English (librispeech/asr1/transformer).
- Support choosing `decode_method` for conformer and transformer models.
- Refactor the config, using the unified config.
- PR Link: https://github.com/PaddlePaddle/PaddleSpeech/pull/1297

***
# Data Augmentation Pipeline

Data augmentation has often been a highly effective technique for boosting deep learning performance. We augment our speech data by synthesizing new audio with small random perturbations (label-invariant transformations) added to the raw audio. You don't have to do the synthesis yourself, as it is already embedded into the data provider and is done on the fly, randomly for each epoch during training.

Eight optional augmentation components are provided to be selected, configured, and inserted into the processing pipeline.

* Audio
  - Volume Perturbation
  - Speed Perturbation
  - Shifting Perturbation
  - Online Bayesian Normalization
  - Noise Perturbation (requires background noise audio files)
  - Impulse Response (requires impulse audio files)

* Feature
  - SpecAugment
  - Adaptive SpecAugment

To inform the trainer of which augmentation components are needed and in what order they should be applied, you must prepare an *augmentation configuration file* in [JSON](http://www.json.org/) format in advance. For example:

```json
[{
    "type": "speed",
    "params": {"min_speed_rate": 0.95,
               "max_speed_rate": 1.05},
    "prob": 0.6
},
{
    "type": "shift",
    "params": {"min_shift_ms": -5,
               "max_shift_ms": 5},
    "prob": 0.8
}]
```

When the `augment_conf_file` argument is set to the path of the above example configuration file, every audio clip in every epoch is processed as follows: with 60% probability it is first speed-perturbed with a speed rate sampled uniformly between 0.95 and 1.05, and then with 80% probability it is shifted in time by an offset sampled uniformly between -5 ms and 5 ms. Finally, the newly synthesized audio clip is fed into the feature extractor for further training.

For other configuration examples, please refer to `examples/conf/augmentation.example.json`.

Be careful when utilizing the data augmentation technique, as improper augmentation will harm training by enlarging the train-test gap.
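The on-the-fly behavior described above can be sketched in a few lines. This is a hypothetical minimal re-implementation for illustration only: the `plan_augmentations` helper and its names are assumptions, not the actual PaddleSpeech API, and it only decides *which* perturbations to apply and with what sampled parameters, rather than transforming audio.

```python
import json
import random

# The example configuration from the document above.
AUGMENT_CONF = """
[{"type": "speed", "params": {"min_speed_rate": 0.95, "max_speed_rate": 1.05}, "prob": 0.6},
 {"type": "shift", "params": {"min_shift_ms": -5, "max_shift_ms": 5}, "prob": 0.8}]
"""


def plan_augmentations(conf_json, rng):
    """Return the list of (type, sampled_param) applied to one audio clip.

    Hypothetical helper: each component fires independently with its own
    `prob`, and its parameter is sampled uniformly from the configured range,
    mirroring the pipeline description in the text.
    """
    applied = []
    for spec in json.loads(conf_json):
        if rng.random() < spec["prob"]:  # apply with probability `prob`
            p = spec["params"]
            if spec["type"] == "speed":
                value = rng.uniform(p["min_speed_rate"], p["max_speed_rate"])
            elif spec["type"] == "shift":
                value = rng.uniform(p["min_shift_ms"], p["max_shift_ms"])
            applied.append((spec["type"], value))
    return applied


# A fresh plan is drawn per clip per epoch, so the same clip is augmented
# differently each time it is seen.
rng = random.Random(0)
print(plan_augmentations(AUGMENT_CONF, rng))
```

Because the sampling happens inside the data provider, no augmented copies are ever written to disk.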
# TTS Papers

## Text Frontend
### Polyphone
- [【g2pM】g2pM: A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset](https://arxiv.org/abs/2004.03136)
- [Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT](https://www1.se.cuhk.edu.hk/~hccl/publications/pub/201909_INTERSPEECH_DongyangDAI.pdf)

### Text Normalization
#### English
- [applenob/text_normalization](https://github.com/applenob/text_normalization)

### G2P
#### English
- [cmusphinx/g2p-seq2seq](https://github.com/cmusphinx/g2p-seq2seq)

## Acoustic Models
- [【AdaSpeech3】AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style](https://arxiv.org/abs/2107.02530)
- [【AdaSpeech2】AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data](https://arxiv.org/abs/2104.09715)
- [【AdaSpeech】AdaSpeech: Adaptive Text to Speech for Custom Voice](https://arxiv.org/abs/2103.00993)
- [【FastSpeech2】FastSpeech 2: Fast and High-Quality End-to-End Text to Speech](https://arxiv.org/abs/2006.04558)
- [【FastPitch】FastPitch: Parallel Text-to-speech with Pitch Prediction](https://arxiv.org/abs/2006.06873)
- [【SpeedySpeech】SpeedySpeech: Efficient Neural Speech Synthesis](https://arxiv.org/abs/2008.03802)
- [【FastSpeech】FastSpeech: Fast, Robust and Controllable Text to Speech](https://arxiv.org/abs/1905.09263)
- [【Transformer TTS】Neural Speech Synthesis with Transformer Network](https://arxiv.org/abs/1809.08895)
- [【Tacotron2】Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions](https://arxiv.org/abs/1712.05884)

## Vocoders
- [【RefineGAN】RefineGAN: Universally Generating Waveform Better than Ground Truth with Highly Accurate Pitch and Intensity Responses](https://arxiv.org/abs/2111.00962)
- [【Fre-GAN】Fre-GAN: Adversarial Frequency-consistent Audio Synthesis](https://arxiv.org/abs/2106.02297)
- [【StyleMelGAN】StyleMelGAN: An Efficient High-Fidelity Adversarial Vocoder with Temporal Adaptive Normalization](https://arxiv.org/abs/2011.01557)
- [【Multi-band MelGAN】Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech](https://arxiv.org/abs/2005.05106)
- [【HiFi-GAN】HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis](https://arxiv.org/abs/2010.05646)
- [【VocGAN】VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network](https://arxiv.org/abs/2007.15256)
- [【Parallel WaveGAN】Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram](https://arxiv.org/abs/1910.11480)
- [【MelGAN】MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis](https://arxiv.org/abs/1910.06711)
- [【WaveFlow】WaveFlow: A Compact Flow-based Model for Raw Audio](https://arxiv.org/abs/1912.01219)
- [【LPCNet】LPCNet: Improving Neural Speech Synthesis Through Linear Prediction](https://arxiv.org/abs/1810.11846)
- [【WaveRNN】Efficient Neural Audio Synthesis](https://arxiv.org/abs/1802.08435)

## GAN TTS
- [【GAN TTS】High Fidelity Speech Synthesis with Adversarial Networks](https://arxiv.org/abs/1909.11646)

## Voice Cloning
- [【SV2TTS】Transfer Learning from Speaker Verification to Multispeaker Text-to-Speech Synthesis](https://arxiv.org/abs/1806.04558)
- [【GE2E】Generalized End-to-End Loss for Speaker Verification](https://arxiv.org/abs/1710.10467)
#!/bin/bash

train_output_path=$1

stage=0
stop_stage=0

if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
    python3 ${BIN_DIR}/../inference.py \
        --inference_dir=${train_output_path}/inference \
        --am=fastspeech2_aishell3 \
        --voc=pwgan_aishell3 \
        --text=${BIN_DIR}/../sentences.txt \
        --output_dir=${train_output_path}/pd_infer_out \
        --phones_dict=dump/phone_id_map.txt \
        --speaker_dict=dump/speaker_id_map.txt \
        --spk_id=0
fi
#!/bin/bash

train_output_path=$1

stage=0
stop_stage=0

if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
    python3 ${BIN_DIR}/../inference.py \
        --inference_dir=${train_output_path}/inference \
        --am=fastspeech2_vctk \
        --voc=pwgan_vctk \
        --text=${BIN_DIR}/../sentences_en.txt \
        --output_dir=${train_output_path}/pd_infer_out \
        --phones_dict=dump/phone_id_map.txt \
        --speaker_dict=dump/speaker_id_map.txt \
        --spk_id=0 \
        --lang=en
fi
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Generate mels using durations.txt, for Multi-band MelGAN finetuning.
# What if the length is inconsistent with the original mel?
import argparse
import os
from pathlib import Path

import numpy as np
import paddle
import yaml
from tqdm import tqdm
from yacs.config import CfgNode

from paddlespeech.t2s.datasets.preprocess_utils import get_phn_dur
from paddlespeech.t2s.datasets.preprocess_utils import merge_silence
from paddlespeech.t2s.frontend.zh_frontend import Frontend
from paddlespeech.t2s.models.speedyspeech import SpeedySpeech
from paddlespeech.t2s.models.speedyspeech import SpeedySpeechInference
from paddlespeech.t2s.modules.normalizer import ZScore


def evaluate(args, speedyspeech_config):
    rootdir = Path(args.rootdir).expanduser()
    assert rootdir.is_dir()

    # construct dataset for evaluation
    with open(args.phones_dict, "r") as f:
        phn_id = [line.strip().split() for line in f.readlines()]
    vocab_size = len(phn_id)
    print("vocab_size:", vocab_size)

    phone_dict = {}
    for phn, id in phn_id:
        phone_dict[phn] = int(id)

    with open(args.tones_dict, "r") as f:
        tone_id = [line.strip().split() for line in f.readlines()]
    tone_size = len(tone_id)
    print("tone_size:", tone_size)

    frontend = Frontend(
        phone_vocab_path=args.phones_dict, tone_vocab_path=args.tones_dict)

    if args.speaker_dict:
        with open(args.speaker_dict, 'rt') as f:
            spk_id_list = [line.strip().split() for line in f.readlines()]
        spk_num = len(spk_id_list)
    else:
        spk_num = None

    model = SpeedySpeech(
        vocab_size=vocab_size,
        tone_size=tone_size,
        **speedyspeech_config["model"],
        spk_num=spk_num)

    model.set_state_dict(
        paddle.load(args.speedyspeech_checkpoint)["main_params"])
    model.eval()

    stat = np.load(args.speedyspeech_stat)
    mu, std = stat
    mu = paddle.to_tensor(mu)
    std = paddle.to_tensor(std)
    speedyspeech_normalizer = ZScore(mu, std)

    speedyspeech_inference = SpeedySpeechInference(speedyspeech_normalizer,
                                                   model)
    speedyspeech_inference.eval()

    output_dir = Path(args.output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    sentences, speaker_set = get_phn_dur(args.dur_file)
    merge_silence(sentences)

    if args.dataset == "baker":
        wav_files = sorted(list((rootdir / "Wave").rglob("*.wav")))
        # split data into 3 sections
        num_train = 9800
        num_dev = 100
        train_wav_files = wav_files[:num_train]
        dev_wav_files = wav_files[num_train:num_train + num_dev]
        test_wav_files = wav_files[num_train + num_dev:]
    elif args.dataset == "aishell3":
        sub_num_dev = 5
        wav_dir = rootdir / "train" / "wav"
        train_wav_files = []
        dev_wav_files = []
        test_wav_files = []
        for speaker in os.listdir(wav_dir):
            wav_files = sorted(list((wav_dir / speaker).rglob("*.wav")))
            if len(wav_files) > 100:
                train_wav_files += wav_files[:-sub_num_dev * 2]
                dev_wav_files += wav_files[-sub_num_dev * 2:-sub_num_dev]
                test_wav_files += wav_files[-sub_num_dev:]
            else:
                train_wav_files += wav_files

    train_wav_files = [
        os.path.basename(str(str_path)) for str_path in train_wav_files
    ]
    dev_wav_files = [
        os.path.basename(str(str_path)) for str_path in dev_wav_files
    ]
    test_wav_files = [
        os.path.basename(str(str_path)) for str_path in test_wav_files
    ]

    for i, utt_id in enumerate(tqdm(sentences)):
        phones = sentences[utt_id][0]
        durations = sentences[utt_id][1]
        speaker = sentences[utt_id][2]
        # trim the leading and trailing "sil"
        if args.cut_sil:
            if phones[0] == "sil" and len(durations) > 1:
                durations = durations[1:]
                phones = phones[1:]
            if phones[-1] == 'sil' and len(durations) > 1:
                durations = durations[:-1]
                phones = phones[:-1]

        phones, tones = frontend._get_phone_tone(phones, get_tone_ids=True)
        if tones:
            tone_ids = frontend._t2id(tones)
            tone_ids = paddle.to_tensor(tone_ids)
        if phones:
            phone_ids = frontend._p2id(phones)
            phone_ids = paddle.to_tensor(phone_ids)

        if args.speaker_dict:
            speaker_id = int(
                [item[1] for item in spk_id_list if speaker == item[0]][0])
            speaker_id = paddle.to_tensor(speaker_id)
        else:
            speaker_id = None

        durations = paddle.to_tensor(np.array(durations))
        durations = paddle.unsqueeze(durations, axis=0)

        # the generated mel and the ground-truth mel may differ by 1 or 2
        # frames, but batch_fn will fix that
        wav_path = utt_id + ".wav"

        if wav_path in train_wav_files:
            sub_output_dir = output_dir / ("train/raw")
        elif wav_path in dev_wav_files:
            sub_output_dir = output_dir / ("dev/raw")
        elif wav_path in test_wav_files:
            sub_output_dir = output_dir / ("test/raw")
        else:
            # skip utterances whose wav is not in any split, so that
            # sub_output_dir is never left unbound below
            continue
        sub_output_dir.mkdir(parents=True, exist_ok=True)

        with paddle.no_grad():
            mel = speedyspeech_inference(
                phone_ids, tone_ids, durations=durations, spk_id=speaker_id)
        np.save(sub_output_dir / (utt_id + "_feats.npy"), mel)


def main():
    # parse args and config and redirect to train_sp
    parser = argparse.ArgumentParser(
        description="Synthesize with speedyspeech & parallel wavegan.")
    parser.add_argument(
        "--dataset",
        default="baker",
        type=str,
        help="name of dataset, should be in {baker, ljspeech, vctk} now")
    parser.add_argument(
        "--rootdir", default=None, type=str, help="directory to dataset.")
    parser.add_argument(
        "--speedyspeech-config", type=str, help="speedyspeech config file.")
    parser.add_argument(
        "--speedyspeech-checkpoint",
        type=str,
        help="speedyspeech checkpoint to load.")
    parser.add_argument(
        "--speedyspeech-stat",
        type=str,
        help="mean and standard deviation used to normalize spectrogram when training speedyspeech."
    )

    parser.add_argument(
        "--phones-dict",
        type=str,
        default="phone_id_map.txt",
        help="phone vocabulary file.")
    parser.add_argument(
        "--tones-dict",
        type=str,
        default="tone_id_map.txt",
        help="tone vocabulary file.")
    parser.add_argument(
        "--speaker-dict", type=str, default=None, help="speaker id map file.")

    parser.add_argument(
        "--dur-file", default=None, type=str, help="path to durations.txt.")
    parser.add_argument("--output-dir", type=str, help="output dir.")
    parser.add_argument(
        "--ngpu", type=int, default=1, help="if ngpu == 0, use cpu.")

    def str2bool(s):
        return s.lower() == 'true'

    parser.add_argument(
        "--cut-sil",
        type=str2bool,
        default=True,
        help="whether to cut sil at the edges of the audio")

    args = parser.parse_args()

    if args.ngpu == 0:
        paddle.set_device("cpu")
    elif args.ngpu > 0:
        paddle.set_device("gpu")
    else:
        print("ngpu should be >= 0!")

    with open(args.speedyspeech_config) as f:
        speedyspeech_config = CfgNode(yaml.safe_load(f))

    print("========Args========")
    print(yaml.safe_dump(vars(args)))
    print("========Config========")
    print(speedyspeech_config)

    evaluate(args, speedyspeech_config)


if __name__ == "__main__":
    main()
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import math

import paddle
import paddle.nn as nn
import paddle.nn.functional as F


def length_to_mask(length, max_len=None, dtype=None):
    assert len(length.shape) == 1

    if max_len is None:
        max_len = length.max().astype('int').item()
    # use arange to generate the mask
    mask = paddle.arange(
        max_len, dtype=length.dtype).expand(
            (len(length), max_len)) < length.unsqueeze(1)

    if dtype is None:
        dtype = length.dtype

    mask = paddle.to_tensor(mask, dtype=dtype)
    return mask


class Conv1d(nn.Layer):
    def __init__(
            self,
            in_channels,
            out_channels,
            kernel_size,
            stride=1,
            padding="same",
            dilation=1,
            groups=1,
            bias=True,
            padding_mode="reflect", ):
        super().__init__()

        self.kernel_size = kernel_size
        self.stride = stride
        self.dilation = dilation
        self.padding = padding
        self.padding_mode = padding_mode

        self.conv = nn.Conv1D(
            in_channels,
            out_channels,
            self.kernel_size,
            stride=self.stride,
            padding=0,
            dilation=self.dilation,
            groups=groups,
            bias_attr=bias, )

    def forward(self, x):
        if self.padding == "same":
            x = self._manage_padding(x, self.kernel_size, self.dilation,
                                     self.stride)
        else:
            raise ValueError(f"Padding must be 'same'. Got {self.padding}")

        return self.conv(x)

    def _manage_padding(self, x, kernel_size: int, dilation: int, stride: int):
        L_in = x.shape[-1]  # detect input length
        padding = self._get_padding_elem(L_in, stride, kernel_size,
                                         dilation)  # time padding
        x = F.pad(
            x, padding, mode=self.padding_mode,
            data_format="NCL")  # apply padding
        return x

    def _get_padding_elem(self,
                          L_in: int,
                          stride: int,
                          kernel_size: int,
                          dilation: int):
        if stride > 1:
            n_steps = math.ceil(((L_in - kernel_size * dilation) / stride) + 1)
            L_out = stride * (n_steps - 1) + kernel_size * dilation
            padding = [kernel_size // 2, kernel_size // 2]
        else:
            L_out = (L_in - dilation * (kernel_size - 1) - 1) // stride + 1
            padding = [(L_in - L_out) // 2, (L_in - L_out) // 2]

        return padding


class BatchNorm1d(nn.Layer):
    def __init__(
            self,
            input_size,
            eps=1e-05,
            momentum=0.9,
            weight_attr=None,
            bias_attr=None,
            data_format='NCL',
            use_global_stats=None, ):
        super().__init__()

        self.norm = nn.BatchNorm1D(
            input_size,
            epsilon=eps,
            momentum=momentum,
            weight_attr=weight_attr,
            bias_attr=bias_attr,
            data_format=data_format,
            use_global_stats=use_global_stats, )

    def forward(self, x):
        x_n = self.norm(x)
        return x_n


class TDNNBlock(nn.Layer):
    def __init__(
            self,
            in_channels,
            out_channels,
            kernel_size,
            dilation,
            activation=nn.ReLU, ):
        super().__init__()
        self.conv = Conv1d(
            in_channels=in_channels,
            out_channels=out_channels,
            kernel_size=kernel_size,
            dilation=dilation, )
        self.activation = activation()
        self.norm = BatchNorm1d(input_size=out_channels)

    def forward(self, x):
        return self.norm(self.activation(self.conv(x)))


class Res2NetBlock(nn.Layer):
    def __init__(self, in_channels, out_channels, scale=8, dilation=1):
        super().__init__()
        assert in_channels % scale == 0
        assert out_channels % scale == 0

        in_channel = in_channels // scale
        hidden_channel = out_channels // scale

        self.blocks = nn.LayerList([
            TDNNBlock(
                in_channel, hidden_channel, kernel_size=3, dilation=dilation)
            for i in range(scale - 1)
        ])
        self.scale = scale

    def forward(self, x):
        y = []
        for i, x_i in enumerate(paddle.chunk(x, self.scale, axis=1)):
            if i == 0:
                y_i = x_i
            elif i == 1:
                y_i = self.blocks[i - 1](x_i)
            else:
                y_i = self.blocks[i - 1](x_i + y_i)
            y.append(y_i)
        y = paddle.concat(y, axis=1)
        return y


class SEBlock(nn.Layer):
    def __init__(self, in_channels, se_channels, out_channels):
        super().__init__()

        self.conv1 = Conv1d(
            in_channels=in_channels, out_channels=se_channels, kernel_size=1)
        self.relu = paddle.nn.ReLU()
        self.conv2 = Conv1d(
            in_channels=se_channels, out_channels=out_channels, kernel_size=1)
        self.sigmoid = paddle.nn.Sigmoid()

    def forward(self, x, lengths=None):
        L = x.shape[-1]
        if lengths is not None:
            mask = length_to_mask(lengths * L, max_len=L)
            mask = mask.unsqueeze(1)
            total = mask.sum(axis=2, keepdim=True)
            s = (x * mask).sum(axis=2, keepdim=True) / total
        else:
            s = x.mean(axis=2, keepdim=True)

        s = self.relu(self.conv1(s))
        s = self.sigmoid(self.conv2(s))

        return s * x


class AttentiveStatisticsPooling(nn.Layer):
    def __init__(self, channels, attention_channels=128, global_context=True):
        super().__init__()

        self.eps = 1e-12
        self.global_context = global_context
        if global_context:
            self.tdnn = TDNNBlock(channels * 3, attention_channels, 1, 1)
        else:
            self.tdnn = TDNNBlock(channels, attention_channels, 1, 1)
        self.tanh = nn.Tanh()
        self.conv = Conv1d(
            in_channels=attention_channels,
            out_channels=channels,
            kernel_size=1)

    def forward(self, x, lengths=None):
        C, L = x.shape[1], x.shape[2]  # x: (N, C, L)

        def _compute_statistics(x, m, axis=2, eps=self.eps):
            mean = (m * x).sum(axis)
            std = paddle.sqrt(
                (m * (x - mean.unsqueeze(axis)).pow(2)).sum(axis).clip(eps))
            return mean, std

        if lengths is None:
            lengths = paddle.ones([x.shape[0]])

        # Make binary mask of shape [N, 1, L]
        mask = length_to_mask(lengths * L, max_len=L)
        mask = mask.unsqueeze(1)

        # Expand the temporal context of the pooling layer by allowing the
        # self-attention to look at global properties of the utterance.
        if self.global_context:
            total = mask.sum(axis=2, keepdim=True).astype('float32')
            mean, std = _compute_statistics(x, mask / total)
            mean = mean.unsqueeze(2).tile((1, 1, L))
            std = std.unsqueeze(2).tile((1, 1, L))
            attn = paddle.concat([x, mean, std], axis=1)
        else:
            attn = x

        # Apply layers
        attn = self.conv(self.tanh(self.tdnn(attn)))

        # Filter out zero-paddings
        attn = paddle.where(
            mask.tile((1, C, 1)) == 0,
            paddle.ones_like(attn) * float("-inf"), attn)

        attn = F.softmax(attn, axis=2)
        mean, std = _compute_statistics(x, attn)

        # Append mean and std of the batch
        pooled_stats = paddle.concat((mean, std), axis=1)
        pooled_stats = pooled_stats.unsqueeze(2)

        return pooled_stats


class SERes2NetBlock(nn.Layer):
    def __init__(
            self,
            in_channels,
            out_channels,
            res2net_scale=8,
            se_channels=128,
            kernel_size=1,
            dilation=1,
            activation=nn.ReLU, ):
        super().__init__()
        self.out_channels = out_channels
        self.tdnn1 = TDNNBlock(
            in_channels,
            out_channels,
            kernel_size=1,
            dilation=1,
            activation=activation, )
        self.res2net_block = Res2NetBlock(out_channels, out_channels,
                                          res2net_scale, dilation)
        self.tdnn2 = TDNNBlock(
            out_channels,
            out_channels,
            kernel_size=1,
            dilation=1,
            activation=activation, )
        self.se_block = SEBlock(out_channels, se_channels, out_channels)

        self.shortcut = None
        if in_channels != out_channels:
            self.shortcut = Conv1d(
                in_channels=in_channels,
                out_channels=out_channels,
                kernel_size=1, )

    def forward(self, x, lengths=None):
        residual = x
        if self.shortcut:
            residual = self.shortcut(x)

        x = self.tdnn1(x)
        x = self.res2net_block(x)
        x = self.tdnn2(x)
        x = self.se_block(x, lengths)

        return x + residual


class EcapaTdnn(nn.Layer):
    def __init__(
            self,
            input_size,
            lin_neurons=192,
            activation=nn.ReLU,
            channels=[512, 512, 512, 512, 1536],
            kernel_sizes=[5, 3, 3, 3, 1],
            dilations=[1, 2, 3, 4, 1],
            attention_channels=128,
            res2net_scale=8,
            se_channels=128,
            global_context=True, ):

        super().__init__()
        assert len(channels) == len(kernel_sizes)
        assert len(channels) == len(dilations)
        self.channels = channels
        self.blocks = nn.LayerList()
        self.emb_size = lin_neurons

        # The initial TDNN layer
        self.blocks.append(
            TDNNBlock(
                input_size,
                channels[0],
                kernel_sizes[0],
                dilations[0],
                activation, ))

        # SE-Res2Net layers
        for i in range(1, len(channels) - 1):
            self.blocks.append(
                SERes2NetBlock(
                    channels[i - 1],
                    channels[i],
                    res2net_scale=res2net_scale,
                    se_channels=se_channels,
                    kernel_size=kernel_sizes[i],
                    dilation=dilations[i],
                    activation=activation, ))

        # Multi-layer feature aggregation
        self.mfa = TDNNBlock(
            channels[-1],
            channels[-1],
            kernel_sizes[-1],
            dilations[-1],
            activation, )

        # Attentive Statistical Pooling
        self.asp = AttentiveStatisticsPooling(
            channels[-1],
            attention_channels=attention_channels,
            global_context=global_context, )
        self.asp_bn = BatchNorm1d(input_size=channels[-1] * 2)

        # Final linear transformation
        self.fc = Conv1d(
            in_channels=channels[-1] * 2,
            out_channels=self.emb_size,
            kernel_size=1, )

    def forward(self, x, lengths=None):
        """
        Compute embeddings.

        Args:
            x (paddle.Tensor): Input log-fbanks with shape (N, n_mels, T).
            lengths (paddle.Tensor, optional): Length proportions of batch length with shape (N). Defaults to None.

        Returns:
            paddle.Tensor: Output embeddings with shape (N, self.emb_size, 1)
        """
        xl = []
        for layer in self.blocks:
            try:
                x = layer(x, lengths=lengths)
            except TypeError:
                x = layer(x)
            xl.append(x)

        # Multi-layer feature aggregation
        x = paddle.concat(xl[1:], axis=1)
        x = self.mfa(x)

        # Attentive Statistical Pooling
        x = self.asp(x, lengths=lengths)
        x = self.asp_bn(x)

        # Final linear transformation
        x = self.fc(x)

        return x
cmake_minimum_required(VERSION 3.14 FATAL_ERROR)

project(deepspeech VERSION 0.1)

set(CMAKE_VERBOSE_MAKEFILE on)
# use C++14
set(CMAKE_CXX_STANDARD 14)

# include modules
include(FetchContent)
include(ExternalProject)
# fc_patch dir
set(FETCHCONTENT_QUIET off)
get_filename_component(fc_patch "fc_patch" REALPATH BASE_DIR "${CMAKE_SOURCE_DIR}")
set(FETCHCONTENT_BASE_DIR ${fc_patch})


###############################################################################
# Option Configurations
###############################################################################
option(TEST_DEBUG "option for debug" OFF)


###############################################################################
# Include third party
###############################################################################
# example for including a third party:
# FetchContent_Declare()
# # FetchContent_MakeAvailable was not added until CMake 3.14
# FetchContent_MakeAvailable()
# include_directories()

# ABSEIL-CPP
FetchContent_Declare(
    absl
    GIT_REPOSITORY "https://github.com/abseil/abseil-cpp.git"
    GIT_TAG "20210324.1"
)
FetchContent_MakeAvailable(absl)

# libsndfile
FetchContent_Declare(
    libsndfile
    GIT_REPOSITORY "https://github.com/libsndfile/libsndfile.git"
    GIT_TAG "1.0.31"
)
FetchContent_MakeAvailable(libsndfile)


###############################################################################
# Add local library
###############################################################################
# system lib:
# find_package()
# if the dir has a CMakeLists.txt:
# add_subdirectory()
# if the dir does not have a CMakeLists.txt:
# add_library(lib_name STATIC file.cc)
# target_link_libraries(lib_name item0 item1)
# add_dependencies(lib_name depend-target)


###############################################################################
# Library installation
###############################################################################
# install()


###############################################################################
# Build binary file
###############################################################################
# add_executable()
# target_link_libraries()
aux_source_directory(. DIR_LIB_SRCS)
add_library(decoder STATIC ${DIR_LIB_SRCS})
#!/bin/bash
set -e

# Audio classification
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/cat.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/dog.wav
paddlespeech cls --input ./cat.wav --topk 10

# Punctuation restoration
paddlespeech text --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭

# Speech recognition
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
paddlespeech asr --input ./zh.wav
paddlespeech asr --model transformer_librispeech --lang en --input ./en.wav

# Text to speech
paddlespeech tts --input "你好,欢迎使用百度飞桨深度学习框架!"
paddlespeech tts --am speedyspeech_csmsc --input "你好,欢迎使用百度飞桨深度学习框架!"
paddlespeech tts --voc mb_melgan_csmsc --input "你好,欢迎使用百度飞桨深度学习框架!"
paddlespeech tts --voc style_melgan_csmsc --input "你好,欢迎使用百度飞桨深度学习框架!"
paddlespeech tts --voc hifigan_csmsc --input "你好,欢迎使用百度飞桨深度学习框架!"
paddlespeech tts --am fastspeech2_aishell3 --voc pwgan_aishell3 --input "你好,欢迎使用百度飞桨深度学习框架!" --spk_id 0
paddlespeech tts --am fastspeech2_ljspeech --voc pwgan_ljspeech --lang en --input "hello world"
paddlespeech tts --am fastspeech2_vctk --voc pwgan_vctk --input "hello, boys" --lang en --spk_id 0

# Speech translation (only supported on Linux)
paddlespeech st --input ./en.wav
@@ -0,0 +1,201 @@
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!) The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
@@ -0,0 +1,165 @@
                   GNU LESSER GENERAL PUBLIC LICENSE
                       Version 3, 29 June 2007

 Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.


  This version of the GNU Lesser General Public License incorporates
the terms and conditions of version 3 of the GNU General Public
License, supplemented by the additional permissions listed below.

  0. Additional Definitions.

  As used herein, "this License" refers to version 3 of the GNU Lesser
General Public License, and the "GNU GPL" refers to version 3 of the GNU
General Public License.

  "The Library" refers to a covered work governed by this License,
other than an Application or a Combined Work as defined below.

  An "Application" is any work that makes use of an interface provided
by the Library, but which is not otherwise based on the Library.
Defining a subclass of a class defined by the Library is deemed a mode
of using an interface provided by the Library.

  A "Combined Work" is a work produced by combining or linking an
Application with the Library. The particular version of the Library
with which the Combined Work was made is also called the "Linked
Version".

  The "Minimal Corresponding Source" for a Combined Work means the
Corresponding Source for the Combined Work, excluding any source code
for portions of the Combined Work that, considered in isolation, are
based on the Application, and not on the Linked Version.

  The "Corresponding Application Code" for a Combined Work means the
object code and/or source code for the Application, including any data
and utility programs needed for reproducing the Combined Work from the
Application, but excluding the System Libraries of the Combined Work.

  1. Exception to Section 3 of the GNU GPL.

  You may convey a covered work under sections 3 and 4 of this License
without being bound by section 3 of the GNU GPL.

  2. Conveying Modified Versions.

  If you modify a copy of the Library, and, in your modifications, a
facility refers to a function or data to be supplied by an Application
that uses the facility (other than as an argument passed when the
facility is invoked), then you may convey a copy of the modified
version:

   a) under this License, provided that you make a good faith effort to
   ensure that, in the event an Application does not supply the
   function or data, the facility still operates, and performs
   whatever part of its purpose remains meaningful, or

   b) under the GNU GPL, with none of the additional permissions of
   this License applicable to that copy.

  3. Object Code Incorporating Material from Library Header Files.

  The object code form of an Application may incorporate material from
a header file that is part of the Library. You may convey such object
code under terms of your choice, provided that, if the incorporated
material is not limited to numerical parameters, data structure
layouts and accessors, or small macros, inline functions and templates
(ten or fewer lines in length), you do both of the following:

   a) Give prominent notice with each copy of the object code that the
   Library is used in it and that the Library and its use are
   covered by this License.

   b) Accompany the object code with a copy of the GNU GPL and this license
   document.

  4. Combined Works.

  You may convey a Combined Work under terms of your choice that,
taken together, effectively do not restrict modification of the
portions of the Library contained in the Combined Work and reverse
engineering for debugging such modifications, if you also do each of
the following:

   a) Give prominent notice with each copy of the Combined Work that
   the Library is used in it and that the Library and its use are
   covered by this License.

   b) Accompany the Combined Work with a copy of the GNU GPL and this license
   document.

   c) For a Combined Work that displays copyright notices during
   execution, include the copyright notice for the Library among
   these notices, as well as a reference directing the user to the
   copies of the GNU GPL and this license document.

   d) Do one of the following:

       0) Convey the Minimal Corresponding Source under the terms of this
       License, and the Corresponding Application Code in a form
       suitable for, and under terms that permit, the user to
       recombine or relink the Application with a modified version of
       the Linked Version to produce a modified Combined Work, in the
       manner specified by section 6 of the GNU GPL for conveying
       Corresponding Source.

       1) Use a suitable shared library mechanism for linking with the
       Library. A suitable mechanism is one that (a) uses at run time
       a copy of the Library already present on the user's computer
       system, and (b) will operate properly with a modified version
       of the Library that is interface-compatible with the Linked
       Version.

   e) Provide Installation Information, but only if you would otherwise
   be required to provide such information under section 6 of the
   GNU GPL, and only to the extent that such information is
   necessary to install and execute a modified version of the
   Combined Work produced by recombining or relinking the
   Application with a modified version of the Linked Version. (If
   you use option 4d0, the Installation Information must accompany
   the Minimal Corresponding Source and Corresponding Application
   Code. If you use option 4d1, you must provide the Installation
   Information in the manner specified by section 6 of the GNU GPL
   for conveying Corresponding Source.)

  5. Combined Libraries.

  You may place library facilities that are a work based on the
Library side by side in a single library together with other library
facilities that are not Applications and are not covered by this
License, and convey such a combined library under terms of your
choice, if you do both of the following:

   a) Accompany the combined library with a copy of the same work based
   on the Library, uncombined with any other library facilities,
   conveyed under the terms of this License.

   b) Give prominent notice with the combined library that part of it
   is a work based on the Library, and explaining where to find the
   accompanying uncombined form of the same work.

  6. Revised Versions of the GNU Lesser General Public License.

  The Free Software Foundation may publish revised and/or new versions
of the GNU Lesser General Public License from time to time. Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns.

  Each version is given a distinguishing version number. If the
Library as you received it specifies that a certain numbered version
of the GNU Lesser General Public License "or any later version"
applies to it, you have the option of following the terms and
conditions either of that published version or of any later version
published by the Free Software Foundation. If the Library as you
received it does not specify a version number of the GNU Lesser
General Public License, you may choose any version of the GNU Lesser
General Public License ever published by the Free Software Foundation.

  If the Library as you received it specifies that a proxy can decide
whether future versions of the GNU Lesser General Public License shall
apply, that proxy's public statement of acceptance of any version is
permanent authorization for you to choose that version for the
Library.
@@ -0,0 +1,8 @@
Most of the code here is licensed under the Apache License 2.0.
There are exceptions that have their own licenses, listed below.

score.h and score.cpp are under the LGPL license.
The two files include header files from the KenLM project.

For the rest:
The default license of paddlespeech-ctcdecoders is the Apache License 2.0.
@@ -1,6 +1,6 @@
 // Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
 //
-// Licensed under the Apache License, Version 2.0 (the "License");
+// Licensed under the Apache License, Version 2.0 (the "COPYING.APACHE2.0");
 // you may not use this file except in compliance with the License.
 // You may obtain a copy of the License at
 //
@@ -0,0 +1,183 @@
#!/usr/bin/env python3
# Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
'''
Merge training configs into a single inference config.
The single inference config is for the CLI, which takes only one config file for inference.
The training configs include: model config, preprocess config, decode config, vocab file and cmvn file.
'''
import argparse
import json
import math
import os
from contextlib import redirect_stdout

from yacs.config import CfgNode

from paddlespeech.s2t.frontend.utility import load_dict


def save(save_path, config):
    '''Dump the config to a yaml file.'''
    with open(save_path, 'w') as fp:
        with redirect_stdout(fp):
            print(config.dump())


def load(save_path):
    '''Load a yaml file into a CfgNode.'''
    config = CfgNode(new_allowed=True)
    config.merge_from_file(save_path)
    return config


def load_json(json_path):
    with open(json_path) as f:
        json_content = json.load(f)
    return json_content


def remove_config_part(config, key_list):
    '''Remove the nested key described by key_list from config.'''
    if len(key_list) == 0:
        return
    for i in range(len(key_list) - 1):
        config = config[key_list[i]]
    config.pop(key_list[-1])

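The nested-key removal above can be exercised on a plain dict; the real script applies the same walk-and-pop to a yacs `CfgNode`, which supports the same indexing and `pop`. A minimal standalone sketch:

```python
# Standalone sketch of remove_config_part, shown on a plain dict instead of
# a yacs CfgNode; the walk-and-pop logic is the same.
def remove_config_part(config, key_list):
    if len(key_list) == 0:
        return
    # Walk down to the parent container of the target key...
    for key in key_list[:-1]:
        config = config[key]
    # ...then drop the final key, leaving sibling keys untouched.
    config.pop(key_list[-1])

cfg = {"model": {"encoder": "conformer"},
       "training": {"n_epoch": 240, "optim": "adam"}}
remove_config_part(cfg, ["training", "n_epoch"])
# cfg now keeps "optim" but no longer contains "n_epoch"
```

Note that a missing key raises `KeyError`, which is why the caller below wraps each removal in a try/except.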
def load_cmvn_from_json(cmvn_stats):
    '''Convert accumulated cmvn statistics (sums and squared sums over
    frame_num frames) into per-dimension mean and inverse standard
    deviation, flooring tiny variances for numerical stability.'''
    means = cmvn_stats['mean_stat']
    variance = cmvn_stats['var_stat']
    count = cmvn_stats['frame_num']
    for i in range(len(means)):
        means[i] /= count
        variance[i] = variance[i] / count - means[i] * means[i]
        if variance[i] < 1.0e-20:
            variance[i] = 1.0e-20
        variance[i] = 1.0 / math.sqrt(variance[i])
    cmvn_stats = {"mean": means, "istd": variance}
    return cmvn_stats

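The cmvn conversion above reduces accumulated per-dimension sums and squared sums to a mean and an inverse standard deviation. A self-contained version of the same arithmetic on plain lists (the function name `cmvn_from_stats` is ours, for illustration only):

```python
import math

# Same arithmetic as load_cmvn_from_json, on plain lists: mean_stat holds
# per-dimension sums, var_stat per-dimension squared sums, over frame_num frames.
def cmvn_from_stats(mean_stat, var_stat, frame_num, floor=1.0e-20):
    mean = [m / frame_num for m in mean_stat]
    # var = E[x^2] - E[x]^2, floored to avoid division by ~zero
    var = [max(v / frame_num - mu * mu, floor) for v, mu in zip(var_stat, mean)]
    istd = [1.0 / math.sqrt(x) for x in var]
    return {"mean": mean, "istd": istd}

# Two frames of a 1-dim feature with values 1.0 and 3.0:
# sum = 4.0, squared sum = 10.0 -> mean 2.0, variance 1.0, istd 1.0
stats = cmvn_from_stats([4.0], [10.0], 2)
```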
def merge_configs(
        conf_path="conf/conformer.yaml",
        preprocess_path="conf/preprocess.yaml",
        decode_path="conf/tuning/decode.yaml",
        vocab_path="data/vocab.txt",
        cmvn_path="data/mean_std.json",
        save_path="conf/conformer_infer.yaml", ):

    # Load the configs
    config = load(conf_path)
    decode_config = load(decode_path)
    vocab_list = load_dict(vocab_path)

    # If the kaldi feature is used, do not load the cmvn file
    if cmvn_path.split(".")[-1] == 'json':
        cmvn_stats = load_json(cmvn_path)
        if os.path.exists(preprocess_path):
            preprocess_config = load(preprocess_path)
            for idx, process in enumerate(preprocess_config["process"]):
                if process['type'] == "cmvn_json":
                    preprocess_config["process"][idx]["cmvn_path"] = cmvn_stats
                    break

            config.preprocess_config = preprocess_config
        else:
            cmvn_stats = load_cmvn_from_json(cmvn_stats)
            config.mean_std_filepath = [{"cmvn_stats": cmvn_stats}]
            config.augmentation_config = ''
    # otherwise the cmvn file ends with .ark
    else:
        config.cmvn_path = cmvn_path

    # Update the config
    config.vocab_filepath = vocab_list
    config.input_dim = config.feat_dim
    config.output_dim = len(config.vocab_filepath)
    config.decode = decode_config

    # Remove the training-only parts of the config
    if os.path.exists(preprocess_path):
        remove_train_list = [
            "train_manifest",
            "dev_manifest",
            "test_manifest",
            "n_epoch",
            "accum_grad",
            "global_grad_clip",
            "optim",
            "optim_conf",
            "scheduler",
            "scheduler_conf",
            "log_interval",
            "checkpoint",
            "shuffle_method",
            "weight_decay",
            "ctc_grad_norm_type",
            "minibatches",
            "subsampling_factor",
            "batch_bins",
            "batch_count",
            "batch_frames_in",
            "batch_frames_inout",
            "batch_frames_out",
            "sortagrad",
            "feat_dim",
            "stride_ms",
            "window_ms",
            "batch_size",
            "maxlen_in",
            "maxlen_out",
        ]
    else:
        remove_train_list = [
            "train_manifest",
            "dev_manifest",
            "test_manifest",
            "n_epoch",
            "accum_grad",
            "global_grad_clip",
            "log_interval",
            "checkpoint",
            "lr",
            "lr_decay",
            "batch_size",
            "shuffle_method",
            "weight_decay",
            "sortagrad",
            "num_workers",
        ]

    for item in remove_train_list:
        try:
            remove_config_part(config, [item])
        except KeyError:
            print(item + " cannot be removed")

    # Save the config
    save(save_path, config)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(prog='Config merge', add_help=True)
    parser.add_argument(
        '--cfg_pth',
        type=str,
        default='conf/transformer.yaml',
        help='origin config file')
    parser.add_argument(
        '--pre_pth',
        type=str,
        default="conf/preprocess.yaml",
        help='preprocess config file')
    parser.add_argument(
        '--dcd_pth',
        type=str,
        default="conf/tuning/decode.yaml",
        help='decode config file')
    parser.add_argument(
        '--vb_pth',
        type=str,
        default="data/lang_char/vocab.txt",
        help='vocab file')
    parser.add_argument(
        '--cmvn_pth',
        type=str,
        default="data/mean_std.json",
        help='cmvn file')
    parser.add_argument(
        '--save_pth',
        type=str,
        default="conf/transformer_infer.yaml",
        help='path to save the merged inference config')
    parser_args = parser.parse_args()

    merge_configs(
        conf_path=parser_args.cfg_pth,
        decode_path=parser_args.dcd_pth,
        preprocess_path=parser_args.pre_pth,
        vocab_path=parser_args.vb_pth,
        cmvn_path=parser_args.cmvn_pth,
        save_path=parser_args.save_pth, )