Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleSpeech into fix_ci_waveflow
commit f51097618b
@ -0,0 +1,13 @@
Demo Video
==================

.. raw:: html

    <video controls width="1024">

        <source src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/PaddleSpeech_Demo.mp4"
                type="video/mp4">

        Sorry, your browser doesn't support embedded videos.
    </video>

@ -0,0 +1,12 @@
TTS Demo Video
==================

.. raw:: html

    <video controls width="1024">

        <source src="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/paddle2021_with_me.mp4"
                type="video/mp4">
        Sorry, your browser doesn't support embedded videos.
    </video>

@ -0,0 +1,20 @@
# Callcenter 8k sample rate

Data distribution:

```
676048 utts
491.4004722221223 h
4357792.0 text
2.4633630739178654 text/sec
2.6167397877068495 sec/utt
```
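
The two rate figures in the listing can be re-derived from the three totals. The sketch below assumes that `text/sec` means total text units divided by total audio seconds and that `sec/utt` means total audio seconds divided by the utterance count. The source does not state these definitions, and the variable names are illustrative.

```python
# Totals copied from the data distribution listing.
num_utts = 676048
total_hours = 491.4004722221223
total_text = 4357792.0  # the unit (characters or tokens) is not stated in the source

total_seconds = total_hours * 3600.0

# Assumed definitions of the two derived rates.
text_per_sec = total_text / total_seconds
sec_per_utt = total_seconds / num_utts

print(f"{text_per_sec:.4f} text/sec")  # 2.4634 text/sec
print(f"{sec_per_utt:.4f} sec/utt")    # 2.6167 sec/utt
```

Under these assumptions both values agree with the listed figures, so the five numbers are mutually consistent.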

train/dev/test partition:

```
 33802 manifest.dev
 67606 manifest.test
574640 manifest.train
676048 total
```
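
As a quick consistency check on the partition listing, the sketch below sums the three manifest sizes and prints each split's share of the total. The dictionary name is illustrative. The percentages are computed from the listed counts and are not stated in the source.

```python
# Line counts copied from the partition listing.
partition = {
    "manifest.dev": 33802,
    "manifest.test": 67606,
    "manifest.train": 574640,
}

total = sum(partition.values())
fractions = {name: count / total for name, count in partition.items()}

print(total)
for name, frac in fractions.items():
    print(f"{name}: {frac:.2%}")
```

The sum matches the listed total of 676048 utterances, and the split works out to roughly 5% dev, 10% test, and 85% train.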
@ -0,0 +1,127 @@
# WaveRNN with CSMSC
This example contains code used to train a [WaveRNN](https://arxiv.org/abs/1802.08435) model with [Chinese Standard Mandarin Speech Corpus](https://www.data-baker.com/open_source.html).
## Dataset
### Download and Extract
Download CSMSC from the [official website](https://www.data-baker.com/data/index/source) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/BZNSYP`.

### Get MFA Result and Extract
We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) results to cut silence at the edge of audio.
You can download the alignment results here: [baker_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/BZNSYP/with_tone/baker_alignment_tone.tar.gz), or train your own MFA model by referring to the [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) in our repo.

## Get Started
Assume the path to the dataset is `~/datasets/BZNSYP`.
Assume the path to the MFA result of CSMSC is `./baker_alignment_tone`.
Run the command below to
1. **source path**.
2. preprocess the dataset.
3. train the model.
4. synthesize wavs.
    - synthesize waveform from `metadata.jsonl`.
```bash
./run.sh
```
You can choose a range of stages you want to run, or set `stage` equal to `stop-stage` to run only one stage. For example, running the following command will only preprocess the dataset.
```bash
./run.sh --stage 0 --stop-stage 0
```
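
The stage selection can be illustrated with a small sketch. It is not the logic of `run.sh` itself. Only stage 0 (preprocessing) is confirmed by the text above. The numbering of the remaining stages and the inclusive `stage <= s <= stop_stage` rule are assumptions, and `STAGES` and `stages_to_run` are hypothetical names.

```python
# Assumed stage numbering; only stage 0 = preprocess is confirmed by the README.
STAGES = {0: "preprocess", 1: "train", 2: "synthesize"}


def stages_to_run(stage, stop_stage):
    """Return the names of all stages s with stage <= s <= stop_stage."""
    return [name for s, name in sorted(STAGES.items())
            if stage <= s <= stop_stage]


print(stages_to_run(0, 0))  # ['preprocess']
print(stages_to_run(0, 2))  # ['preprocess', 'train', 'synthesize']
```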
### Data Preprocessing
```bash
./local/preprocess.sh ${conf_path}
```
When it is done, a `dump` folder is created in the current directory. The structure of the dump folder is listed below.

```text
dump
├── dev
│   ├── norm
│   └── raw
├── test
│   ├── norm
│   └── raw
└── train
    ├── norm
    ├── raw
    └── feats_stats.npy
```
The dataset is split into 3 parts, namely `train`, `dev`, and `test`, each of which contains a `norm` and a `raw` subfolder. The `raw` folder contains the log magnitude of the mel spectrogram of each utterance, while the `norm` folder contains the normalized spectrogram. The statistics used to normalize the spectrogram are computed from the training set and are stored in `dump/train/feats_stats.npy`.

Also, there is a `metadata.jsonl` in each subfolder. It is a table-like file that contains the id and the path to the spectrogram of each utterance.
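
A `metadata.jsonl` file holds one JSON object per line, so it can be read with the standard library alone. The sketch below is a minimal reader run against a small stand-in file. The field names `utt_id` and `feats` and the example paths are assumptions for illustration, since the source does not list the actual keys.

```python
import json
import tempfile
from pathlib import Path


def read_metadata(path):
    """Read a JSON Lines file into a list of dicts, one per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Build a tiny stand-in file; the keys and paths here are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "metadata.jsonl"
    demo_path.write_text(
        '{"utt_id": "000001", "feats": "dump/train/norm/000001_feats.npy"}\n'
        '{"utt_id": "000002", "feats": "dump/train/norm/000002_feats.npy"}\n',
        encoding="utf-8")
    demo_records = read_metadata(demo_path)

print([record["utt_id"] for record in demo_records])
```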

### Model Training
```bash
CUDA_VISIBLE_DEVICES=${gpus} ./local/train.sh ${conf_path} ${train_output_path}
```
`./local/train.sh` calls `${BIN_DIR}/train.py`.
Here's the complete help message.

```text
usage: train.py [-h] [--config CONFIG] [--train-metadata TRAIN_METADATA]
                [--dev-metadata DEV_METADATA] [--output-dir OUTPUT_DIR]
                [--ngpu NGPU]

Train a WaveRNN model.

optional arguments:
  -h, --help            show this help message and exit
  --config CONFIG       config file to overwrite default config.
  --train-metadata TRAIN_METADATA
                        training data.
  --dev-metadata DEV_METADATA
                        dev data.
  --output-dir OUTPUT_DIR
                        output dir.
  --ngpu NGPU           if ngpu == 0, use cpu.
```

1. `--config` is a config file in yaml format to overwrite the default config, which can be found at `conf/default.yaml`.
2. `--train-metadata` and `--dev-metadata` should be the metadata file in the normalized subfolder of `train` and `dev` in the `dump` folder.
3. `--output-dir` is the directory to save the results of the experiment. Checkpoints are saved in `checkpoints/` inside this directory.
4. `--ngpu` is the number of gpus to use, if ngpu == 0, use cpu.
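
The options above map directly onto an `argparse` parser. The sketch below mirrors the flags and help strings from the help message. It is not the actual `train.py`: the argument types, the `--ngpu` default, and the `exp/default` output path are assumptions.

```python
import argparse


def build_parser():
    """Build a parser with the options listed in the help message."""
    parser = argparse.ArgumentParser(
        prog="train.py", description="Train a WaveRNN model.")
    parser.add_argument(
        "--config", type=str, help="config file to overwrite default config.")
    parser.add_argument("--train-metadata", type=str, help="training data.")
    parser.add_argument("--dev-metadata", type=str, help="dev data.")
    parser.add_argument("--output-dir", type=str, help="output dir.")
    # The default value of 1 is an assumption, not taken from the source.
    parser.add_argument(
        "--ngpu", type=int, default=1, help="if ngpu == 0, use cpu.")
    return parser


# Example invocation; "exp/default" is a hypothetical output directory.
demo_args = build_parser().parse_args([
    "--config", "conf/default.yaml",
    "--train-metadata", "dump/train/norm/metadata.jsonl",
    "--dev-metadata", "dump/dev/norm/metadata.jsonl",
    "--output-dir", "exp/default",
    "--ngpu", "0",
])
device = "cpu" if demo_args.ngpu == 0 else "gpu"
print(device)
```

Note that `argparse` turns `--train-metadata` into the attribute `train_metadata`, so the dashes in the flag names become underscores in code.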

### Synthesizing
`./local/synthesize.sh` calls `${BIN_DIR}/synthesize.py`, which can synthesize waveform from `metadata.jsonl`.
```bash
CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize.sh ${conf_path} ${train_output_path} ${ckpt_name}
```
```text
usage: synthesize.py [-h] [--config CONFIG] [--checkpoint CHECKPOINT]
                     [--test-metadata TEST_METADATA] [--output-dir OUTPUT_DIR]
                     [--ngpu NGPU]

Synthesize with WaveRNN.

optional arguments:
  -h, --help            show this help message and exit
  --config CONFIG       Vocoder config file.
  --checkpoint CHECKPOINT
                        snapshot to load.
  --test-metadata TEST_METADATA
                        dev data.
  --output-dir OUTPUT_DIR
                        output dir.
  --ngpu NGPU           if ngpu == 0, use cpu.
```

1. `--config` is the WaveRNN config file. You should use the same config with which the model was trained.
2. `--checkpoint` is the checkpoint to load. Pick one of the checkpoints from `checkpoints` inside the training output directory.
3. `--test-metadata` is the metadata of the test dataset. Use the `metadata.jsonl` in the `dev/norm` subfolder from the processed directory.
4. `--output-dir` is the directory to save the synthesized audio files.
5. `--ngpu` is the number of gpus to use, if ngpu == 0, use cpu.

## Pretrained Models
The pretrained model can be downloaded here: [wavernn_csmsc_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/wavernn/wavernn_csmsc_ckpt_0.2.0.zip).

The static model can be downloaded here: [wavernn_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/wavernn/wavernn_csmsc_static_0.2.0.zip).

Model | Step | eval/loss
:-------------:|:------------:| :------------:
default| 1(gpu) x 400000|2.602768

WaveRNN checkpoint contains files listed below.

```text
wavernn_csmsc_ckpt_0.2.0
├── default.yaml               # default config used to train wavernn
├── feats_stats.npy            # statistics used to normalize spectrogram when training wavernn
└── snapshot_iter_400000.pdz   # parameters of wavernn
```
@ -1,17 +0,0 @@
|
||||
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
"""t2s's infrastructure for data processing.
|
||||
"""
|
||||
from .batch import *
|
||||
from .dataset import *
|
@ -1,261 +0,0 @@
|
||||
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
import six
|
||||
from paddle.io import Dataset
|
||||
|
||||
__all__ = [
|
||||
"split",
|
||||
"TransformDataset",
|
||||
"CacheDataset",
|
||||
"TupleDataset",
|
||||
"DictDataset",
|
||||
"SliceDataset",
|
||||
"SubsetDataset",
|
||||
"FilterDataset",
|
||||
"ChainDataset",
|
||||
]
|
||||
|
||||
|
||||
def split(dataset, first_size):
|
||||
"""A utility function to split a dataset into two datasets."""
|
||||
first = SliceDataset(dataset, 0, first_size)
|
||||
second = SliceDataset(dataset, first_size, len(dataset))
|
||||
return first, second
|
||||
|
||||
|
||||
class TransformDataset(Dataset):
|
||||
def __init__(self, dataset, transform):
|
||||
"""Dataset which is transformed from another with a transform.
|
||||
|
||||
Args:
|
||||
dataset (Dataset): the base dataset.
|
||||
transform (callable): the transform which takes an example of the base dataset as parameter and returns a new example.
|
||||
"""
|
||||
self._dataset = dataset
|
||||
self._transform = transform
|
||||
|
||||
def __len__(self):
|
||||
return len(self._dataset)
|
||||
|
||||
def __getitem__(self, i):
|
||||
in_data = self._dataset[i]
|
||||
return self._transform(in_data)
|
||||
|
||||
|
||||
class CacheDataset(Dataset):
|
||||
def __init__(self, dataset):
|
||||
"""A lazy cache of the base dataset.
|
||||
|
||||
Args:
|
||||
dataset (Dataset): the base dataset to cache.
|
||||
"""
|
||||
self._dataset = dataset
|
||||
self._cache = dict()
|
||||
|
||||
def __len__(self):
|
||||
return len(self._dataset)
|
||||
|
||||
def __getitem__(self, i):
|
||||
if i not in self._cache:
|
||||
self._cache[i] = self._dataset[i]
|
||||
return self._cache[i]
|
||||
|
||||
|
||||
class TupleDataset(Dataset):
|
||||
def __init__(self, *datasets):
|
||||
"""A compound dataset made from several datasets of the same length. An example of the `TupleDataset` is a tuple of examples from the constituent datasets.
|
||||
|
||||
Args:
|
||||
datasets: tuple[Dataset], the constituent datasets.
|
||||
"""
|
||||
if not datasets:
|
||||
raise ValueError("no datasets are given")
|
||||
length = len(datasets[0])
|
||||
for i, dataset in enumerate(datasets):
|
||||
if len(dataset) != length:
|
||||
raise ValueError("all the datasets should have the same length. "
|
||||
"dataset {} has a different length".format(i))
|
||||
self._datasets = datasets
|
||||
self._length = length
|
||||
|
||||
def __getitem__(self, index):
|
||||
# SOA
|
||||
batches = [dataset[index] for dataset in self._datasets]
|
||||
if isinstance(index, slice):
|
||||
length = len(batches[0])
|
||||
# AOS
|
||||
return [
|
||||
tuple([batch[i] for batch in batches])
|
||||
for i in six.moves.range(length)
|
||||
]
|
||||
else:
|
||||
return tuple(batches)
|
||||
|
||||
def __len__(self):
|
||||
return self._length
|
||||
|
||||
|
||||
class DictDataset(Dataset):
|
||||
def __init__(self, **datasets):
|
||||
"""
|
||||
A compound dataset made from several datasets of the same length. An
|
||||
example of the `DictDataset` is a dict of examples from the constituent
|
||||
datasets.
|
||||
|
||||
WARNING: paddle does not have a good support for DictDataset, because
|
||||
every batch yield from a DataLoader is a list, but it cannot be a dict.
|
||||
So you have to provide a collate function because you cannot use the
|
||||
default one.
|
||||
|
||||
Args:
|
||||
datasets: Dict[Dataset], the constituent datasets.
|
||||
"""
|
||||
if not datasets:
|
||||
raise ValueError("no datasets are given")
|
||||
length = None
|
||||
for key, dataset in six.iteritems(datasets):
|
||||
if length is None:
|
||||
length = len(dataset)
|
||||
elif len(dataset) != length:
|
||||
raise ValueError(
|
||||
"all the datasets should have the same length. "
|
||||
"dataset {} has a different length".format(key))
|
||||
self._datasets = datasets
|
||||
self._length = length
|
||||
|
||||
def __getitem__(self, index):
|
||||
batches = {
|
||||
key: dataset[index]
|
||||
for key, dataset in six.iteritems(self._datasets)
|
||||
}
|
||||
if isinstance(index, slice):
|
||||
length = len(six.next(six.itervalues(batches)))
|
||||
return [{key: batch[i]
|
||||
for key, batch in six.iteritems(batches)}
|
||||
for i in six.moves.range(length)]
|
||||
else:
|
||||
return batches
|
||||
|
||||
def __len__(self):
|
||||
return self._length
|
||||
|
||||
|
||||
class SliceDataset(Dataset):
|
||||
def __init__(self, dataset, start, finish, order=None):
|
||||
"""A Dataset which is a slice of the base dataset.
|
||||
|
||||
Args:
|
||||
dataset (Dataset): the base dataset.
|
||||
start (int): the start of the slice.
|
||||
finish (int): the end of the slice, not inclusive.
|
||||
order (List[int], optional): the order, it is a permutation of the valid example ids of the base dataset. If `order` is provided, the slice is taken in `order`. Defaults to None.
|
||||
"""
|
||||
if start < 0 or finish > len(dataset):
|
||||
raise ValueError("subset overruns the dataset.")
|
||||
self._dataset = dataset
|
||||
self._start = start
|
||||
self._finish = finish
|
||||
self._size = finish - start
|
||||
|
||||
if order is not None and len(order) != len(dataset):
|
||||
raise ValueError(
|
||||
"order should have the same length as the dataset. "
|
||||
"len(order) = {} which does not equal len(dataset) = {} ".
|
||||
format(len(order), len(dataset)))
|
||||
self._order = order
|
||||
|
||||
def __len__(self):
|
||||
return self._size
|
||||
|
||||
def __getitem__(self, i):
|
||||
if i >= 0:
|
||||
if i >= self._size:
|
||||
raise IndexError('dataset index out of range')
|
||||
index = self._start + i
|
||||
else:
|
||||
if i < -self._size:
|
||||
raise IndexError('dataset index out of range')
|
||||
index = self._finish + i
|
||||
|
||||
if self._order is not None:
|
||||
index = self._order[index]
|
||||
return self._dataset[index]
|
||||
|
||||
|
||||
class SubsetDataset(Dataset):
|
||||
def __init__(self, dataset, indices):
|
||||
"""A Dataset which is a subset of the base dataset.
|
||||
|
||||
Args:
|
||||
dataset (Dataset): the base dataset.
|
||||
indices (Iterable[int]): the indices of the examples to pick.
|
||||
"""
|
||||
self._dataset = dataset
|
||||
if len(indices) > len(dataset):
|
||||
raise ValueError("subset's size is larger than dataset's size!")
|
||||
self._indices = indices
|
||||
self._size = len(indices)
|
||||
|
||||
def __len__(self):
|
||||
return self._size
|
||||
|
||||
def __getitem__(self, i):
|
||||
index = self._indices[i]
|
||||
return self._dataset[index]
|
||||
|
||||
|
||||
class FilterDataset(Dataset):
|
||||
def __init__(self, dataset, filter_fn):
|
||||
"""A filtered dataset.
|
||||
|
||||
Args:
|
||||
dataset (Dataset): the base dataset.
|
||||
filter_fn (callable): a callable which takes an example of the base dataset and returns a boolean.
|
||||
"""
|
||||
self._dataset = dataset
|
||||
self._indices = [
|
||||
i for i in range(len(dataset)) if filter_fn(dataset[i])
|
||||
]
|
||||
self._size = len(self._indices)
|
||||
|
||||
def __len__(self):
|
||||
return self._size
|
||||
|
||||
def __getitem__(self, i):
|
||||
index = self._indices[i]
|
||||
return self._dataset[index]
|
||||
|
||||
|
||||
class ChainDataset(Dataset):
|
||||
def __init__(self, *datasets):
|
||||
"""A concatenation of several datasets with the same structure.
|
||||
|
||||
Args:
|
||||
datasets (Iterable[Dataset]): datasets to concat.
|
||||
"""
|
||||
self._datasets = datasets
|
||||
|
||||
def __len__(self):
|
||||
return sum(len(dataset) for dataset in self._datasets)
|
||||
|
||||
def __getitem__(self, i):
|
||||
if i < 0:
|
||||
raise IndexError("ChainDataset does not support negative indexing.")
|
||||
|
||||
for dataset in self._datasets:
|
||||
if i < len(dataset):
|
||||
return dataset[i]
|
||||
i -= len(dataset)
|
||||
|
||||
raise IndexError("dataset index out of range")
|
@ -1,92 +0,0 @@
|
||||
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
from pathlib import Path
|
||||
from typing import List
|
||||
|
||||
import librosa
|
||||
import numpy as np
|
||||
from paddle.io import Dataset
|
||||
|
||||
__all__ = ["AudioSegmentDataset", "AudioDataset", "AudioFolderDataset"]
|
||||
|
||||
|
||||
class AudioSegmentDataset(Dataset):
|
||||
"""A simple dataset adaptor for audio files to train vocoders.
|
||||
Read -> trim silence -> normalize -> extract a segment
|
||||
"""
|
||||
|
||||
def __init__(self,
|
||||
file_paths: List[Path],
|
||||
sample_rate: int,
|
||||
length: int,
|
||||
top_db: float):
|
||||
self.file_paths = file_paths
|
||||
self.sr = sample_rate
|
||||
self.top_db = top_db
|
||||
self.length = length # samples in the clip
|
||||
|
||||
def __getitem__(self, i):
|
||||
fpath = self.file_paths[i]
|
||||
y, sr = librosa.load(fpath, sr=self.sr)
|
||||
y, _ = librosa.effects.trim(y, top_db=self.top_db)
|
||||
y = librosa.util.normalize(y)
|
||||
y = y.astype(np.float32)
|
||||
|
||||
# pad or trim
|
||||
if y.size <= self.length:
|
||||
y = np.pad(y, [0, self.length - len(y)], mode='constant')
|
||||
else:
|
||||
start = np.random.randint(0, 1 + len(y) - self.length)
|
||||
y = y[start:start + self.length]
|
||||
return y
|
||||
|
||||
def __len__(self):
|
||||
return len(self.file_paths)
|
||||
|
||||
|
||||
class AudioDataset(Dataset):
|
||||
"""A simple dataset adaptor for the audio files.
|
||||
Read -> trim silence -> normalize
|
||||
"""
|
||||
|
||||
def __init__(self,
|
||||
file_paths: List[Path],
|
||||
sample_rate: int,
|
||||
top_db: float=60):
|
||||
self.file_paths = file_paths
|
||||
self.sr = sample_rate
|
||||
self.top_db = top_db
|
||||
|
||||
def __getitem__(self, i):
|
||||
fpath = self.file_paths[i]
|
||||
y, sr = librosa.load(fpath, sr=self.sr)
|
||||
y, _ = librosa.effects.trim(y, top_db=self.top_db)
|
||||
y = librosa.util.normalize(y)
|
||||
y = y.astype(np.float32)
|
||||
return y
|
||||
|
||||
def __len__(self):
|
||||
return len(self.file_paths)
|
||||
|
||||
|
||||
class AudioFolderDataset(AudioDataset):
|
||||
def __init__(
|
||||
self,
|
||||
root,
|
||||
sample_rate,
|
||||
top_db=60,
|
||||
extension=".wav", ):
|
||||
root = Path(root).expanduser()
|
||||
file_paths = sorted(list(root.rglob("*{}".format(extension))))
|
||||
super().__init__(file_paths, sample_rate, top_db)
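
The pad-or-crop step at the end of `AudioSegmentDataset.__getitem__` can be sketched with plain lists, so it runs without `librosa` or `numpy`. This is a simplified stand-in and not the original method: `pad_or_crop` and its `rng` parameter are hypothetical names, and loading, silence trimming, and normalization are omitted.

```python
import random


def pad_or_crop(samples, length, rng=random):
    """Return exactly `length` samples: zero-pad short clips, randomly crop long ones."""
    if len(samples) <= length:
        # Short clip: append zeros on the right, like np.pad(..., mode='constant').
        return list(samples) + [0.0] * (length - len(samples))
    # Long clip: pick a start so that the window fits inside the clip.
    # random.randint is inclusive on both ends, matching
    # np.random.randint(0, 1 + len(y) - length), whose upper bound is exclusive.
    start = rng.randint(0, len(samples) - length)
    return list(samples[start:start + length])


print(pad_or_crop([0.5, -0.5], 4))

# Seeded generator so the crop is reproducible in this demo.
cropped = pad_or_crop(list(range(10)), 4, rng=random.Random(0))
print(len(cropped))
```

Every returned clip has the same length, so clips can be stacked into a batch without a custom collate function.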
|
@ -0,0 +1,258 @@
|
||||
#!/bin/bash
|
||||
source test_tipc/common_func.sh
|
||||
|
||||
# set env
|
||||
python=python
|
||||
export model_branch=`git symbolic-ref HEAD 2>/dev/null | cut -d"/" -f 3`
|
||||
export model_commit=$(git log|head -n1|awk '{print $2}')
|
||||
export str_tmp=$(echo `pip list|grep paddlepaddle-gpu|awk -F ' ' '{print $2}'`)
|
||||
export frame_version=${str_tmp%%.post*}
|
||||
export frame_commit=$(echo `${python} -c "import paddle;print(paddle.version.commit)"`)
|
||||
|
||||
# run benchmark sh
|
||||
# Usage:
|
||||
# bash run_benchmark_train.sh config.txt params
|
||||
# or
|
||||
# bash run_benchmark_train.sh config.txt
|
||||
|
||||
function func_parser_params(){
|
||||
strs=$1
|
||||
IFS="="
|
||||
array=(${strs})
|
||||
tmp=${array[1]}
|
||||
echo ${tmp}
|
||||
}
|
||||
|
||||
function func_sed_params(){
|
||||
filename=$1
|
||||
line=$2
|
||||
param_value=$3
|
||||
params=`sed -n "${line}p" $filename`
|
||||
IFS=":"
|
||||
array=(${params})
|
||||
key=${array[0]}
|
||||
value=${array[1]}
|
||||
if [[ $value =~ 'benchmark_train' ]];then
|
||||
IFS='='
|
||||
_val=(${value})
|
||||
param_value="${_val[0]}=${param_value}"
|
||||
fi
|
||||
new_params="${key}:${param_value}"
|
||||
IFS=";"
|
||||
cmd="sed -i '${line}s/.*/${new_params}/' '${filename}'"
|
||||
eval $cmd
|
||||
}
|
||||
|
||||
function set_gpu_id(){
|
||||
string=$1
|
||||
_str=${string:1:6}
|
||||
IFS="C"
|
||||
arr=(${_str})
|
||||
M=${arr[0]}
|
||||
P=${arr[1]}
|
||||
gn=`expr $P - 1`
|
||||
gpu_num=`expr $gn / $M`
|
||||
seq=`seq -s "," 0 $gpu_num`
|
||||
echo $seq
|
||||
}
|
||||
|
||||
function get_repo_name(){
|
||||
IFS=";"
|
||||
cur_dir=$(pwd)
|
||||
IFS="/"
|
||||
arr=(${cur_dir})
|
||||
echo ${arr[-1]}
|
||||
}
|
||||
|
||||
FILENAME=$1
|
||||
# copy FILENAME as new
|
||||
new_filename="./test_tipc/benchmark_train.txt"
|
||||
cmd=`yes|cp $FILENAME $new_filename`
|
||||
FILENAME=$new_filename
|
||||
# MODE must be one of ['benchmark_train']
|
||||
MODE=$2
|
||||
PARAMS=$3
|
||||
# bash test_tipc/benchmark_train.sh test_tipc/configs/det_mv3_db_v2_0/train_benchmark.txt benchmark_train dynamic_bs8_null_DP_N1C1
|
||||
IFS=$'\n'
|
||||
# parser params from train_benchmark.txt
|
||||
dataline=`cat $FILENAME`
|
||||
# parser params
|
||||
IFS=$'\n'
|
||||
lines=(${dataline})
|
||||
model_name=$(func_parser_value "${lines[1]}")
|
||||
|
||||
# get the line number where benchmark_params is located
|
||||
line_num=`grep -n "train_benchmark_params" $FILENAME | cut -d ":" -f 1`
|
||||
# for train log parser
|
||||
batch_size=$(func_parser_value "${lines[line_num]}")
|
||||
line_num=`expr $line_num + 1`
|
||||
fp_items=$(func_parser_value "${lines[line_num]}")
|
||||
line_num=`expr $line_num + 1`
|
||||
epoch=$(func_parser_value "${lines[line_num]}")
|
||||
|
||||
line_num=`expr $line_num + 1`
|
||||
profile_option_key=$(func_parser_key "${lines[line_num]}")
|
||||
profile_option_params=$(func_parser_value "${lines[line_num]}")
|
||||
profile_option="${profile_option_key}:${profile_option_params}"
|
||||
|
||||
line_num=`expr $line_num + 1`
|
||||
flags_value=$(func_parser_value "${lines[line_num]}")
|
||||
# set flags
|
||||
IFS=";"
|
||||
flags_list=(${flags_value})
|
||||
for _flag in ${flags_list[*]}; do
|
||||
cmd="export ${_flag}"
|
||||
eval $cmd
|
||||
done
|
||||
|
||||
# set log_name
|
||||
repo_name=$(get_repo_name )
|
||||
SAVE_LOG=${BENCHMARK_LOG_DIR:-$(pwd)} # */benchmark_log
|
||||
mkdir -p "${SAVE_LOG}/benchmark_log/"
|
||||
status_log="${SAVE_LOG}/benchmark_log/results.log"
|
||||
|
||||
# The number of lines in which train params can be replaced.
|
||||
line_python=3
|
||||
line_gpuid=4
|
||||
line_precision=6
|
||||
line_epoch=7
|
||||
line_batchsize=9
|
||||
line_profile=13
|
||||
line_eval_py=24
|
||||
line_export_py=30
|
||||
|
||||
func_sed_params "$FILENAME" "${line_eval_py}" "null"
|
||||
func_sed_params "$FILENAME" "${line_export_py}" "null"
|
||||
func_sed_params "$FILENAME" "${line_python}" "$python"
|
||||
|
||||
# if params
|
||||
if [ ! -n "$PARAMS" ] ;then
|
||||
# PARAMS input is not a word.
|
||||
IFS="|"
|
||||
batch_size_list=(${batch_size})
|
||||
fp_items_list=(${fp_items})
|
||||
device_num_list=(N1C4)
|
||||
run_mode="DP"
|
||||
else
|
||||
# parser params from input: modeltype_bs${bs_item}_${fp_item}_${run_mode}_${device_num}
|
||||
IFS="_"
|
||||
params_list=(${PARAMS})
|
||||
model_type=${params_list[0]}
|
||||
batch_size=${params_list[1]}
|
||||
batch_size=`echo ${batch_size} | tr -cd "[0-9]" `
|
||||
precision=${params_list[2]}
|
||||
# run_process_type=${params_list[3]}
|
||||
run_mode=${params_list[3]}
|
||||
device_num=${params_list[4]}
|
||||
IFS=";"
|
||||
|
||||
if [ ${precision} = "null" ];then
|
||||
precision="fp32"
|
||||
fi
|
||||
|
||||
fp_items_list=($precision)
|
||||
batch_size_list=($batch_size)
|
||||
device_num_list=($device_num)
|
||||
fi
|
||||
|
||||
IFS="|"
|
||||
for batch_size in ${batch_size_list[*]}; do
|
||||
for precision in ${fp_items_list[*]}; do
|
||||
for device_num in ${device_num_list[*]}; do
|
||||
# sed batchsize and precision
|
||||
func_sed_params "$FILENAME" "${line_precision}" "$precision"
|
||||
func_sed_params "$FILENAME" "${line_batchsize}" "$MODE=$batch_size"
|
||||
func_sed_params "$FILENAME" "${line_epoch}" "$MODE=$epoch"
|
||||
gpu_id=$(set_gpu_id $device_num)
|
||||
|
||||
if [ ${#gpu_id} -le 1 ];then
|
||||
run_process_type="SingleP"
|
||||
log_path="$SAVE_LOG/profiling_log"
|
||||
mkdir -p $log_path
|
||||
log_name="${repo_name}_${model_name}_bs${batch_size}_${precision}_${run_process_type}_${run_mode}_${device_num}_profiling"
|
||||
func_sed_params "$FILENAME" "${line_gpuid}" "0" # sed used gpu_id
|
||||
# set profile_option params
|
||||
tmp=`sed -i "${line_profile}s/.*/${profile_option}/" "${FILENAME}"`
|
||||
|
||||
# run test_train_inference_python.sh
|
||||
cmd="bash test_tipc/test_train_inference_python.sh ${FILENAME} benchmark_train > ${log_path}/${log_name} 2>&1 "
|
||||
echo $cmd
|
||||
eval $cmd
|
||||
eval "cat ${log_path}/${log_name}"
|
||||
|
||||
# without profile
|
||||
log_path="$SAVE_LOG/train_log"
|
||||
speed_log_path="$SAVE_LOG/index"
|
||||
mkdir -p $log_path
|
||||
mkdir -p $speed_log_path
|
||||
log_name="${repo_name}_${model_name}_bs${batch_size}_${precision}_${run_process_type}_${run_mode}_${device_num}_log"
|
||||
speed_log_name="${repo_name}_${model_name}_bs${batch_size}_${precision}_${run_process_type}_${run_mode}_${device_num}_speed"
|
||||
func_sed_params "$FILENAME" "${line_profile}" "null" # sed profile_id as null
|
||||
cmd="bash test_tipc/test_train_inference_python.sh ${FILENAME} benchmark_train > ${log_path}/${log_name} 2>&1 "
|
||||
echo $cmd
|
||||
job_bt=`date '+%Y%m%d%H%M%S'`
|
||||
eval $cmd
|
||||
job_et=`date '+%Y%m%d%H%M%S'`
|
||||
export model_run_time=$((${job_et}-${job_bt}))
|
||||
eval "cat ${log_path}/${log_name}"
|
||||
|
||||
# parser log
|
||||
_model_name="${model_name}_bs${batch_size}_${precision}_${run_process_type}_${run_mode}"
|
||||
cmd="${python} ${BENCHMARK_ROOT}/scripts/analysis.py --filename ${log_path}/${log_name} \
|
||||
--speed_log_file '${speed_log_path}/${speed_log_name}' \
|
||||
--model_name ${_model_name} \
|
||||
--base_batch_size ${batch_size} \
|
||||
--run_mode ${run_mode} \
|
||||
--run_process_type ${run_process_type} \
|
||||
--fp_item ${precision} \
|
||||
--keyword ips: \
|
||||
--skip_steps 2 \
|
||||
--device_num ${device_num} \
|
||||
--speed_unit samples/s \
|
||||
--convergence_key loss: "
|
||||
echo $cmd
|
||||
eval $cmd
|
||||
last_status=${PIPESTATUS[0]}
|
||||
status_check $last_status "${cmd}" "${status_log}"
|
||||
else
|
||||
IFS=";"
|
||||
unset_env=`unset CUDA_VISIBLE_DEVICES`
|
||||
run_process_type="MultiP"
|
||||
log_path="$SAVE_LOG/train_log"
|
||||
speed_log_path="$SAVE_LOG/index"
|
||||
mkdir -p $log_path
|
||||
mkdir -p $speed_log_path
|
||||
log_name="${repo_name}_${model_name}_bs${batch_size}_${precision}_${run_process_type}_${run_mode}_${device_num}_log"
|
||||
speed_log_name="${repo_name}_${model_name}_bs${batch_size}_${precision}_${run_process_type}_${run_mode}_${device_num}_speed"
|
||||
func_sed_params "$FILENAME" "${line_gpuid}" "$gpu_id" # sed used gpu_id
|
||||
func_sed_params "$FILENAME" "${line_profile}" "null" # sed --profile_option as null
|
||||
cmd="bash test_tipc/test_train_inference_python.sh ${FILENAME} benchmark_train > ${log_path}/${log_name} 2>&1 "
|
||||
echo $cmd
|
||||
job_bt=`date '+%Y%m%d%H%M%S'`
|
||||
eval $cmd
|
||||
job_et=`date '+%Y%m%d%H%M%S'`
|
||||
export model_run_time=$((${job_et}-${job_bt}))
|
||||
eval "cat ${log_path}/${log_name}"
|
||||
# parser log
|
||||
_model_name="${model_name}_bs${batch_size}_${precision}_${run_process_type}_${run_mode}"
|
||||
|
||||
cmd="${python} ${BENCHMARK_ROOT}/scripts/analysis.py --filename ${log_path}/${log_name} \
|
||||
--speed_log_file '${speed_log_path}/${speed_log_name}' \
|
||||
--model_name ${_model_name} \
|
||||
--base_batch_size ${batch_size} \
|
||||
--run_mode ${run_mode} \
|
||||
--run_process_type ${run_process_type} \
|
||||
--fp_item ${precision} \
|
||||
--keyword ips: \
|
||||
--skip_steps 2 \
|
||||
--device_num ${device_num} \
|
||||
--speed_unit images/s \
|
||||
--convergence_key loss: "
|
||||
echo $cmd
|
||||
eval $cmd
|
||||
last_status=${PIPESTATUS[0]}
|
||||
status_check $last_status "${cmd}" "${status_log}"
|
||||
fi
|
||||
done
|
||||
done
|
||||
done
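
The parameter handling in the script above can be traced with a short Python sketch. It follows the `modeltype_bs${bs_item}_${fp_item}_${run_mode}_${device_num}` format from the script's comment and the arithmetic in `set_gpu_id`. The function names `gpu_ids` and `parse_params` are hypothetical. The sketch only mirrors the string handling and launches nothing.

```python
def gpu_ids(device_num):
    """Mirror set_gpu_id: 'N1C4' -> '0,1,2,3'."""
    # ${string:1:6} drops the leading "N"; IFS="C" then splits nodes from cards.
    nodes_str, cards_str = device_num[1:7].split("C")
    nodes, cards = int(nodes_str), int(cards_str)
    last_id = (cards - 1) // nodes  # expr performs integer division
    return ",".join(str(i) for i in range(last_id + 1))


def parse_params(params):
    """Split a PARAMS string such as 'dynamic_bs8_null_DP_N1C1'."""
    model_type, bs_item, precision, run_mode, device_num = params.split("_")
    batch_size = "".join(ch for ch in bs_item if ch.isdigit())
    if precision == "null":
        precision = "fp32"  # same fallback as the script
    ids = gpu_ids(device_num)
    return {
        "model_type": model_type,
        "batch_size": batch_size,
        "precision": precision,
        "run_mode": run_mode,
        "device_num": device_num,
        "gpu_id": ids,
        # The script tests the string length of gpu_id, not the GPU count.
        "run_process_type": "SingleP" if len(ids) <= 1 else "MultiP",
    }


demo = parse_params("dynamic_bs8_null_DP_N1C1")
print(demo)
print(gpu_ids("N1C4"))
```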
|
@ -0,0 +1,57 @@
|
||||
===========================train_params===========================
|
||||
model_name:conformer
|
||||
python:python3.7
|
||||
gpu_list:0|0,1
|
||||
null:null
|
||||
null:null
|
||||
--benchmark-max-step:50
|
||||
null:null
|
||||
--benchmark-batch-size:16
|
||||
null:null
|
||||
null:null
|
||||
null:null
|
||||
null:null
|
||||
##
|
||||
trainer:norm_train
|
||||
norm_train: ../paddlespeech/s2t/exps/u2/bin/train.py --config test_tipc/conformer/benchmark_train/conf/conformer.yaml --output test_tipc/conformer/benchmark_train/outputs --seed 1024
|
||||
pact_train:null
|
||||
fpgm_train:null
|
||||
distill_train:null
|
||||
null:null
|
||||
null:null
|
||||
##
|
||||
===========================eval_params===========================
|
||||
eval:null
|
||||
null:null
|
||||
##
|
||||
===========================infer_params===========================
|
||||
null:null
|
||||
null:null
|
||||
norm_export: null
|
||||
quant_export:null
|
||||
fpgm_export:null
|
||||
distill_export:null
|
||||
export1:null
|
||||
export2:null
|
||||
null:null
|
||||
infer_model:null
|
||||
infer_export:null
|
||||
infer_quant:null
|
||||
inference:null
|
||||
null:null
|
||||
null:null
|
||||
null:null
|
||||
null:null
|
||||
null:null
|
||||
null:null
|
||||
null:null
|
||||
null:null
|
||||
null:null
|
||||
null:null
|
||||
null:null
|
||||
===========================train_benchmark_params==========================
|
||||
batch_size:16|30
|
||||
fp_items:fp32
|
||||
iteration:50
|
||||
--profiler-options:"batch_range=[10,35];state=GPU;tracer_option=Default;profile_path=model.profile"
|
||||
flags:FLAGS_eager_delete_tensor_gb=0.0;FLAGS_fraction_of_gpu_memory_to_use=0.98;FLAGS_conv_workspace_size_limit=4096"
|
@ -0,0 +1,159 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Prepare Aishell mandarin dataset

Download, unpack and create manifest files.
Manifest file is a json-format file with each line containing the
meta data (i.e. audio filepath, transcript and audio duration)
of each audio file in the data set.
"""
import argparse
import codecs
import json
import os
from pathlib import Path

import soundfile

from utils.utility import download
from utils.utility import unpack

DATA_HOME = os.path.expanduser('~/.cache/paddle/dataset/speech')

# NOTE: the next line appears to be a template placeholder that is substituted
# with a `URL_ROOT = ...` assignment before this script is run (assumption).
URL_ROOT_TAG
DATA_URL = URL_ROOT + '/data_aishell_tiny.tgz'
MD5_DATA = '337b1b1ea016761d4fd3225c5b8799b4'
RESOURCE_URL = URL_ROOT + '/resource_aishell.tgz'
MD5_RESOURCE = '957d480a0fcac85fc18e550756f624e5'

parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
    "--target_dir",
    default=DATA_HOME + "/Aishell",
    type=str,
    help="Directory to save the dataset. (default: %(default)s)")
parser.add_argument(
    "--manifest_prefix",
    default="manifest",
    type=str,
    help="Filepath prefix for output manifests. (default: %(default)s)")
args = parser.parse_args()


def create_manifest(data_dir, manifest_path_prefix):
    print("Creating manifest %s ..." % manifest_path_prefix)
    json_lines = []
    transcript_path = os.path.join(data_dir, 'transcript',
                                   'aishell_transcript_v0.8.txt')
    transcript_dict = {}
    for line in codecs.open(transcript_path, 'r', 'utf-8'):
        line = line.strip()
        if line == '':
            continue
        audio_id, text = line.split(' ', 1)
        # remove whitespace so the text is a plain character sequence
        text = ''.join(text.split())
        transcript_dict[audio_id] = text

    data_types = ['train', 'dev', 'test']
    for dtype in data_types:
        del json_lines[:]
        total_sec = 0.0
        total_text = 0.0
        total_num = 0

        audio_dir = os.path.join(data_dir, 'wav', dtype)
        for subfolder, _, filelist in sorted(os.walk(audio_dir)):
            for fname in filelist:
                audio_path = os.path.abspath(os.path.join(subfolder, fname))
                audio_id = os.path.basename(fname)[:-4]
                # skip audio files that have no transcription
                if audio_id not in transcript_dict:
                    continue

                utt2spk = Path(audio_path).parent.name
                audio_data, samplerate = soundfile.read(audio_path)
                duration = float(len(audio_data) / samplerate)
                text = transcript_dict[audio_id]
                json_lines.append(
                    json.dumps(
                        {
                            'utt': audio_id,
                            'utt2spk': str(utt2spk),
                            'feat': audio_path,
                            'feat_shape': (duration, ),  # duration in seconds
                            'text': text
                        },
                        ensure_ascii=False))

                total_sec += duration
                total_text += len(text)
                total_num += 1

        manifest_path = manifest_path_prefix + '.' + dtype
        with codecs.open(manifest_path, 'w', 'utf-8') as fout:
            for line in json_lines:
                fout.write(line + '\n')

        manifest_dir = os.path.dirname(manifest_path_prefix)
        meta_path = os.path.join(manifest_dir, dtype) + '.meta'
        with open(meta_path, 'w') as f:
            print(f"{dtype}:", file=f)
            print(f"{total_num} utts", file=f)
            print(f"{total_sec / (60*60)} h", file=f)
            print(f"{total_text} text", file=f)
            print(f"{total_text / total_sec} text/sec", file=f)
            print(f"{total_sec / total_num} sec/utt", file=f)


def prepare_dataset(url, md5sum, target_dir, manifest_path=None):
    """Download, unpack and create manifest file."""
    data_dir = os.path.join(target_dir, 'data_aishell_tiny')
    if not os.path.exists(data_dir):
        filepath = download(url, md5sum, target_dir)
        unpack(filepath, target_dir)
        # unpack all audio tar files
        audio_dir = os.path.join(data_dir, 'wav')
        for subfolder, _, filelist in sorted(os.walk(audio_dir)):
            for ftar in filelist:
                unpack(os.path.join(subfolder, ftar), subfolder, True)
    else:
        print("Skip downloading and unpacking. Data already exists in %s." %
              target_dir)

    if manifest_path:
        create_manifest(data_dir, manifest_path)


def main():
    if args.target_dir.startswith('~'):
        args.target_dir = os.path.expanduser(args.target_dir)

    prepare_dataset(
        url=DATA_URL,
        md5sum=MD5_DATA,
        target_dir=args.target_dir,
        manifest_path=args.manifest_prefix)

    prepare_dataset(
        url=RESOURCE_URL,
        md5sum=MD5_RESOURCE,
        target_dir=args.target_dir,
        manifest_path=None)

    print("Data download and manifest prepare done!")


if __name__ == '__main__':
    main()
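
The script above writes one JSON object per line to each manifest and a small `.meta` summary next to it. As a minimal sketch of how such a manifest could be read back and summarized, the snippet below recomputes the same three totals. The helper name `summarize_manifest`, the file name `manifest.demo`, and the two records are made up for illustration and are not part of the repository. Note that the `(duration, )` tuple comes back as a one-element list after the JSON round trip.

```python
import json
import os
import tempfile


def summarize_manifest(manifest_path):
    """Return (utterance count, total seconds, total characters) of a manifest."""
    total_num, total_sec, total_text = 0, 0.0, 0
    with open(manifest_path, encoding="utf-8") as fin:
        for line in fin:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            total_num += 1
            # 'feat_shape' holds the duration in seconds as a one-element list.
            total_sec += record["feat_shape"][0]
            total_text += len(record["text"])
    return total_num, total_sec, total_text


# Build a two-line manifest with made-up records, purely for demonstration.
records = [
    {"utt": "utt_a", "utt2spk": "spk_1", "feat": "/tmp/utt_a.wav",
     "feat_shape": [2.5], "text": "你好"},
    {"utt": "utt_b", "utt2spk": "spk_1", "feat": "/tmp/utt_b.wav",
     "feat_shape": [1.5], "text": "谢谢你"},
]
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "manifest.demo")
    with open(demo_path, "w", encoding="utf-8") as fout:
        for record in records:
            fout.write(json.dumps(record, ensure_ascii=False) + "\n")
    summary = summarize_manifest(demo_path)

print(summary)
```

Reading line by line keeps memory use flat even for large manifests, which is one reason a JSON-lines layout suits this kind of metadata.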