Merge pull request #837 from PaddlePaddle/bench

support benchmark scripts
Hui Zhang 4 years ago committed by GitHub
commit a75be25787

@@ -1,5 +1,3 @@
-[中文版](README_cn.md)
 # PaddlePaddle Speech to Any toolkit
 ![License](https://img.shields.io/badge/license-Apache%202-red.svg)
@@ -11,7 +9,7 @@
 ## Features
-See [feature list](doc/src/feature_list.md) for more information.
+See [feature list](docs/src/feature_list.md) for more information.
 ## Setup
@@ -20,20 +18,20 @@ All tested under:
 * python>=3.7
 * paddlepaddle>=2.2.0rc
-Please see [install](doc/src/install.md).
+Please see [install](docs/src/install.md).
 ## Getting Started
-Please see [Getting Started](doc/src/getting_started.md) and [tiny egs](examples/tiny/s0/README.md).
+Please see [Getting Started](docs/src/getting_started.md) and [tiny egs](examples/tiny/s0/README.md).
 ## More Information
-* [Data Preparation](doc/src/data_preparation.md)
-* [Data Augmentation](doc/src/augmentation.md)
-* [Ngram LM](doc/src/ngram_lm.md)
-* [Benchmark](doc/src/benchmark.md)
-* [Released Model](doc/src/released_model.md)
+* [Data Preparation](docs/src/data_preparation.md)
+* [Data Augmentation](docs/src/augmentation.md)
+* [Ngram LM](docs/src/ngram_lm.md)
+* [Benchmark](docs/src/benchmark.md)
+* [Released Model](docs/src/released_model.md)
 ## Questions and Help
@@ -47,4 +45,4 @@ DeepSpeech is provided under the [Apache-2.0 License](./LICENSE).
 ## Acknowledgement
-We depend on many open-source repos. See [References](doc/src/reference.md) for more information.
+We depend on many open-source repos. See [References](docs/src/reference.md) for more information.

@@ -1,49 +0,0 @@
[English](README.md)

# PaddlePaddle Speech to Any toolkit

![License](https://img.shields.io/badge/license-Apache%202-red.svg)
![python version](https://img.shields.io/badge/python-3.7+-orange.svg)
![support os](https://img.shields.io/badge/os-linux-yellow.svg)

*DeepSpeech* is an open-source end-to-end automatic speech recognition engine built on the [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform.
Our vision is to provide easy-to-use, efficient, compact, and scalable tooling for speech recognition in industrial applications and academic research, covering training, inference, and deployment.

## Features
See the [feature list](doc/src/feature_list.md).

## Installation
Tested and verified under:
* Ubuntu 16.04
* python>=3.7
* paddlepaddle>=2.2.0rc

See [install](doc/src/install.md).

## Getting Started
See [Getting Started](doc/src/getting_started.md) and the [tiny egs](examples/tiny/s0/README.md).

## More Information
* [Data Preparation](doc/src/data_preparation.md)
* [Data Augmentation](doc/src/augmentation.md)
* [Ngram LM](doc/src/ngram_lm.md)
* [Benchmark](doc/src/benchmark.md)
* [Released Model](doc/src/released_model.md)

## Questions and Help
You are welcome to ask questions in [GitHub Discussions](https://github.com/PaddlePaddle/DeepSpeech/discussions) and to report bugs in [GitHub Issues](https://github.com/PaddlePaddle/models/issues). Contributions to this project are also welcome.

## License
DeepSpeech is provided under the [Apache-2.0 License](./LICENSE).

## Acknowledgement
We referred to several excellent open-source repositories during development; see [References](doc/src/reference.md) for details.

@@ -1,191 +0,0 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Beam search parameters tuning for DeepSpeech2 model."""
import functools
import sys

import numpy as np
from paddle.io import DataLoader

from deepspeech.exps.deepspeech2.config import get_cfg_defaults
from deepspeech.io.collator import SpeechCollator
from deepspeech.io.dataset import ManifestDataset
from deepspeech.models.ds2 import DeepSpeech2Model
from deepspeech.training.cli import default_argument_parser
from deepspeech.utils import error_rate
from deepspeech.utils.utility import add_arguments
from deepspeech.utils.utility import print_arguments


def tune(config, args):
    """Tune parameters alpha and beta incrementally."""
    if not args.num_alphas >= 0:
        raise ValueError("num_alphas must be non-negative!")
    if not args.num_betas >= 0:
        raise ValueError("num_betas must be non-negative!")

    config.defrost()
    config.data.manifest = config.data.dev_manifest
    config.data.augmentation_config = ""
    config.data.keep_transcription_text = True
    dev_dataset = ManifestDataset.from_config(config)

    valid_loader = DataLoader(
        dev_dataset,
        batch_size=config.data.batch_size,
        shuffle=False,
        drop_last=False,
        collate_fn=SpeechCollator(keep_transcription_text=True))

    model = DeepSpeech2Model.from_pretrained(valid_loader, config,
                                             args.checkpoint_path)
    model.eval()

    # decoders only accept string encoded in utf-8
    vocab_list = valid_loader.dataset.vocab_list
    errors_func = error_rate.char_errors if config.decoding.error_rate_type == 'cer' else error_rate.word_errors

    # create grid for search
    cand_alphas = np.linspace(args.alpha_from, args.alpha_to, args.num_alphas)
    cand_betas = np.linspace(args.beta_from, args.beta_to, args.num_betas)
    params_grid = [(alpha, beta) for alpha in cand_alphas
                   for beta in cand_betas]

    err_sum = [0.0 for i in range(len(params_grid))]
    err_ave = [0.0 for i in range(len(params_grid))]

    num_ins, len_refs, cur_batch = 0, 0, 0
    # initialize external scorer
    model.decoder.init_decode(args.alpha_from, args.beta_from,
                              config.decoding.lang_model_path, vocab_list,
                              config.decoding.decoding_method)

    ## incrementally tune parameters over multiple batches
    print("start tuning ...")
    for infer_data in valid_loader():
        if (args.num_batches >= 0) and (cur_batch >= args.num_batches):
            break

        def ordid2token(texts, texts_len):
            """Convert ord() ids back into text via chr()."""
            trans = []
            for text, n in zip(texts, texts_len):
                n = n.numpy().item()
                ids = text[:n]
                trans.append(''.join([chr(i) for i in ids]))
            return trans

        audio, audio_len, text, text_len = infer_data
        target_transcripts = ordid2token(text, text_len)
        num_ins += audio.shape[0]

        # model infer
        eouts, eouts_len = model.encoder(audio, audio_len)
        probs = model.decoder.softmax(eouts)

        # grid search
        for index, (alpha, beta) in enumerate(params_grid):
            print(f"tuning: alpha={alpha} beta={beta}")
            result_transcripts = model.decoder.decode_probs(
                probs.numpy(), eouts_len, vocab_list,
                config.decoding.decoding_method,
                config.decoding.lang_model_path, alpha, beta,
                config.decoding.beam_size, config.decoding.cutoff_prob,
                config.decoding.cutoff_top_n, config.decoding.num_proc_bsearch)

            for target, result in zip(target_transcripts, result_transcripts):
                errors, len_ref = errors_func(target, result)
                err_sum[index] += errors
                # accumulate the length of references of every batch
                # in the first iteration
                if args.alpha_from == alpha and args.beta_from == beta:
                    len_refs += len_ref

            err_ave[index] = err_sum[index] / len_refs
            if index % 2 == 0:
                sys.stdout.write('.')
                sys.stdout.flush()
            print("tuning: one grid done!")

        # output on-line tuning result at the end of current batch
        err_ave_min = min(err_ave)
        min_index = err_ave.index(err_ave_min)
        print("\nBatch %d [%d/?], current opt (alpha, beta) = (%s, %s), "
              " min [%s] = %f" %
              (cur_batch, num_ins, "%.3f" % params_grid[min_index][0],
               "%.3f" % params_grid[min_index][1],
               config.decoding.error_rate_type, err_ave_min))
        cur_batch += 1

    # output WER/CER at every (alpha, beta)
    print("\nFinal %s:\n" % config.decoding.error_rate_type)
    for index in range(len(params_grid)):
        print("(alpha, beta) = (%s, %s), [%s] = %f" %
              ("%.3f" % params_grid[index][0], "%.3f" % params_grid[index][1],
               config.decoding.error_rate_type, err_ave[index]))

    err_ave_min = min(err_ave)
    min_index = err_ave.index(err_ave_min)
    print("\nFinish tuning on %d batches, final opt (alpha, beta) = (%s, %s)" %
          (cur_batch, "%.3f" % params_grid[min_index][0],
           "%.3f" % params_grid[min_index][1]))
    print("finish tuning")


def main(config, args):
    tune(config, args)


if __name__ == "__main__":
    parser = default_argument_parser()
    add_arg = functools.partial(add_arguments, argparser=parser)
    add_arg('num_batches', int, -1, "# of batches tuning on. "
            "Default -1, on whole dev set.")
    add_arg('num_alphas', int, 45, "# of alpha candidates for tuning.")
    add_arg('num_betas', int, 8, "# of beta candidates for tuning.")
    add_arg('alpha_from', float, 1.0, "Where alpha starts tuning from.")
    add_arg('alpha_to', float, 3.2, "Where alpha ends tuning with.")
    add_arg('beta_from', float, 0.1, "Where beta starts tuning from.")
    add_arg('beta_to', float, 0.45, "Where beta ends tuning with.")
    add_arg('batch_size', int, 256, "# of samples per batch.")
    add_arg('beam_size', int, 500, "Beam search width.")
    add_arg('num_proc_bsearch', int, 8, "# of CPUs for beam search.")
    add_arg('cutoff_prob', float, 1.0, "Cutoff probability for pruning.")
    add_arg('cutoff_top_n', int, 40, "Cutoff number for pruning.")
    args = parser.parse_args()
    print_arguments(args, globals())

    # https://yaml.org/type/float.html
    config = get_cfg_defaults()
    if args.config:
        config.merge_from_file(args.config)
    if args.opts:
        config.merge_from_list(args.opts)
    config.data.batch_size = args.batch_size
    config.decoding.beam_size = args.beam_size
    config.decoding.num_proc_bsearch = args.num_proc_bsearch
    config.decoding.cutoff_prob = args.cutoff_prob
    config.decoding.cutoff_top_n = args.cutoff_top_n
    config.freeze()
    print(config)
    if args.dump_config:
        with open(args.dump_config, 'w') as f:
            print(config, file=f)

    main(config, args)

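The deleted tuner sweeps a Cartesian grid of decoder weights; with the default arguments above (45 alphas over [1.0, 3.2], 8 betas over [0.1, 0.45]), the grid it builds looks like this standalone sketch:

```python
import numpy as np

# Hypothetical reproduction of the (alpha, beta) grid tune.py swept
# with its default CLI arguments.
cand_alphas = np.linspace(1.0, 3.2, 45)  # --alpha_from / --alpha_to / --num_alphas
cand_betas = np.linspace(0.1, 0.45, 8)   # --beta_from / --beta_to / --num_betas
params_grid = [(alpha, beta) for alpha in cand_alphas for beta in cand_betas]
print(len(params_grid))  # 360 candidate pairs, each decoded against the dev set
```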
@@ -35,12 +35,14 @@ from deepspeech.models.ds2 import DeepSpeech2Model
 from deepspeech.models.ds2_online import DeepSpeech2InferModelOnline
 from deepspeech.models.ds2_online import DeepSpeech2ModelOnline
 from deepspeech.training.gradclip import ClipGradByGlobalNormWithLog
+from deepspeech.training.reporter import report
 from deepspeech.training.trainer import Trainer
 from deepspeech.utils import error_rate
 from deepspeech.utils import layer_tools
 from deepspeech.utils import mp_tools
 from deepspeech.utils.log import Autolog
 from deepspeech.utils.log import Log
+from deepspeech.utils.utility import UpdateConfig

 logger = Log(__name__).getlog()

@@ -66,7 +68,9 @@ class DeepSpeech2Trainer(Trainer):
         super().__init__(config, args)

     def train_batch(self, batch_index, batch_data, msg):
-        train_conf = self.config.training
+        batch_size = self.config.collator.batch_size
+        accum_grad = self.config.training.accum_grad
+
         start = time.time()

         # forward
@@ -77,7 +81,7 @@ class DeepSpeech2Trainer(Trainer):
         }

         # loss backward
-        if (batch_index + 1) % train_conf.accum_grad != 0:
+        if (batch_index + 1) % accum_grad != 0:
             # Disable gradient synchronizations across DDP processes.
             # Within this context, gradients will be accumulated on module
             # variables, which will later be synchronized.
@@ -92,19 +96,18 @@ class DeepSpeech2Trainer(Trainer):
             layer_tools.print_grads(self.model, print_func=None)

         # optimizer step
-        if (batch_index + 1) % train_conf.accum_grad == 0:
+        if (batch_index + 1) % accum_grad == 0:
             self.optimizer.step()
             self.optimizer.clear_grad()
             self.iteration += 1

         iteration_time = time.time() - start

-        msg += "train time: {:>.3f}s, ".format(iteration_time)
-        msg += "batch size: {}, ".format(self.config.collator.batch_size)
-        msg += "accum: {}, ".format(train_conf.accum_grad)
-        msg += ', '.join('{}: {:>.6f}'.format(k, v)
-                         for k, v in losses_np.items())
-        logger.info(msg)
+        for k, v in losses_np.items():
+            report(k, v)
+        report("batch_size", batch_size)
+        report("accum", accum_grad)
+        report("step_cost", iteration_time)

         if dist.get_rank() == 0 and self.visualizer:
             for k, v in losses_np.items():
@@ -147,10 +150,9 @@ class DeepSpeech2Trainer(Trainer):
     def setup_model(self):
         config = self.config.clone()
-        config.defrost()
-        config.model.feat_size = self.train_loader.collate_fn.feature_size
-        config.model.dict_size = self.train_loader.collate_fn.vocab_size
-        config.freeze()
+        with UpdateConfig(config):
+            config.model.feat_size = self.train_loader.collate_fn.feature_size
+            config.model.dict_size = self.train_loader.collate_fn.vocab_size

         if self.args.model_type == 'offline':
             model = DeepSpeech2Model.from_config(config.model)

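The hunk above keeps the step-every-`accum_grad`-batches pattern while reading the setting once up front. A toy accumulation loop under the same pattern, assuming a throwaway paddle model (the loss scaling is a common convention, not something this diff shows):

```python
import paddle

# Toy gradient-accumulation loop mirroring the accum_grad logic above.
# Hypothetical model; not the DeepSpeech2 trainer itself.
model = paddle.nn.Linear(4, 1)
optimizer = paddle.optimizer.SGD(learning_rate=0.1, parameters=model.parameters())
accum_grad = 4

for batch_index in range(8):
    x = paddle.randn([2, 4])
    loss = model(x).mean()
    (loss / accum_grad).backward()            # grads accumulate across batches
    if (batch_index + 1) % accum_grad == 0:   # step only every accum_grad batches
        optimizer.step()
        optimizer.clear_grad()
```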
@@ -17,6 +17,7 @@ import os
 import sys
 import time
 from collections import defaultdict
+from collections import OrderedDict
 from contextlib import nullcontext
 from pathlib import Path
 from typing import Optional
@@ -33,6 +34,8 @@ from deepspeech.io.sampler import SortagradBatchSampler
 from deepspeech.io.sampler import SortagradDistributedBatchSampler
 from deepspeech.models.u2 import U2Model
 from deepspeech.training.optimizer import OptimizerFactory
+from deepspeech.training.reporter import ObsScope
+from deepspeech.training.reporter import report
 from deepspeech.training.scheduler import LRSchedulerFactory
 from deepspeech.training.timer import Timer
 from deepspeech.training.trainer import Trainer
@@ -43,6 +46,7 @@ from deepspeech.utils import mp_tools
 from deepspeech.utils import text_grid
 from deepspeech.utils import utility
 from deepspeech.utils.log import Log
+from deepspeech.utils.utility import UpdateConfig

 logger = Log(__name__).getlog()

@@ -100,7 +104,8 @@ class U2Trainer(Trainer):
             # Disable gradient synchronizations across DDP processes.
             # Within this context, gradients will be accumulated on module
             # variables, which will later be synchronized.
-            context = self.model.no_sync
+            # When using cpu w/o DDP, model does not have `no_sync`
+            context = self.model.no_sync if self.parallel else nullcontext
         else:
             # Used for single gpu training and DDP gradient synchronization
             # processes.
@@ -119,12 +124,11 @@ class U2Trainer(Trainer):
         iteration_time = time.time() - start

         if (batch_index + 1) % train_conf.log_interval == 0:
-            msg += "train time: {:>.3f}s, ".format(iteration_time)
-            msg += "batch size: {}, ".format(self.config.collator.batch_size)
-            msg += "accum: {}, ".format(train_conf.accum_grad)
-            msg += ', '.join('{}: {:>.6f}'.format(k, v)
-                             for k, v in losses_np.items())
-            logger.info(msg)
+            for k, v in losses_np.items():
+                report(k, v)
+            report("batch_size", self.config.collator.batch_size)
+            report("accum", train_conf.accum_grad)
+            report("step_cost", iteration_time)

         if dist.get_rank() == 0 and self.visualizer:
             losses_np_v = losses_np.copy()
@@ -197,15 +201,29 @@ class U2Trainer(Trainer):
                 data_start_time = time.time()
                 for batch_index, batch in enumerate(self.train_loader):
                     dataload_time = time.time() - data_start_time
-                    msg = "Train: Rank: {}, ".format(dist.get_rank())
-                    msg += "epoch: {}, ".format(self.epoch)
-                    msg += "step: {}, ".format(self.iteration)
-                    msg += "batch : {}/{}, ".format(batch_index + 1,
-                                                    len(self.train_loader))
-                    msg += "lr: {:>.8f}, ".format(self.lr_scheduler())
-                    msg += "data time: {:>.3f}s, ".format(dataload_time)
-                    self.train_batch(batch_index, batch, msg)
-                    self.after_train_batch()
+                    msg = "Train:"
+                    observation = OrderedDict()
+                    with ObsScope(observation):
+                        report("Rank", dist.get_rank())
+                        report("epoch", self.epoch)
+                        report('step', self.iteration)
+                        report('step/total',
+                               (batch_index + 1) / len(self.train_loader))
+                        report("lr", self.lr_scheduler())
+                        self.train_batch(batch_index, batch, msg)
+                        self.after_train_batch()
+                        report('reader_cost', dataload_time)
+                        observation['batch_cost'] = observation[
+                            'reader_cost'] + observation['step_cost']
+                        observation['samples'] = observation['batch_size']
+                        observation['ips[sent./sec]'] = observation[
+                            'batch_size'] / observation['batch_cost']
+                    for k, v in observation.items():
+                        msg += f" {k}: "
+                        msg += f"{v:>.8f}" if isinstance(v, float) else f"{v}"
+                        msg += ","
+                    logger.info(msg)
                     data_start_time = time.time()
             except Exception as e:
                 logger.error(e)
@@ -314,10 +332,11 @@ class U2Trainer(Trainer):
     def setup_model(self):
         config = self.config
         model_conf = config.model
-        model_conf.defrost()
-        model_conf.input_dim = self.train_loader.collate_fn.feature_size
-        model_conf.output_dim = self.train_loader.collate_fn.vocab_size
-        model_conf.freeze()
+        with UpdateConfig(model_conf):
+            model_conf.input_dim = self.train_loader.collate_fn.feature_size
+            model_conf.output_dim = self.train_loader.collate_fn.vocab_size

         model = U2Model.from_config(model_conf)

         if self.parallel:

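The derived observation fields above are plain arithmetic over the reported costs; with made-up timings:

```python
# Hypothetical timings showing how the logged throughput fields derive
# from the reported values in the loop above.
reader_cost = 0.05   # seconds spent loading the batch ('reader_cost')
step_cost = 0.45     # seconds spent in train_batch ('step_cost')
batch_size = 64      # sentences in the batch ('batch_size' / 'samples')

batch_cost = reader_cost + step_cost  # 0.50 s per batch ('batch_cost')
ips = batch_size / batch_cost         # 128.0 ('ips[sent./sec]')
print(f"batch_cost={batch_cost:.2f}s, ips={ips:.1f} sent./sec")
```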
@@ -32,6 +32,7 @@ from deepspeech.training.trainer import Trainer
 from deepspeech.training.updaters.trainer import Trainer as NewTrainer
 from deepspeech.utils import layer_tools
 from deepspeech.utils.log import Log
+from deepspeech.utils.utility import UpdateConfig

 logger = Log(__name__).getlog()

@@ -121,10 +122,10 @@ class U2Trainer(Trainer):
     def setup_model(self):
         config = self.config
         model_conf = config.model
-        model_conf.defrost()
-        model_conf.input_dim = self.train_loader.collate_fn.feature_size
-        model_conf.output_dim = self.train_loader.collate_fn.vocab_size
-        model_conf.freeze()
+        with UpdateConfig(model_conf):
+            model_conf.input_dim = self.train_loader.collate_fn.feature_size
+            model_conf.output_dim = self.train_loader.collate_fn.vocab_size

         model = U2Model.from_config(model_conf)

         if self.parallel:

@@ -41,6 +41,7 @@ from deepspeech.utils import mp_tools
 from deepspeech.utils import text_grid
 from deepspeech.utils import utility
 from deepspeech.utils.log import Log
+from deepspeech.utils.utility import UpdateConfig

 logger = Log(__name__).getlog()

@@ -319,10 +320,10 @@ class U2Trainer(Trainer):
         # model
         model_conf = config.model
-        model_conf.defrost()
-        model_conf.input_dim = self.train_loader.feat_dim
-        model_conf.output_dim = self.train_loader.vocab_size
-        model_conf.freeze()
+        with UpdateConfig(model_conf):
+            model_conf.input_dim = self.train_loader.feat_dim
+            model_conf.output_dim = self.train_loader.vocab_size

         model = U2Model.from_config(model_conf)
         if self.parallel:
             model = paddle.DataParallel(model)

@@ -47,6 +47,7 @@ from deepspeech.utils import mp_tools
 from deepspeech.utils import text_grid
 from deepspeech.utils import utility
 from deepspeech.utils.log import Log
+from deepspeech.utils.utility import UpdateConfig

 logger = Log(__name__).getlog()

@@ -345,10 +346,10 @@ class U2STTrainer(Trainer):
     def setup_model(self):
         config = self.config
         model_conf = config.model
-        model_conf.defrost()
-        model_conf.input_dim = self.train_loader.collate_fn.feature_size
-        model_conf.output_dim = self.train_loader.collate_fn.vocab_size
-        model_conf.freeze()
+        with UpdateConfig(model_conf):
+            model_conf.input_dim = self.train_loader.collate_fn.feature_size
+            model_conf.output_dim = self.train_loader.collate_fn.vocab_size

         model = U2STModel.from_config(model_conf)

         if self.parallel:

@@ -48,6 +48,7 @@ from deepspeech.utils.tensor_utils import add_sos_eos
 from deepspeech.utils.tensor_utils import pad_sequence
 from deepspeech.utils.tensor_utils import th_accuracy
 from deepspeech.utils.utility import log_add
+from deepspeech.utils.utility import UpdateConfig

 __all__ = ["U2Model", "U2InferModel"]

@@ -903,10 +904,10 @@ class U2Model(U2BaseModel):
         Returns:
             U2Model: The model built from pretrained result.
         """
-        config.defrost()
-        config.input_dim = dataloader.collate_fn.feature_size
-        config.output_dim = dataloader.collate_fn.vocab_size
-        config.freeze()
+        with UpdateConfig(config):
+            config.input_dim = dataloader.collate_fn.feature_size
+            config.output_dim = dataloader.collate_fn.vocab_size

         model = cls.from_config(config)

         if checkpoint_path:

@@ -42,6 +42,7 @@ from deepspeech.utils import layer_tools
 from deepspeech.utils.log import Log
 from deepspeech.utils.tensor_utils import add_sos_eos
 from deepspeech.utils.tensor_utils import th_accuracy
+from deepspeech.utils.utility import UpdateConfig

 __all__ = ["U2STModel", "U2STInferModel"]

@@ -686,10 +687,10 @@ class U2STModel(U2STBaseModel):
         Returns:
             U2STModel: The model built from pretrained result.
         """
-        config.defrost()
-        config.input_dim = dataloader.collate_fn.feature_size
-        config.output_dim = dataloader.collate_fn.vocab_size
-        config.freeze()
+        with UpdateConfig(config):
+            config.input_dim = dataloader.collate_fn.feature_size
+            config.output_dim = dataloader.collate_fn.vocab_size

         model = cls.from_config(config)

         if checkpoint_path:

@@ -43,33 +43,57 @@ def default_argument_parser():
     """
     parser = argparse.ArgumentParser()

-    # yapf: disable
-    # data and output
-    parser.add_argument("--config", metavar="FILE", help="path of the config file to overwrite to default config with.")
-    parser.add_argument("--dump-config", metavar="FILE", help="dump config to yaml file.")
-    parser.add_argument("--output", metavar="OUTPUT_DIR", help="path to save checkpoint and logs.")
-
-    # load from saved checkpoint
-    parser.add_argument("--checkpoint_path", type=str, help="path of the checkpoint to load")
-
-    # running
-    parser.add_argument("--device", type=str, default='gpu', choices=["cpu", "gpu"],
-                        help="device type to use, cpu and gpu are supported.")
-    parser.add_argument("--nprocs", type=int, default=1, help="number of parallel processes to use.")
-
-    # overwrite extra config and default config
-    # parser.add_argument("--opts", nargs=argparse.REMAINDER,
-    #                     help="options to overwrite --config file and the default config, passing in KEY VALUE pairs")
-    parser.add_argument("--opts", type=str, default=[], nargs='+',
-                        help="options to overwrite --config file and the default config, passing in KEY VALUE pairs")
-
-    # random seed
-    parser.add_argument("--seed", type=int, default=None,
-                        help="seed to use for paddle, np and random. None or 0 for random, else set seed.")
-
-    # profiler
-    parser.add_argument('--profiler_options', type=str, default=None,
-                        help='The option of profiler, which should be in format \"key1=value1;key2=value2;key3=value3\".')
-    # yapf: enable
+    train_group = parser.add_argument_group(
+        title='Train Options', description=None)
+    train_group.add_argument(
+        "--seed",
+        type=int,
+        default=None,
+        help="seed to use for paddle, np and random. None or 0 for random, else set seed."
+    )
+    train_group.add_argument(
+        "--device",
+        type=str,
+        default='gpu',
+        choices=["cpu", "gpu"],
+        help="device cpu and gpu are supported.")
+    train_group.add_argument(
+        "--nprocs",
+        type=int,
+        default=1,
+        help="number of parallel processes. 0 for cpu.")
+    train_group.add_argument(
+        "--config", metavar="CONFIG_FILE", help="config file.")
+    train_group.add_argument(
+        "--output", metavar="CKPT_DIR", help="path to save checkpoint.")
+    train_group.add_argument(
+        "--checkpoint_path", type=str, help="path to load checkpoint")
+    train_group.add_argument(
+        "--opts",
+        type=str,
+        default=[],
+        nargs='+',
+        help="overwrite --config file, passing in LIST[KEY VALUE] pairs")
+    train_group.add_argument(
+        "--dump-config", metavar="FILE", help="dump config to `this` file.")
+
+    profile_group = parser.add_argument_group(
+        title='Benchmark Options', description=None)
+    profile_group.add_argument(
+        '--profiler-options',
+        type=str,
+        default=None,
+        help='The option of profiler, which should be in format \"key1=value1;key2=value2;key3=value3\".'
+    )
+    profile_group.add_argument(
+        '--benchmark-batch-size',
+        type=int,
+        default=None,
+        help='batch size for benchmark.')
+    profile_group.add_argument(
+        '--benchmark-max-step',
+        type=int,
+        default=None,
+        help='max iteration for benchmark.')

     return parser

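One detail the regrouped parser relies on: argparse maps dashed flags to underscored attributes, which is why the trainer can read `self.args.benchmark_batch_size`. A minimal stand-in to check that behavior (hypothetical argv, not the real `default_argument_parser`):

```python
import argparse

# Minimal stand-in for the Benchmark Options group above.
parser = argparse.ArgumentParser()
profile_group = parser.add_argument_group(title='Benchmark Options')
profile_group.add_argument('--profiler-options', type=str, default=None)
profile_group.add_argument('--benchmark-batch-size', type=int, default=None)
profile_group.add_argument('--benchmark-max-step', type=int, default=None)

args = parser.parse_args(['--benchmark-batch-size', '32',
                          '--benchmark-max-step', '500'])
print(args.benchmark_batch_size, args.benchmark_max_step)  # 32 500
```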
@@ -20,8 +20,8 @@ from paddle.nn import Layer

 from . import extension
 from ..reporter import DictSummary
+from ..reporter import ObsScope
 from ..reporter import report
-from ..reporter import scope
 from ..timer import Timer
 from deepspeech.utils.log import Log

 logger = Log(__name__).getlog()

@@ -78,7 +78,7 @@ class StandardEvaluator(extension.Extension):
         summary = DictSummary()
         for batch in self.dataloader:
             observation = {}
-            with scope(observation):
+            with ObsScope(observation):
                 # main evaluation computation here.
                 with paddle.no_grad():
                     self.evaluate_sync(self.evaluate_core(batch))

@@ -19,7 +19,7 @@ OBSERVATIONS = None

 @contextlib.contextmanager
-def scope(observations):
+def ObsScope(observations):
     # make `observation` the target to report to.
     # it is basically a dictionary that stores temporary observations
     global OBSERVATIONS

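`ObsScope` (the renamed `scope`) points the module-level `OBSERVATIONS` mapping at a caller-supplied dict for the duration of the block, and `report` writes into whichever mapping is current. A self-contained sketch of that contract, re-implemented here rather than imported from deepspeech/training/reporter.py:

```python
import contextlib
from collections import OrderedDict

OBSERVATIONS = None


@contextlib.contextmanager
def ObsScope(observations):
    # Make `observations` the target report() writes to; restore on exit.
    global OBSERVATIONS
    old = OBSERVATIONS
    OBSERVATIONS = observations
    try:
        yield
    finally:
        OBSERVATIONS = old


def report(name, value):
    if OBSERVATIONS is not None:
        OBSERVATIONS[name] = value


obs = OrderedDict()
with ObsScope(obs):
    report("lr", 1e-3)
    report("step_cost", 0.45)
print(obs)  # OrderedDict([('lr', 0.001), ('step_cost', 0.45)])
```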
@@ -11,19 +11,24 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+import sys
 import time
+from collections import OrderedDict
 from pathlib import Path

 import paddle
 from paddle import distributed as dist
 from tensorboardX import SummaryWriter

+from deepspeech.training.reporter import ObsScope
+from deepspeech.training.reporter import report
 from deepspeech.training.timer import Timer
 from deepspeech.utils import mp_tools
 from deepspeech.utils import profiler
 from deepspeech.utils.checkpoint import Checkpoint
 from deepspeech.utils.log import Log
 from deepspeech.utils.utility import seed_all
+from deepspeech.utils.utility import UpdateConfig

 __all__ = ["Trainer"]

@@ -96,11 +101,21 @@ class Trainer():
         self.checkpoint_dir = None
         self.iteration = 0
         self.epoch = 0
+        self.rank = dist.get_rank()
+
+        logger.info(f"Rank: {self.rank}/{dist.get_world_size()}")

         if args.seed:
             seed_all(args.seed)
             logger.info(f"Set seed {args.seed}")

+        if self.args.benchmark_batch_size:
+            with UpdateConfig(self.config):
+                self.config.collator.batch_size = self.args.benchmark_batch_size
+                self.config.training.log_interval = 1
+            logger.info(
+                f"Benchmark reset batch-size: {self.args.benchmark_batch_size}")

     def setup(self):
         """Setup the experiment.
         """
@@ -188,6 +203,12 @@ class Trainer():
         if self.args.profiler_options:
             profiler.add_profiler_step(self.args.profiler_options)

+        if self.args.benchmark_max_step and self.iteration > self.args.benchmark_max_step:
+            logger.info(
+                f"Reach benchmark-max-step: {self.args.benchmark_max_step}")
+            sys.exit(
+                f"Reach benchmark-max-step: {self.args.benchmark_max_step}")

     def train(self):
         """The training process control by epoch."""
         from_scratch = self.resume_or_scratch()
@@ -208,15 +229,29 @@ class Trainer():
                 data_start_time = time.time()
                 for batch_index, batch in enumerate(self.train_loader):
                     dataload_time = time.time() - data_start_time
-                    msg = "Train: Rank: {}, ".format(dist.get_rank())
-                    msg += "epoch: {}, ".format(self.epoch)
-                    msg += "step: {}, ".format(self.iteration)
-                    msg += "batch : {}/{}, ".format(batch_index + 1,
-                                                    len(self.train_loader))
-                    msg += "lr: {:>.8f}, ".format(self.lr_scheduler())
-                    msg += "data time: {:>.3f}s, ".format(dataload_time)
-                    self.train_batch(batch_index, batch, msg)
-                    self.after_train_batch()
+                    msg = "Train:"
+                    observation = OrderedDict()
+                    with ObsScope(observation):
+                        report("Rank", dist.get_rank())
+                        report("epoch", self.epoch)
+                        report('step', self.iteration)
+                        report('step/total',
+                               (batch_index + 1) / len(self.train_loader))
+                        report("lr", self.lr_scheduler())
+                        self.train_batch(batch_index, batch, msg)
+                        self.after_train_batch()
+                        report('reader_cost', dataload_time)
+                        observation['batch_cost'] = observation[
+                            'reader_cost'] + observation['step_cost']
+                        observation['samples'] = observation['batch_size']
+                        observation['ips[sent./sec]'] = observation[
+                            'batch_size'] / observation['batch_cost']
+                    for k, v in observation.items():
+                        msg += f" {k}: "
+                        msg += f"{v:>.8f}" if isinstance(v, float) else f"{v}"
+                        msg += ","
+                    logger.info(msg)
                     data_start_time = time.time()
             except Exception as e:
                 logger.error(e)

@@ -24,7 +24,7 @@ import tqdm

 from deepspeech.training.extensions.extension import Extension
 from deepspeech.training.extensions.extension import PRIORITY_READER
-from deepspeech.training.reporter import scope
+from deepspeech.training.reporter import ObsScope
 from deepspeech.training.triggers import get_trigger
 from deepspeech.training.triggers.limit_trigger import LimitTrigger
 from deepspeech.training.updaters.updater import UpdaterBase

@@ -144,7 +144,7 @@ class Trainer():
                     # you can use `report` freely in Updater.update()

                     # updating parameters and state
-                    with scope(self.observation):
+                    with ObsScope(self.observation):
                         update()
                         p.update()

@@ -16,15 +16,27 @@ import distutils.util
 import math
 import os
 import random
+from contextlib import contextmanager
 from typing import List

 import numpy as np
 import paddle

-__all__ = ["seed_all", 'print_arguments', 'add_arguments', "log_add"]
+__all__ = [
+    "UpdateConfig", "seed_all", 'print_arguments', 'add_arguments', "log_add"
+]
+
+
+@contextmanager
+def UpdateConfig(config):
+    """Update yacs config"""
+    config.defrost()
+    yield
+    config.freeze()


 def seed_all(seed: int=210329):
+    """freeze random generator seed."""
     np.random.seed(seed)
     random.seed(seed)
     paddle.seed(seed)

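`UpdateConfig` simply brackets mutation between `defrost()` and `freeze()`. A usage sketch against a toy yacs node, with the helper restated inline so the snippet runs standalone (assumes the `yacs` package; the dimension values are made up):

```python
from contextlib import contextmanager

from yacs.config import CfgNode


@contextmanager
def UpdateConfig(config):
    """Update yacs config (as defined in the hunk above)."""
    config.defrost()
    yield
    config.freeze()


config = CfgNode({"model": CfgNode({"input_dim": 0, "output_dim": 0})})
config.freeze()

with UpdateConfig(config):          # defrost on entry, freeze on exit
    config.model.input_dim = 161    # e.g. feature size from the collator
    config.model.output_dim = 4233  # e.g. vocab size

assert config.is_frozen()
print(config.model.input_dim, config.model.output_dim)  # 161 4233
```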
@@ -4,7 +4,7 @@ To avoid the trouble of environment setup, [running in Docker container](#runnin

 ## Prerequisites
 - Python >= 3.7
-- PaddlePaddle 2.0.0 or later (please refer to the [Installation Guide](https://www.paddlepaddle.org.cn/documentation/docs/en/beginners_guide/index_en.html))
+- PaddlePaddle latest version (please refer to the [Installation Guide](https://www.paddlepaddle.org.cn/documentation/docs/en/beginners_guide/index_en.html))

 ## Setup (Important)

@@ -1,5 +1,7 @@
 # Reference

+We refer to these repos to build `model` and `engine`:
+
 * [delta](https://github.com/Delta-ML/delta.git)
 * [espnet](https://github.com/espnet/espnet.git)
 * [kaldi](https://github.com/kaldi-asr/kaldi.git)

@@ -1,7 +1,8 @@
 #!/bin/bash

 profiler_options=
+benchmark_batch_size=0
+benchmark_max_step=0

 # seed may break model convergence
 seed=0

@@ -32,12 +33,15 @@ ckpt_name=$2
 mkdir -p exp

 python3 -u ${BIN_DIR}/train.py \
+--seed ${seed} \
 --device ${device} \
 --nproc ${ngpu} \
 --config ${config_path} \
 --output exp/${ckpt_name} \
---profiler_options ${profiler_options} \
---seed ${seed}
+--profiler-options "${profiler_options}" \
+--benchmark-batch-size ${benchmark_batch_size} \
+--benchmark-max-step ${benchmark_max_step}

 if [ ${seed} != 0 ]; then
     unset FLAGS_cudnn_deterministic

@@ -38,7 +38,7 @@ python3 -u ${BIN_DIR}/train.py \
 --config ${config_path} \
 --output exp/${ckpt_name} \
 --model_type ${model_type} \
---profiler_options "${profiler_options}" \
+--profiler-options "${profiler_options}" \
 --seed ${seed}

 if [ ${seed} != 0 ]; then

@@ -1,35 +1,47 @@
 #!/bin/bash

-if [ $# != 2 ];then
-    echo "usage: CUDA_VISIBLE_DEVICES=0 ${0} config_path ckpt_name"
-    exit -1
-fi
+profiler_options=
+benchmark_batch_size=0
+benchmark_max_step=0
+
+# seed may break model convergence
+seed=0
+
+source ${MAIN_ROOT}/utils/parse_options.sh || exit 1;

 ngpu=$(echo $CUDA_VISIBLE_DEVICES | awk -F "," '{print NF}')
 echo "using $ngpu gpus..."

-config_path=$1
-ckpt_name=$2

 device=gpu
 if [ ${ngpu} == 0 ];then
     device=cpu
 fi

-mkdir -p exp
-# seed may break model convergence
-seed=0
 if [ ${seed} != 0 ]; then
     export FLAGS_cudnn_deterministic=True
+    echo "using seed $seed & FLAGS_cudnn_deterministic=True ..."
+fi
+
+if [ $# != 2 ];then
+    echo "usage: CUDA_VISIBLE_DEVICES=0 ${0} config_path ckpt_name"
+    exit -1
 fi

+config_path=$1
+ckpt_name=$2
+
+mkdir -p exp

 python3 -u ${BIN_DIR}/train.py \
+--seed ${seed} \
 --device ${device} \
 --nproc ${ngpu} \
 --config ${config_path} \
 --output exp/${ckpt_name} \
---seed ${seed}
+--profiler-options "${profiler_options}" \
+--benchmark-batch-size ${benchmark_batch_size} \
+--benchmark-max-step ${benchmark_max_step}

 if [ ${seed} != 0 ]; then
     unset FLAGS_cudnn_deterministic

@@ -1,41 +1,46 @@
 #!/bin/bash

+CUR_DIR=${PWD}
 ROOT_DIR=../../

 # Scripts for stable, reproducible benchmarking; run with py37 inside the standard docker environment by default.
 # collect env info
 bash ${ROOT_DIR}/utils/pd_env_collect.sh
-cat pd_env.txt
+#cat pd_env.txt

-# working directory: to be documented
-pushd ${ROOT_DIR}/examples/aishell/s1

 # 1. Install the dependencies this model needs (note if optimization strategies are enabled)
-pushd ${ROOT_DIR}/tools; make; popd
-source ${ROOT_DIR}/tools/venv/bin/activate
-pushd ${ROOT_DIR}; bash setup.sh; popd
+#pushd ${ROOT_DIR}/tools; make; popd
+#source ${ROOT_DIR}/tools/venv/bin/activate
+#pushd ${ROOT_DIR}; bash setup.sh; popd

 # 2. Copy the data and pretrained models this model needs
+# working directory: to be documented
+#pushd ${ROOT_DIR}/examples/aishell/s1
+pushd ${ROOT_DIR}/examples/tiny/s1

 mkdir -p exp/log
-loca/data.sh &> exp/log/data.log
+. path.sh
+#bash local/data.sh &> exp/log/data.log

 # 3. Batch runs (if batching is inconvenient, fold steps 1 and 2 into each single model)
-model_mode_list=(conformer)
+model_mode_list=(conformer transformer)
 fp_item_list=(fp32)
-bs_item=(32 64 96)
+bs_item_list=(32 64 96)
 for model_mode in ${model_mode_list[@]}; do
     for fp_item in ${fp_item_list[@]}; do
-        for bs_item in ${bs_list[@]}
+        for bs_item in ${bs_item_list[@]}
         do
             echo "index is speed, 1gpus, begin, ${model_name}"
             run_mode=sp
-            CUDA_VISIBLE_DEVICES=0 bash run_benchmark.sh ${run_mode} ${bs_item} ${fp_item} 500 ${model_mode} # (5min)
+            CUDA_VISIBLE_DEVICES=0 bash ${CUR_DIR}/run_benchmark.sh ${run_mode} ${bs_item} ${fp_item} 500 ${model_mode} # (5min)
             sleep 60
             echo "index is speed, 8gpus, run_mode is multi_process, begin, ${model_name}"
             run_mode=mp
-            CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash run_benchmark.sh ${run_mode} ${bs_item} ${fp_item} 500 ${model_mode}
+            CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash ${CUR_DIR}/run_benchmark.sh ${run_mode} ${bs_item} ${fp_item} 500 ${model_mode}
             sleep 60
         done
     done
 done

@@ -23,19 +23,19 @@ function _train(){
     echo "Train on ${num_gpu_devices} GPUs"
     echo "current CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES, gpus=$num_gpu_devices, batch_size=$batch_size"

-    train_cmd="--model_name=${model_name}
-               --batch_size=${batch_size}
-               --fp=${fp_item} \
-               --max_iter=${max_iter} "
+    train_cmd="--benchmark-batch-size ${batch_size}
+               --benchmark-max-step ${max_iter}
+               conf/${model_name}.yaml ${model_name}"

     case ${run_mode} in
-    sp) train_cmd="python -u tools/train.py "${train_cmd}" ;;
+    sp) train_cmd="bash local/train.sh "${train_cmd}"" ;;
     mp)
-        train_cmd="python -m paddle.distributed.launch --log_dir=./mylog --gpus=$CUDA_VISIBLE_DEVICES tools/train.py "${train_cmd}"
-        log_parse_file="mylog/workerlog.0" ;;
+        train_cmd="bash local/train.sh "${train_cmd}"" ;;
     *) echo "choose run_mode(sp or mp)"; exit 1;
     esac

     # No changes needed below
-    timeout 15m ${train_cmd} > ${log_file} 2>&1
+    CUDA_VISIBLE_DEVICES=${device} timeout 15m ${train_cmd} > ${log_file} 2>&1
     if [ $? -ne 0 ];then
         echo -e "${model_name}, FAIL"
         export job_fail_flag=1
@@ -43,7 +43,8 @@ function _train(){
         echo -e "${model_name}, SUCCESS"
         export job_fail_flag=0
     fi
-    kill -9 `ps -ef|grep 'python'|awk '{print $2}'`
+
+    trap 'for pid in $(jobs -pr); do kill -KILL $pid; done' INT QUIT TERM

     if [ $run_mode = "mp" -a -d mylog ]; then
         rm ${log_file}
