From 141109b49d08a01fa7232fcd073ad163b24a7a06 Mon Sep 17 00:00:00 2001
From: Hui Zhang
Date: Thu, 4 Feb 2021 11:01:40 +0000
Subject: [PATCH] update aishell egs

---
 README.md                                  | 58 ++++------------------
 examples/aishell/local/run_infer.sh        | 15 +++---
 examples/aishell/local/run_infer_golden.sh | 18 +++----
 examples/aishell/local/run_test.sh         | 16 +++---
 examples/aishell/local/run_test_golden.sh  | 18 +++----
 examples/aishell/local/run_train.sh        | 16 +++---
 examples/baidu_en8k/run_test_golden.sh     | 18 +++----
 examples/librispeech/local/run_tune.sh     |  2 +-
 examples/tiny/README.md                    | 38 ++++++++++++++
 9 files changed, 95 insertions(+), 104 deletions(-)

diff --git a/README.md b/README.md
index 7a7cf48d..60a08974 100644
--- a/README.md
+++ b/README.md
@@ -6,11 +6,9 @@
 ## Table of Contents
 - [Installation](#installation)
-- [Running in Docker Container](#running-in-docker-container)
 - [Getting Started](#getting-started)
 - [Data Preparation](#data-preparation)
 - [Training a Model](#training-a-model)
-- [Data Augmentation Pipeline](#data-augmentation-pipeline)
 - [Inference and Evaluation](#inference-and-evaluation)
 - [Hyper-parameters Tuning](#hyper-parameters-tuning)
 - [Training for Mandarin Language](#training-for-mandarin-language)
@@ -116,42 +114,6 @@ Let's take a tiny sampled subset of [LibriSpeech dataset](http://www.openslr.org
 ```bash
 bash run.sh
 ```
-- Prepare the data
-
-  ```bash
-  sh local/run_data.sh
-  ```
-
-  `run_data.sh` will download dataset, generate manifests, collect normalizer's statistics and build vocabulary. Once the data preparation is done, you will find the data (only part of LibriSpeech) downloaded in `${MAIN_ROOT}/dataset/librispeech` and the corresponding manifest files generated in `${PWD}/data` as well as a mean stddev file and a vocabulary file. It has to be run for the very first time you run this dataset and is reusable for all further experiments.
-- Train your own ASR model
-
-  ```bash
-  sh local/run_train.sh
-  ```
-
-  `run_train.sh` will start a training job, with training logs printed to stdout and model checkpoint of every pass/epoch saved to `${PWD}/checkpoints`. These checkpoints could be used for training resuming, inference, evaluation and deployment.
-- Case inference with an existing model
-
-  ```bash
-  sh local/run_infer.sh
-  ```
-
-  `run_infer.sh` will show us some speech-to-text decoding results for several (default: 10) samples with the trained model. The performance might not be good now as the current model is only trained with a toy subset of LibriSpeech. To see the results with a better model, you can download a well-trained (trained for several days, with the complete LibriSpeech) model and do the inference:
-
-  ```bash
-  sh local/run_infer_golden.sh
-  ```
-- Evaluate an existing model
-
-  ```bash
-  sh local/run_test.sh
-  ```
-
-  `run_test.sh` will evaluate the model with Word Error Rate (or Character Error Rate) measurement. Similarly, you can also download a well-trained model and test its performance:
-
-  ```bash
-  sh local/run_test_golden.sh
-  ```
 
 More detailed information is provided in the following sections. Wish you a happy journey with the *DeepSpeech2 on PaddlePaddle* ASR engine!
 
@@ -169,7 +131,7 @@
 To use your custom data, you only need to generate such manifest files to summarize the dataset.
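For a concrete picture of what such a file contains, here is a hypothetical manifest entry; each line of a manifest is one JSON record, and the exact field names below (`audio_filepath`, `duration`, `text`) are an assumption based on typical manifests in this project, not a guaranteed schema:

```bash
# Inspect the first entry of a generated manifest (hypothetical paths and values).
head -n 1 examples/librispeech/data/manifest.train
# Expected shape of the output, one JSON object per line:
# {"audio_filepath": "/path/to/librispeech/sample-0001.wav", "duration": 3.27, "text": "some transcription"}
```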
Given such summarized manifests, training, inference and all other modules can be aware of where to access the audio files, as well as their metadata including the transcription labels.

-For how to generate such manifest files, please refer to `PATH/TO/LIBRISPEECH/local/librispeech.py`, which will download data and generate manifest files for LibriSpeech dataset.
+For how to generate such manifest files, please refer to `examples/librispeech/local/librispeech.py`, which will download data and generate manifest files for the LibriSpeech dataset.

### Compute Mean & Stddev for Normalizer

To perform z-score normalization (zero-mean, unit stddev) upon audio features, we have to estimate in advance the mean and standard deviation of audio features, with some training samples:

```bash
python3 tools/compute_mean_std.py \
--num_samples 2000 \
--specgram_type linear \
---manifest_path PATH/TO/LIBRISPEECH/data/manifest.train \
---output_path PATH/TO/LIBRISPEECH/data/mean_std.npz
+--manifest_path examples/librispeech/data/manifest.train \
+--output_path examples/librispeech/data/mean_std.npz
```

-It will compute the mean and standard deviatio of power spectrum feature with 2000 random sampled audio clips listed in `PATH/TO/LIBRISPEECH/data/manifest.train` and save the results to `PATH/TO/LIBRISPEECH/data/mean_std.npz` for further usage.
+It will compute the mean and standard deviation of the power spectrum features with 2000 randomly sampled audio clips listed in `examples/librispeech/data/manifest.train` and save the results to `examples/librispeech/data/mean_std.npz` for further usage.

### Build Vocabulary

A vocabulary of possible characters is required to convert the transcription into a list of token indices for training and, in decoding, to convert a list of indices back to text again. Such a vocabulary can be built with `tools/build_vocab.py`:

```bash
python3 tools/build_vocab.py \
--count_threshold 0 \
---vocab_path PATH/TO/LIBRISPEECH/data/eng_vocab.txt \
---manifest_paths PATH/TO/LIBRISPEECH/data/manifest.train
+--vocab_path examples/librispeech/data/eng_vocab.txt \
+--manifest_paths examples/librispeech/data/manifest.train
```

-It will write a vocabuary file `PATH/TO/LIBRISPEECH/data/eng_vocab.txt` with all transcription text in `PATH/TO/LIBRISPEECH/data/manifest.train`, without vocabulary truncation (`--count_threshold 0`).
+It will write a vocabulary file `examples/librispeech/data/eng_vocab.txt` with all transcription text in `examples/librispeech/data/manifest.train`, without vocabulary truncation (`--count_threshold 0`).

### More Help

For more help on arguments:

```bash
-python3 data/librispeech/librispeech.py --help
+python3 examples/librispeech/local/librispeech.py --help
python3 tools/compute_mean_std.py --help
python3 tools/build_vocab.py --help
```
@@ -240,7 +202,7 @@ python3 train.py --help
```

or refer to `examples/librispeech/local/run_train.sh`.

-## Data Augmentation Pipeline
+### Data Augmentation Pipeline

Data augmentation has often been a highly effective technique to boost deep learning performance. We augment our speech data by synthesizing new audio with small random perturbations (label-invariant transformations) added to the raw audio. You don't have to do the synthesis on your own, as it is already embedded into the data provider and is done on the fly, randomly for each epoch during training.

Now, in the client console, press the `whitespace` key, hold, and start speaking.

Notice that `deploy/demo_client.py` must be run on a machine with a microphone device, while `deploy/demo_server.py` could be run on one without any audio recording hardware, e.g. any remote server machine.
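The following is a minimal sketch of that two-machine setup. The `--host_ip` and `--host_port` flag names follow the `host_ip`/`host_port` arguments referenced in this section, and the IP address and port values are placeholders; treat both as assumptions:

```bash
# On the server machine (no microphone required) -- start the demo server,
# listening on all interfaces so remote clients can reach it.
python3 deploy/demo_server.py \
    --host_ip="0.0.0.0" \
    --host_port=8086

# On the client machine (the one with a microphone) -- point the client at
# the server's reachable address and the same port.
python3 deploy/demo_client.py \
    --host_ip="192.168.1.20" \
    --host_port=8086
```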
Just be careful to set the `host_ip` and `host_port` arguments with the actual accessible IP address and port, if the server and client are running on two separate machines. Nothing needs to be done if they are running on a single machine.

-Please also refer to `examples/deploy_demo/run_english_demo_server.sh`, which will first download a pre-trained English model (trained with 3000 hours of internal speech data) and then start the demo server with the model. With running `examples/mandarin/run_demo_client.sh`, you can speak English to test it. If you would like to try some other models, just update `--model_path` argument in the script.
+Please also refer to `examples/deploy_demo/run_english_demo_server.sh`, which will first download a pre-trained English model (trained with 3000 hours of internal speech data) and then start the demo server with the model. By running `examples/deploy_demo/run_demo_client.sh`, you can speak English to test it. If you would like to try some other models, just update the `--model_path` argument in the script.

For more help on arguments:

diff --git a/examples/aishell/local/run_infer.sh b/examples/aishell/local/run_infer.sh
index 7a0d7969..90be581b 100644
--- a/examples/aishell/local/run_infer.sh
+++ b/examples/aishell/local/run_infer.sh
@@ -1,9 +1,8 @@
 #! /usr/bin/env bash
 
-cd ../.. > /dev/null
 
 # download language model
-cd models/lm > /dev/null
+cd ${MAIN_ROOT}/models/lm > /dev/null
 bash download_lm_ch.sh
 if [ $? -ne 0 ]; then
     exit 1
@@ -13,7 +12,7 @@ cd - > /dev/null
 
 # infer
 CUDA_VISIBLE_DEVICES=0 \
-python3 -u infer.py \
+python3 -u ${MAIN_ROOT}/infer.py \
 --num_samples=10 \
 --beam_size=300 \
 --num_proc_bsearch=8 \
@@ -27,11 +26,11 @@
 --use_gru=True \
 --use_gpu=True \
 --share_rnn_weights=False \
---infer_manifest="data/aishell/manifest.test" \
---mean_std_path="data/aishell/mean_std.npz" \
---vocab_path="data/aishell/vocab.txt" \
---model_path="checkpoints/aishell/step_final" \
---lang_model_path="models/lm/zh_giga.no_cna_cmn.prune01244.klm" \
+--infer_manifest="data/manifest.test" \
+--mean_std_path="data/mean_std.npz" \
+--vocab_path="data/vocab.txt" \
+--model_path="checkpoints/step_final" \
+--lang_model_path="${MAIN_ROOT}/models/lm/zh_giga.no_cna_cmn.prune01244.klm" \
 --decoding_method="ctc_beam_search" \
 --error_rate_type="cer" \
 --specgram_type="linear"
diff --git a/examples/aishell/local/run_infer_golden.sh b/examples/aishell/local/run_infer_golden.sh
index 7e0ede66..296c0d5b 100644
--- a/examples/aishell/local/run_infer_golden.sh
+++ b/examples/aishell/local/run_infer_golden.sh
@@ -1,9 +1,7 @@
 #! /usr/bin/env bash
 
-cd ../.. > /dev/null
-
 # download language model
-cd models/lm > /dev/null
+cd ${MAIN_ROOT}/models/lm > /dev/null
 bash download_lm_ch.sh
 if [ $? -ne 0 ]; then
     exit 1
@@ -12,7 +10,7 @@ cd - > /dev/null
 
 # download well-trained model
-cd models/aishell > /dev/null
+cd ${MAIN_ROOT}/models/aishell > /dev/null
 bash download_model.sh
 if [ $? -ne 0 ]; then
     exit 1
@@ -22,7 +20,7 @@ cd - > /dev/null
 
 # infer
 CUDA_VISIBLE_DEVICES=0 \
-python3 -u infer.py \
+python3 -u ${MAIN_ROOT}/infer.py \
 --num_samples=10 \
 --beam_size=300 \
 --num_proc_bsearch=8 \
@@ -36,11 +34,11 @@
 --use_gru=True \
 --use_gpu=False \
 --share_rnn_weights=False \
---infer_manifest="data/aishell/manifest.test" \
---mean_std_path="models/aishell/mean_std.npz" \
---vocab_path="models/aishell/vocab.txt" \
---model_path="models/aishell" \
---lang_model_path="models/lm/zh_giga.no_cna_cmn.prune01244.klm" \
+--infer_manifest="data/manifest.test" \
+--mean_std_path="${MAIN_ROOT}/models/aishell/mean_std.npz" \
+--vocab_path="${MAIN_ROOT}/models/aishell/vocab.txt" \
+--model_path="${MAIN_ROOT}/models/aishell" \
+--lang_model_path="${MAIN_ROOT}/models/lm/zh_giga.no_cna_cmn.prune01244.klm" \
 --decoding_method="ctc_beam_search" \
 --error_rate_type="cer" \
 --specgram_type="linear"
diff --git a/examples/aishell/local/run_test.sh b/examples/aishell/local/run_test.sh
index 31be69fe..d2dbfb4f 100644
--- a/examples/aishell/local/run_test.sh
+++ b/examples/aishell/local/run_test.sh
@@ -1,9 +1,7 @@
 #! /usr/bin/env bash
 
-cd ../.. > /dev/null
-
 # download language model
-cd models/lm > /dev/null
+cd ${MAIN_ROOT}/models/lm > /dev/null
 bash download_lm_ch.sh
 if [ $? -ne 0 ]; then
     exit 1
@@ -13,7 +11,7 @@ cd - > /dev/null
 
 # evaluate model
 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
-python3 -u test.py \
+python3 -u ${MAIN_ROOT}/test.py \
 --batch_size=128 \
 --beam_size=300 \
 --num_proc_bsearch=8 \
@@ -27,11 +25,11 @@
 --use_gru=True \
 --use_gpu=True \
 --share_rnn_weights=False \
---test_manifest="data/aishell/manifest.test" \
---mean_std_path="data/aishell/mean_std.npz" \
---vocab_path="data/aishell/vocab.txt" \
---model_path="checkpoints/aishell/step_final" \
---lang_model_path="models/lm/zh_giga.no_cna_cmn.prune01244.klm" \
+--test_manifest="data/manifest.test" \
+--mean_std_path="data/mean_std.npz" \
+--vocab_path="data/vocab.txt" \
+--model_path="checkpoints/step_final" \
+--lang_model_path="${MAIN_ROOT}/models/lm/zh_giga.no_cna_cmn.prune01244.klm" \
 --decoding_method="ctc_beam_search" \
 --error_rate_type="cer" \
 --specgram_type="linear"
diff --git a/examples/aishell/local/run_test_golden.sh b/examples/aishell/local/run_test_golden.sh
index ea423c04..062a1b99 100644
--- a/examples/aishell/local/run_test_golden.sh
+++ b/examples/aishell/local/run_test_golden.sh
@@ -1,9 +1,7 @@
 #! /usr/bin/env bash
 
-cd ../.. > /dev/null
-
 # download language model
-cd models/lm > /dev/null
+cd ${MAIN_ROOT}/models/lm > /dev/null
 bash download_lm_ch.sh
 if [ $? -ne 0 ]; then
     exit 1
@@ -12,7 +10,7 @@ cd - > /dev/null
 
 # download well-trained model
-cd models/aishell > /dev/null
+cd ${MAIN_ROOT}/models/aishell > /dev/null
 bash download_model.sh
 if [ $? -ne 0 ]; then
     exit 1
@@ -22,7 +20,7 @@ cd - > /dev/null
 
 # evaluate model
 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
-python3 -u test.py \
+python3 -u ${MAIN_ROOT}/test.py \
 --batch_size=128 \
 --beam_size=300 \
 --num_proc_bsearch=8 \
@@ -36,11 +34,11 @@
 --use_gru=True \
 --use_gpu=True \
 --share_rnn_weights=False \
---test_manifest="data/aishell/manifest.test" \
---mean_std_path="models/aishell/mean_std.npz" \
---vocab_path="models/aishell/vocab.txt" \
---model_path="models/aishell" \
---lang_model_path="models/lm/zh_giga.no_cna_cmn.prune01244.klm" \
+--test_manifest="data/manifest.test" \
+--mean_std_path="${MAIN_ROOT}/models/aishell/mean_std.npz" \
+--vocab_path="${MAIN_ROOT}/models/aishell/vocab.txt" \
+--model_path="${MAIN_ROOT}/models/aishell" \
+--lang_model_path="${MAIN_ROOT}/models/lm/zh_giga.no_cna_cmn.prune01244.klm" \
 --decoding_method="ctc_beam_search" \
 --error_rate_type="cer" \
 --specgram_type="linear"
diff --git a/examples/aishell/local/run_train.sh b/examples/aishell/local/run_train.sh
index 6faed6c6..5bde1372 100644
--- a/examples/aishell/local/run_train.sh
+++ b/examples/aishell/local/run_train.sh
@@ -1,12 +1,10 @@
 #! /usr/bin/env bash
 
-cd ../.. > /dev/null
-
 # train model
 # if you wish to resume from an existing model, uncomment --init_from_pretrained_model
 export FLAGS_sync_nccl_allreduce=0
 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
-python3 -u train.py \
+python3 -u ${MAIN_ROOT}/train.py \
 --batch_size=64 \
 --num_epoch=50 \
 --num_conv_layers=2 \
@@ -24,12 +22,12 @@
 --use_gpu=True \
 --is_local=True \
 --share_rnn_weights=False \
---train_manifest="data/aishell/manifest.train" \
---dev_manifest="data/aishell/manifest.dev" \
---mean_std_path="data/aishell/mean_std.npz" \
---vocab_path="data/aishell/vocab.txt" \
---output_model_dir="./checkpoints/aishell" \
---augment_conf_path="conf/augmentation.config" \
+--train_manifest="data/manifest.train" \
+--dev_manifest="data/manifest.dev" \
+--mean_std_path="data/mean_std.npz" \
+--vocab_path="data/vocab.txt" \
+--output_model_dir="./checkpoints" \
+--augment_conf_path="${MAIN_ROOT}/conf/augmentation.config" \
 --specgram_type="linear" \
 --shuffle_method="batch_shuffle_clipped" \
diff --git a/examples/baidu_en8k/run_test_golden.sh b/examples/baidu_en8k/run_test_golden.sh
index f77f934c..10c61a09 100644
--- a/examples/baidu_en8k/run_test_golden.sh
+++ b/examples/baidu_en8k/run_test_golden.sh
@@ -1,9 +1,9 @@
 #! /usr/bin/env bash
 
-cd ../.. > /dev/null
+source path.sh
 
 # download language model
-cd models/lm > /dev/null
+cd ${MAIN_ROOT}/models/lm > /dev/null
 bash download_lm_en.sh
 if [ $? -ne 0 ]; then
     exit 1
@@ -12,7 +12,7 @@ cd - > /dev/null
 
 # download well-trained model
-cd models/baidu_en8k > /dev/null
+cd ${MAIN_ROOT}/models/baidu_en8k > /dev/null
 bash download_model.sh
 if [ $? -ne 0 ]; then
     exit 1
@@ -22,7 +22,7 @@ cd - > /dev/null
 
 # evaluate model
 CUDA_VISIBLE_DEVICES=0,1,2,3 \
-python3 -u test.py \
+python3 -u ${MAIN_ROOT}/test.py \
 --batch_size=128 \
 --beam_size=500 \
 --num_proc_bsearch=8 \
@@ -37,11 +37,11 @@
 --use_gru=True \
 --use_gpu=False \
 --share_rnn_weights=False \
---test_manifest="data/librispeech/manifest.test-clean" \
---mean_std_path="models/baidu_en8k/mean_std.npz" \
---vocab_path="models/baidu_en8k/vocab.txt" \
---model_path="models/baidu_en8k" \
---lang_model_path="models/lm/common_crawl_00.prune01111.trie.klm" \
+--test_manifest="data/manifest.test-clean" \
+--mean_std_path="${MAIN_ROOT}/models/baidu_en8k/mean_std.npz" \
+--vocab_path="${MAIN_ROOT}/models/baidu_en8k/vocab.txt" \
+--model_path="${MAIN_ROOT}/models/baidu_en8k" \
+--lang_model_path="${MAIN_ROOT}/models/lm/common_crawl_00.prune01111.trie.klm" \
 --decoding_method="ctc_beam_search" \
 --error_rate_type="wer" \
 --specgram_type="linear"
diff --git a/examples/librispeech/local/run_tune.sh b/examples/librispeech/local/run_tune.sh
index 80390c35..848f0b8f 100644
--- a/examples/librispeech/local/run_tune.sh
+++ b/examples/librispeech/local/run_tune.sh
@@ -2,7 +2,7 @@
 
 # grid-search for hyper-parameters in language model
 CUDA_VISIBLE_DEVICES=0,1,2,3 \
-python3 -u tools/tune.py \
+python3 -u ${MAIN_ROOT}/tools/tune.py \
 --num_batches=-1 \
 --batch_size=128 \
 --beam_size=500 \
diff --git a/examples/tiny/README.md b/examples/tiny/README.md
index ffa6621f..d7361b26 100644
--- a/examples/tiny/README.md
+++ b/examples/tiny/README.md
@@ -2,3 +2,41 @@
 
 1. `source path.sh`
 2. `bash run.sh`
+
+## Steps
+- Prepare the data
+
+  ```bash
+  sh local/run_data.sh
+  ```
+
+  `run_data.sh` will download the dataset, generate manifests, collect the normalizer's statistics, and build the vocabulary. Once the data preparation is done, you will find the data (only a part of LibriSpeech) downloaded in `${MAIN_ROOT}/dataset/librispeech`, the corresponding manifest files generated in `${PWD}/data`, and a mean/stddev file plus a vocabulary file. It only has to be run the very first time you use this dataset; the generated files are reusable for all further experiments.
+- Train your own ASR model
+
+  ```bash
+  sh local/run_train.sh
+  ```
+
+  `run_train.sh` will start a training job, with training logs printed to stdout and a model checkpoint for every pass/epoch saved to `${PWD}/checkpoints`. These checkpoints can be used for resuming training, inference, evaluation, and deployment.
+- Case inference with an existing model
+
+  ```bash
+  sh local/run_infer.sh
+  ```
+
+  `run_infer.sh` will show some speech-to-text decoding results for several (default: 10) samples with the trained model. The performance might not be good yet, as the current model is trained on only a toy subset of LibriSpeech. To see results with a better model, you can download a well-trained model (trained for several days on the complete LibriSpeech) and run inference with it:
+
+  ```bash
+  sh local/run_infer_golden.sh
+  ```
+- Evaluate an existing model
+
+  ```bash
+  sh local/run_test.sh
+  ```
+
+  `run_test.sh` will evaluate the model with the Word Error Rate (or Character Error Rate) metric. Similarly, you can also download a well-trained model and test its performance:
+
+  ```bash
+  sh local/run_test_golden.sh
+  ```
\ No newline at end of file
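Taken together, the steps added to `examples/tiny/README.md` above amount to the following end-to-end session. This is a minimal sketch, assuming it is run from `examples/tiny` and that `path.sh` exports `MAIN_ROOT` as the updated scripts expect:

```bash
# One-time environment setup (exports MAIN_ROOT, among others).
source path.sh
# One-time per dataset: download data, generate manifests, mean/stddev stats, vocabulary.
sh local/run_data.sh
# Train; logs go to stdout, checkpoints to ./checkpoints.
sh local/run_train.sh
# Decode a few samples with the freshly trained model.
sh local/run_infer.sh
# Score the model (WER/CER) on the test manifest.
sh local/run_test.sh
```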