Quick Start of Speech-To-Text
Several shell scripts provided in ./examples/tiny/local let you quickly try out the major modules, including data preparation, model training, case inference, and model evaluation, on a few public datasets (e.g. LibriSpeech, Aishell). Reading these examples will also help you understand how to adapt them to your own data.
Some of the scripts in ./examples are not configured to use GPUs. If you want to train with 8 GPUs, set CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7. If no GPU is available, set CUDA_VISIBLE_DEVICES= (an empty value) to run on CPUs instead. If an out-of-memory error occurs, reduce batch_size until it fits.
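One detail worth knowing when editing CUDA_VISIBLE_DEVICES: a process only sees the GPUs listed there, renumbered from 0 in the order given, while an empty value hides every GPU. The minimal Python sketch below models that mapping. The function name visible_gpu_map is hypothetical (not part of this project), and the sketch assumes numeric device ids rather than GPU UUIDs.

```python
import os


def visible_gpu_map(value=None):
    """Map logical device index -> physical GPU id for CUDA_VISIBLE_DEVICES.

    If `value` is None, the current environment is read. Returns None when the
    variable is unset (every GPU stays visible) and {} when it is set to an
    empty string (no GPU is visible, i.e. CPU-only).
    Assumes numeric ids such as "0,1,2"; GPU UUIDs are not handled here.
    """
    if value is None:
        value = os.environ.get("CUDA_VISIBLE_DEVICES")
        if value is None:
            return None
    ids = [item.strip() for item in value.split(",") if item.strip()]
    # CUDA renumbers the listed GPUs from 0, in the order they are given.
    return {logical: int(physical) for logical, physical in enumerate(ids)}
```

For example, with CUDA_VISIBLE_DEVICES=2,3 a script that asks for device 0 actually runs on physical GPU 2.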
Let's take a tiny sampled subset of the LibriSpeech dataset as an example.
- Go to directory
  cd examples/tiny
  Notice that this is only a toy example with a tiny sampled subset of LibriSpeech. If you would like to try the complete dataset (training would take several days), please go to examples/librispeech instead.
- Source env
  source path.sh
  You must do this before doing anything else. It sets MAIN_ROOT to the project directory and uses the default deepspeech2 model as MODEL; you can change this in the script.
- Main entrypoint
  bash run.sh
  This is just a demo; please make sure each step works well before moving on to the next step.
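Because source only changes the environment of the shell that runs it, path.sh has to be sourced in the same shell session that later runs run.sh. The minimal Python sketch below drives the steps that way, assuming bash is installed; run_in_one_shell is a hypothetical helper, not part of the repository.

```python
import subprocess


def run_in_one_shell(commands, cwd=None):
    """Run shell commands in ONE bash process and return its exit code.

    A single process keeps variables set by `source path.sh` (e.g. MAIN_ROOT)
    visible to later commands; `set -e` stops at the first failing command,
    so a broken step never feeds the next one.
    """
    script = "set -e\n" + "\n".join(commands)
    completed = subprocess.run(["bash", "-c", script], cwd=cwd)
    return completed.returncode
```

Called from the repository root as run_in_one_shell(["cd examples/tiny", "source path.sh", "bash run.sh"]), it mirrors the three steps; a non-zero return value means one of them failed.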
More detailed information is provided in the following sections. We wish you a happy journey with the DeepSpeech on PaddlePaddle ASR engine!
Training a model
The key training steps for Mandarin are the same as those for English, and we also provide an example of Mandarin training with Aishell in examples/aishell/local. As mentioned above, run sh data.sh, sh train.sh and sh test.sh to perform data preparation, training, and testing, respectively.
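The data, train, and test scripts form an ordered pipeline in which each stage depends on the previous one succeeding. The minimal Python sketch below runs such stages in order, stops at the first failure, and reports which stage failed. run_stages and its gpus parameter are hypothetical, and since each stage runs in its own bash process, every command has to source path.sh itself.

```python
import os
import subprocess


def run_stages(stages, cwd=None, gpus="0"):
    """Run (name, shell_command) pairs in order.

    Each stage runs in its own bash process with CUDA_VISIBLE_DEVICES set to
    `gpus` ("" hides all GPUs). Returns the name of the first stage that exits
    non-zero, or None if every stage succeeds.
    """
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpus)
    for name, command in stages:
        completed = subprocess.run(["bash", "-c", command], cwd=cwd, env=env)
        if completed.returncode != 0:
            return name  # later stages are skipped
    return None
```

Assuming the Aishell scripts run without extra arguments, a call could look like run_stages([("data", "source path.sh && bash local/data.sh"), ("train", "source path.sh && bash local/train.sh"), ("test", "source path.sh && bash local/test.sh")], cwd="examples/aishell"); these commands are illustrative, not taken from the repository.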
Evaluate a Model
To evaluate a model's performance quantitatively, please run:
CUDA_VISIBLE_DEVICES=0 bash local/test.sh
The error rate (default: word error rate; can be set with error_rate_type) will be printed.
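Word error rate is the word-level edit distance (substitutions + deletions + insertions) between the reference and the hypothesis, divided by the number of reference words; character error rate applies the same computation to characters. The Python sketch below illustrates that standard definition and is not the project's own scorer. The values "wer" and "cer" for error_rate_type are an assumption, and this CER variant ignores whitespace.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # delete reference token
                         cur[j - 1] + 1,          # insert hypothesis token
                         prev[j - 1] + (r != h))  # substitute, or match for free
        prev = cur
    return prev[-1]


def error_rate(reference, hypothesis, error_rate_type="wer"):
    """Edit distance normalised by reference length ("wer" or "cer", assumed)."""
    if error_rate_type == "wer":
        ref, hyp = reference.split(), hypothesis.split()
    elif error_rate_type == "cer":
        # This sketch drops whitespace before comparing characters.
        ref, hyp = list("".join(reference.split())), list("".join(hypothesis.split()))
    else:
        raise ValueError("error_rate_type must be 'wer' or 'cer'")
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

When scoring a whole test set, the usual practice is to sum edit distances over all utterances and divide by the total number of reference words rather than averaging per-utterance rates, so longer utterances carry proportionally more weight.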
We provide two types of CTC decoders: a CTC greedy decoder and a CTC beam search decoder. The CTC greedy decoder implements the simple best-path decoding algorithm: at each timestep it selects the most likely token, so it is greedy and only locally optimal. The CTC beam search decoder instead uses a heuristic breadth-first graph search to approach the global optimum; it also requires a pre-trained KenLM language model for better scoring and ranking. The decoder type can be set with the decoding_method argument.
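The best-path rule fits in a few lines: take the argmax token at every frame, merge consecutive repeats, then drop blanks. The Python sketch below assumes the blank symbol sits at index 0 of the vocabulary and that the input is a list of per-frame probability rows; both are assumptions, and the project's own decoder may order its vocabulary differently. The beam search decoder with KenLM scoring is not sketched here.

```python
def ctc_greedy_decode(probs, vocabulary, blank_id=0):
    """Best-path CTC decoding.

    probs: sequence of T frames, each a list of V token probabilities.
    vocabulary: list of V symbols; vocabulary[blank_id] is the CTC blank
    (index 0 is an assumption of this sketch).
    """
    # 1. Greedy step: the most likely token index at every frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]
    # 2. Merge consecutive repeats, and 3. drop blanks.
    tokens = []
    previous = None
    for index in best_path:
        if index != previous and index != blank_id:
            tokens.append(vocabulary[index])
        previous = index
    return "".join(tokens)


vocab = ["_", "a", "b"]  # "_" stands for the blank
frames = [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.9, 0.05, 0.05],
          [0.1, 0.6, 0.3], [0.1, 0.2, 0.7], [0.2, 0.1, 0.7]]
print(ctc_greedy_decode(frames, vocab))  # prints "aab"
```

Merging repeats before removing blanks is what lets the path a, a, blank, a, b, b decode to "aab": the blank separates the two runs of a. Because each frame is decided independently, the result can differ from the most probable label sequence, which sums over many paths; the beam search decoder explores more candidates to narrow that gap.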