Easy-to-use Speech Toolkit including SOTA/Streaming ASR with punctuation, influential TTS with text frontend, Speaker Verification System and End-to-End Speech Simultaneous Translation.





Deep Speech 2 on PaddlePaddle

Installation

Please replace $PADDLE_INSTALL_DIR with your own paddle installation directory.

pip install -r requirements.txt
export LD_LIBRARY_PATH=$PADDLE_INSTALL_DIR/Paddle/third_party/install/warpctc/lib:$LD_LIBRARY_PATH

For some machines, we also need to install libsndfile1. Details to be added.

Usage

Preparing Data

cd data
python librispeech.py
cat manifest.libri.train-* > manifest.libri.train-all
cd ..

After running librispeech.py, we have several "manifest" JSON files whose names begin with manifest.libri (for example, the manifest.libri.train-* files). A manifest file summarizes a speech data set: each line holds the metadata (i.e. audio filepath, transcription text, audio duration) of one audio file in the data set, in JSON format.
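As a rough illustration of the manifest layout described above, the following sketch reads a manifest line by line and sums the audio durations. The helper names `load_manifest` and `total_duration` are hypothetical, and the key name `duration` is an assumption; check a generated manifest for the actual field names before relying on it.

```python
import json


def load_manifest(path):
    """Read a manifest file that stores one JSON object per line."""
    entries = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # ignore blank lines
                entries.append(json.loads(line))
    return entries


def total_duration(entries, key="duration"):
    """Sum the audio durations; the key name "duration" is an assumption."""
    return sum(float(entry[key]) for entry in entries)
```

Calling `load_manifest` on one of the generated manifest files and passing the result to `total_duration` gives a quick sanity check of how much audio a data set contains.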

The command cat manifest.libri.train-* > manifest.libri.train-all merges the three separate LibriSpeech training sets (train-clean-100, train-clean-360, train-other-500) into a single training set. Because each manifest line is self-contained, concatenating manifest files is a simple way to merge different data sets.
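The same merge can be sketched in Python for cases where a shell is not convenient. The function name `merge_manifests` is hypothetical and not part of this repository. Unlike the plain cat command, this sketch skips the output file if it already matches the pattern, so it can be re-run safely.

```python
import glob
import os


def merge_manifests(pattern, out_path):
    """Concatenate all manifest files matching `pattern` into `out_path`.

    Returns the number of manifest lines written.
    """
    out_abs = os.path.abspath(out_path)
    # Exclude the output file itself in case a previous run left it behind.
    paths = sorted(
        p for p in glob.glob(pattern) if os.path.abspath(p) != out_abs
    )
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            with open(path, encoding="utf-8") as f:
                for line in f:
                    if line.strip():  # drop blank lines
                        out.write(line.rstrip("\n") + "\n")
                        count += 1
    return count
```

For the LibriSpeech case, the pattern would be manifest.libri.train-* and the output path manifest.libri.train-all, both inside the data directory.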

More help for arguments:

python librispeech.py --help

Training

For GPU Training:

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --trainer_count 4 --train_manifest_path ./data/manifest.libri.train-all

For CPU Training:

python train.py --trainer_count 8 --use_gpu False --train_manifest_path ./data/manifest.libri.train-all

More help for arguments:

python train.py --help

Inference

python infer.py

More help for arguments:

python infer.py --help