Run DS2 on PaddleCloud

Note: Make sure the current directory is models/deep_speech_2/cloud/.

Step1 Configure data set

You can configure your input data and output path in pcloud_submit.sh:

TRAIN_MANIFEST: Absolute path of the training data manifest file in the local file system. This file has the following format:

{"audio_filepath": "/home/disk1/LibriSpeech/dev-clean/1272/128104/1272-128104-0000.flac", "duration": 5.855, "text": "mister quilter is the ..."}
{"audio_filepath": "/home/disk1/LibriSpeech/dev-clean/1272/128104/1272-128104-0001.flac", "duration": 4.815, "text": "nor is mister ..."}
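
Before uploading, it can help to check that every manifest line parses and carries the three fields shown above. The following is a minimal sketch, not part of the DS2 code: the function names are hypothetical, and it assumes Python 3, one JSON object per line, and a positive numeric duration.

```python
import json

# The three fields that appear in the manifest example above.
REQUIRED_KEYS = {"audio_filepath", "duration", "text"}


def parse_manifest_line(line):
    """Parse one manifest line and check that it has the expected fields."""
    entry = json.loads(line)
    missing = REQUIRED_KEYS - set(entry)
    if missing:
        raise ValueError("manifest entry is missing: %s" % sorted(missing))
    # Assumption: duration is a positive number.
    if not isinstance(entry["duration"], (int, float)) or entry["duration"] <= 0:
        raise ValueError("duration must be a positive number")
    return entry


def read_manifest(path):
    """Read a manifest file, skipping blank lines (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return [parse_manifest_line(line) for line in f if line.strip()]
```

Calling read_manifest on the file configured as TRAIN_MANIFEST raises an error at the first malformed line, which is cheaper to discover locally than after the job has been submitted.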

TEST_MANIFEST: Absolute path of the test data manifest file in the local file system. This file has the same format as TRAIN_MANIFEST.
VOCAB_FILE: Absolute path of the vocabulary file in the local file system.
MEAN_STD_FILE: Absolute path of the mean and standard deviation file in the local file system.
CLOUD_DATA_DIR: Absolute path in the PaddleCloud file system. The local training data will be uploaded to this directory.
CLOUD_MODEL_DIR: Absolute path in the PaddleCloud file system. The PaddleCloud trainer will save the model to this directory.

Note: The upload is skipped for any target file that already exists in ${CLOUD_DATA_DIR}.
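
All of the settings above must be absolute paths. As a quick sanity check before submitting, something like the sketch below could be used. It is illustrative only: the helper name and the example values are hypothetical, and it assumes POSIX-style paths on both the local and the PaddleCloud file system.

```python
import posixpath


def find_relative_paths(settings):
    """Return the names of settings whose values are not absolute paths."""
    return [name for name, value in settings.items()
            if not posixpath.isabs(value)]


# Hypothetical example values; substitute the ones from pcloud_submit.sh.
example_settings = {
    "TRAIN_MANIFEST": "/home/work/data/train.manifest",
    "TEST_MANIFEST": "/home/work/data/test.manifest",
    "VOCAB_FILE": "data/vocab.txt",  # relative on purpose, to show a failure
    "MEAN_STD_FILE": "/home/work/data/mean_std.npz",
    "CLOUD_DATA_DIR": "/pfs/example/data",
    "CLOUD_MODEL_DIR": "/pfs/example/model",
}

print(find_relative_paths(example_settings))
```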

Step2 Configure computation resource

You can configure computation resources in pcloud_submit.sh:

# Configure computation resource and submit job to PaddleCloud
 paddlecloud submit \
 -image wanghaoshuang/pcloud_ds2:latest \
 -jobname ${JOB_NAME} \
 -cpu 4 \
 -gpu 4 \
 -memory 10Gi \
 -parallelism 1 \
 -pscpu 1 \
 -pservers 1 \
 -psmemory 10Gi \
 -passes 1 \
 -entry "sh pcloud_train.sh ${CLOUD_DATA_DIR} ${CLOUD_MODEL_DIR}" \
 ${DS2_PATH}

For more information, please refer to PaddleCloud.
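
If you prefer to assemble the submit command programmatically, for example to vary the resources between runs, the same argument list can be built in Python. This is only a sketch: it reuses exactly the flags and values shown in the command above, the function name is hypothetical, and it builds and prints the command without running it.

```python
import shlex


def build_submit_args(job_name, cloud_data_dir, cloud_model_dir, ds2_path):
    """Build the argument list for the paddlecloud submit command shown above."""
    entry = "sh pcloud_train.sh %s %s" % (cloud_data_dir, cloud_model_dir)
    # Flags and values copied from pcloud_submit.sh as quoted in this README.
    options = [
        ("-image", "wanghaoshuang/pcloud_ds2:latest"),
        ("-jobname", job_name),
        ("-cpu", "4"),
        ("-gpu", "4"),
        ("-memory", "10Gi"),
        ("-parallelism", "1"),
        ("-pscpu", "1"),
        ("-pservers", "1"),
        ("-psmemory", "10Gi"),
        ("-passes", "1"),
        ("-entry", entry),
    ]
    args = ["paddlecloud", "submit"]
    for flag, value in options:
        args.extend([flag, value])
    args.append(ds2_path)
    return args


def to_shell(args):
    """Quote the argument list so it can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in args)
```

Passing the list to subprocess.run would submit the job; printing to_shell(...) first makes it easy to compare the result against pcloud_submit.sh.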

Step3 Configure algorithm options

You can configure algorithm options in pcloud_train.sh:

python train.py \
--use_gpu=1 \
--trainer_count=4 \
--batch_size=256 \
--mean_std_filepath=$MEAN_STD_FILE \
--train_manifest_path='./local.train.manifest' \
--dev_manifest_path='./local.test.manifest' \
--vocab_filepath=$VOCAB_PATH \
--output_model_dir=${MODEL_PATH}

You can get more information about the algorithm options with the following commands:

cd ..
python train.py --help

Step4 Submit job

$ sh pcloud_submit.sh

Step5 Get logs

$ paddlecloud logs -n 10000 deepspeech20170727130129
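
The job name in the command above looks like a fixed prefix followed by a timestamp. Assuming that convention (prefix "deepspeech" plus year, month, day, hour, minute, second), a name of the same shape can be generated as in this sketch. The function is hypothetical; the actual value of JOB_NAME is whatever pcloud_submit.sh sets.

```python
from datetime import datetime


def make_job_name(prefix="deepspeech", now=None):
    """Return prefix + YYYYmmddHHMMSS (assumed naming convention)."""
    if now is None:
        now = datetime.now()
    return prefix + now.strftime("%Y%m%d%H%M%S")


# Reproduces the job name used in the logs command above.
print(make_job_name(now=datetime(2017, 7, 27, 13, 1, 29)))
```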

For more information, please refer to the PaddleCloud client documentation, or get help with the following command:

paddlecloud --help
