Run DS2 on PaddleCloud
Note: Make sure the PaddleCloud client has been installed and the current directory is models/deep_speech_2/cloud/.
Step-1 Configure data set
Configure your input data and output path in pcloud_submit.sh:
TRAIN_MANIFEST: Absolute path of the training data manifest file in the local filesystem. This file has the following format, one JSON object per line:
{"audio_filepath": "/home/disk1/LibriSpeech/dev-clean/1272/128104/1272-128104-0000.flac", "duration": 5.855, "text": "mister quilter is the ..."}
{"audio_filepath": "/home/disk1/LibriSpeech/dev-clean/1272/128104/1272-128104-0001.flac", "duration": 4.815, "text": "nor is mister ..."}
TEST_MANIFEST: Absolute path of the test data manifest file in the local filesystem. This file has the same format as TRAIN_MANIFEST.
VOCAB_FILE: Absolute path of the vocabulary file in the local filesystem.
MEAN_STD_FILE: Absolute path of the normalizer's statistics file in the local filesystem.
CLOUD_DATA_DIR: Absolute path in the PaddleCloud filesystem. The local training data is uploaded to this directory.
CLOUD_MODEL_DIR: Absolute path in the PaddleCloud filesystem. The PaddleCloud trainer saves the model to this directory.
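Before uploading, it can help to check that a manifest really follows the one-JSON-object-per-line format shown for TRAIN_MANIFEST. The following is a minimal sketch, not part of the DS2 scripts; the function names load_manifest and total_duration are hypothetical, and the only assumption about the data is the three fields that appear in the sample lines.

```python
import json

# Fields taken from the sample manifest lines; nothing else is assumed.
REQUIRED_FIELDS = ("audio_filepath", "duration", "text")


def load_manifest(path):
    """Read a manifest (one JSON object per line) into a list of dicts.

    Hypothetical helper: raises ValueError if a line lacks a required field.
    """
    entries = []
    with open(path, encoding="utf-8") as manifest:
        for line_no, line in enumerate(manifest, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            entry = json.loads(line)
            missing = [key for key in REQUIRED_FIELDS if key not in entry]
            if missing:
                raise ValueError(f"line {line_no}: missing fields {missing}")
            entries.append(entry)
    return entries


def total_duration(entries):
    """Sum the 'duration' field (in seconds) over all manifest entries."""
    return sum(float(entry["duration"]) for entry in entries)
```

Calling load_manifest on the path assigned to TRAIN_MANIFEST and then total_duration on the result gives a quick sanity check of how much audio is about to be uploaded.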
Note: The upload is skipped if the target file already exists in CLOUD_DATA_DIR.
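The skip-if-exists behaviour in the note can be pictured as a simple partition of the local files. The sketch below is an illustration only: how the actual upload script lists CLOUD_DATA_DIR and compares files is not shown here, so matching by base file name and the function name plan_uploads are assumptions, and no PaddleCloud command is called.

```python
import os


def plan_uploads(local_paths, remote_names):
    """Split local files into (to_upload, skipped).

    Assumption: a file counts as "already uploaded" when a file with the
    same base name is present in the remote directory listing.
    """
    remote = set(remote_names)
    to_upload, skipped = [], []
    for path in local_paths:
        name = os.path.basename(path)
        if name in remote:
            skipped.append(path)  # target exists, so the upload is skipped
        else:
            to_upload.append(path)
    return to_upload, skipped
```

One consequence of this kind of check is that a changed local file with an unchanged name would not be uploaded again, so a stale copy in CLOUD_DATA_DIR may need to be removed by hand first.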
Step-2 Configure computation resource
Configure computation resource in pcloud_submit.sh:
# Configure computation resource and submit job to PaddleCloud
paddlecloud submit \
-image wanghaoshuang/pcloud_ds2:latest \
-jobname ${JOB_NAME} \
-cpu 4 \
-gpu 4 \
-memory 10Gi \
-parallelism 1 \
-pscpu 1 \
-pservers 1 \
-psmemory 10Gi \
-passes 1 \
-entry "sh pcloud_train.sh ${CLOUD_DATA_DIR} ${CLOUD_MODEL_DIR}" \
${DS2_PATH}
For more information, please refer to the PaddleCloud documentation.
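When several resource settings need to be tried, the submit command above can be assembled programmatically instead of editing the script by hand. This is a rough sketch under stated assumptions: build_submit_command is a hypothetical helper, the flags and default values are copied from the command shown above, and the function only builds the argument list without running anything.

```python
import shlex


def build_submit_command(job_name, ds2_path, cloud_data_dir, cloud_model_dir,
                         overrides=None):
    """Return the `paddlecloud submit` argument list as shown in this README.

    Hypothetical helper: `overrides` replaces any of the default resource
    values, e.g. {"gpu": 8}.
    """
    options = {
        "image": "wanghaoshuang/pcloud_ds2:latest",
        "jobname": job_name,
        "cpu": 4,
        "gpu": 4,
        "memory": "10Gi",
        "parallelism": 1,
        "pscpu": 1,
        "pservers": 1,
        "psmemory": "10Gi",
        "passes": 1,
    }
    options.update(overrides or {})
    # The entry point receives the two cloud directories as arguments.
    options["entry"] = f"sh pcloud_train.sh {cloud_data_dir} {cloud_model_dir}"

    argv = ["paddlecloud", "submit"]
    for flag, value in options.items():
        argv += [f"-{flag}", str(value)]
    argv.append(ds2_path)
    return argv


def as_shell_line(argv):
    """Quote the argument list so it can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

The returned list could be passed to subprocess.run, or printed with as_shell_line for review before submitting.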
Step-3 Configure algorithm options
Configure algorithm options in pcloud_train.sh:
python train.py \
--use_gpu=1 \
--trainer_count=4 \
--batch_size=256 \
--mean_std_filepath=$MEAN_STD_FILE \
--train_manifest_path='./local.train.manifest' \
--dev_manifest_path='./local.test.manifest' \
--vocab_filepath=$VOCAB_PATH \
--output_model_dir=${MODEL_PATH}
You can get more information about the algorithm options with the following command:
cd ..
python train.py --help
Step-4 Submit job
$ sh pcloud_submit.sh
Step-5 Get logs
$ paddlecloud logs -n 10000 deepspeech20170727130129
For more information, please refer to the PaddleCloud client documentation, or get help with the following command:
paddlecloud --help