From e3bb689c0e022345e9833ebacdcba3935bc6b7a0 Mon Sep 17 00:00:00 2001
From: yangyaming
Date: Wed, 11 Oct 2017 22:48:00 +0800
Subject: [PATCH 1/4] Add document for Mandarin model.

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index fca2528aa..0fcb73327 100644
--- a/README.md
+++ b/README.md
@@ -398,7 +398,7 @@ For more information about the DeepSpeech2 training on PaddleCloud, please refer
 
 ## Training for Mandarin Language
 
-TODO: to be added
+The steps of training, evaluation and inference for Mandarin ASR model is same with English ASR model. We have provided an example for Mandarin data which using Aishell dataset and you can find it in ```examples/aishell```. As mentioned above, you can execute ```sh run_data.sh```, ```sh run_train.sh```, ```sh run_test.sh``` and ```sh run_infer.sh``` to do data preparation, training, test and inference correspondingly. We have also tuned a setting to get better model performance (not the best), and you can execute ```sh run_infer_golden.sh``` to show some speech-to-text decoding results.
 
 ## Trying Live Demo with Your Own Voice
 

From 1f6a18e8e8e28be7c02ca315187d3d6b99b8f045 Mon Sep 17 00:00:00 2001
From: yangyaming
Date: Fri, 3 Nov 2017 15:50:38 +0800
Subject: [PATCH 2/4] Refine doc.

---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 0fcb73327..622db5077 100644
--- a/README.md
+++ b/README.md
@@ -398,7 +398,8 @@ For more information about the DeepSpeech2 training on PaddleCloud, please refer
 
 ## Training for Mandarin Language
 
-The steps of training, evaluation and inference for Mandarin ASR model is same with English ASR model. We have provided an example for Mandarin data which using Aishell dataset and you can find it in ```examples/aishell```. As mentioned above, you can execute ```sh run_data.sh```, ```sh run_train.sh```, ```sh run_test.sh``` and ```sh run_infer.sh``` to do data preparation, training, test and inference correspondingly. We have also tuned a setting to get better model performance (not the best), and you can execute ```sh run_infer_golden.sh``` to show some speech-to-text decoding results.
+Before training model for Mandarin Language, mean stddev file and vocabulary file are also required. For mean stddev file, you can run ```tools/compute_mean_std.py``` to generate as above. However, the Mandarin vocabulary contains much more tokens than English vocabulary, but you can still run ```tools/build_vocab.py``` to generate it. The steps of training, evaluation and inference for Mandarin ASR model is same to English ASR model. Notice that, after training a model please run ```tools/tune.py``` to find an optimal setting for Language Model.
+We have provided an example for Mandarin training with Aishell in ```examples/aishell```. As mentioned above, please execute ```sh run_data.sh```, ```sh run_train.sh```, ```sh run_test.sh``` and ```sh run_infer.sh``` to do data preparation, training, test and inference correspondingly. We have also prepared a pre-trained model (downloaded by ./models/aishell/download_model.sh) for users to try with ```sh run_infer_golden.sh``` and ```sh run_test_golden.sh```.
 
 ## Trying Live Demo with Your Own Voice
 

From 963b60d5edacb5b14abe43b663977973c9dd406d Mon Sep 17 00:00:00 2001
From: yangyaming
Date: Fri, 3 Nov 2017 22:04:23 +0800
Subject: [PATCH 3/4] Refine doc for Mandarin training.

---
 README.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 622db5077..d33522425 100644
--- a/README.md
+++ b/README.md
@@ -398,8 +398,7 @@ For more information about the DeepSpeech2 training on PaddleCloud, please refer
 
 ## Training for Mandarin Language
 
-Before training model for Mandarin Language, mean stddev file and vocabulary file are also required. For mean stddev file, you can run ```tools/compute_mean_std.py``` to generate as above. However, the Mandarin vocabulary contains much more tokens than English vocabulary, but you can still run ```tools/build_vocab.py``` to generate it. The steps of training, evaluation and inference for Mandarin ASR model is same to English ASR model. Notice that, after training a model please run ```tools/tune.py``` to find an optimal setting for Language Model.
-We have provided an example for Mandarin training with Aishell in ```examples/aishell```. As mentioned above, please execute ```sh run_data.sh```, ```sh run_train.sh```, ```sh run_test.sh``` and ```sh run_infer.sh``` to do data preparation, training, test and inference correspondingly. We have also prepared a pre-trained model (downloaded by ./models/aishell/download_model.sh) for users to try with ```sh run_infer_golden.sh``` and ```sh run_test_golden.sh```.
+The key steps of training for Mandarin Language are same to that of English Language and we have also provided an example for Mandarin training with Aishell in ```examples/aishell```. As mentioned above, please execute ```sh run_data.sh```, ```sh run_train.sh```, ```sh run_test.sh``` and ```sh run_infer.sh``` to do data preparation, training, test and inference correspondingly. We have also prepared a pre-trained model (downloaded by ./models/aishell/download_model.sh) for users to try with ```sh run_infer_golden.sh``` and ```sh run_test_golden.sh```. Notice that, different from English LM, the Mandarin LM is character based and please run ```tools/tune.py``` to find an optimal setting.
 
 ## Trying Live Demo with Your Own Voice
 

From 046f6ca994f2afa5fe5a23fd3400becf48b9e3f4 Mon Sep 17 00:00:00 2001
From: yangyaming
Date: Fri, 3 Nov 2017 22:46:25 +0800
Subject: [PATCH 4/4] Refine doc.

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index d33522425..84d9754d4 100644
--- a/README.md
+++ b/README.md
@@ -398,7 +398,7 @@ For more information about the DeepSpeech2 training on PaddleCloud, please refer
 
 ## Training for Mandarin Language
 
-The key steps of training for Mandarin Language are same to that of English Language and we have also provided an example for Mandarin training with Aishell in ```examples/aishell```. As mentioned above, please execute ```sh run_data.sh```, ```sh run_train.sh```, ```sh run_test.sh``` and ```sh run_infer.sh``` to do data preparation, training, test and inference correspondingly. We have also prepared a pre-trained model (downloaded by ./models/aishell/download_model.sh) for users to try with ```sh run_infer_golden.sh``` and ```sh run_test_golden.sh```. Notice that, different from English LM, the Mandarin LM is character based and please run ```tools/tune.py``` to find an optimal setting.
+The key steps of training for Mandarin language are same to that of English language and we have also provided an example for Mandarin training with Aishell in ```examples/aishell```. As mentioned above, please execute ```sh run_data.sh```, ```sh run_train.sh```, ```sh run_test.sh``` and ```sh run_infer.sh``` to do data preparation, training, testing and inference correspondingly. We have also prepared a pre-trained model (downloaded by ./models/aishell/download_model.sh) for users to try with ```sh run_infer_golden.sh``` and ```sh run_test_golden.sh```. Notice that, different from English LM, the Mandarin LM is character-based and please run ```tools/tune.py``` to find an optimal setting.
 
 ## Trying Live Demo with Your Own Voice
 