From cc5e4203317c7870d0743c8bc1d9497b05866311 Mon Sep 17 00:00:00 2001 From: Hu Weiwei Date: Wed, 15 Nov 2017 02:43:54 -0600 Subject: [PATCH] fix typo --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 6c473b69..ca146926 100644 --- a/README.md +++ b/README.md @@ -187,7 +187,7 @@ Six optional augmentation components are provided to be selected, configured and - Noise Perturbation (need background noise audio files) - Impulse Response (need impulse audio files) -In order to inform the trainer of what augmentation components are needed and what their processing orders are, it is required to prepare in advance a *augmentation configuration file* in [JSON](http://www.json.org/) format. For example: +In order to inform the trainer of what augmentation components are needed and what their processing orders are, it is required to prepare in advance an *augmentation configuration file* in [JSON](http://www.json.org/) format. For example: ``` [{ @@ -226,7 +226,7 @@ If you wish to train your own better language model, please refer to [KenLM](htt #### English LM -The English corpus is from the [Common Crawl Repository](http://commoncrawl.org) and you can download it from [statmt](http://data.statmt.org/ngrams/deduped_en). We use part en.00 to train our English languge model. There are some preprocessing steps before training: +The English corpus is from the [Common Crawl Repository](http://commoncrawl.org) and you can download it from [statmt](http://data.statmt.org/ngrams/deduped_en). We use part en.00 to train our English language model. There are some preprocessing steps before training: * Characters not in \[A-Za-z0-9\s'\] (\s represents whitespace characters) are removed and Arabic numbers are converted to English numbers like 1000 to one thousand. * Repeated whitespace characters are squeezed to one and the beginning whitespace characters are removed. Notice that all transcriptions are lowercase, so all characters are converted to lowercase.