pull/629/head
Hui Zhang 3 years ago
parent 30aba26693
commit 92381451fb

@ -11,7 +11,7 @@
## Features
See [feature list](doc/src/feature_list.md) for more information.
## Setup

@ -179,7 +179,8 @@ class FeatureNormalizer(object):
wav_number += batch_size
if wav_number % 1000 == 0:
    logger.info(
        f'process {wav_number} wavs, {all_number} frames.')
self.cmvn_info = {
    'mean_stat': list(all_mean_stat.tolist()),
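The loop above accumulates running statistics over all wavs for CMVN (cepstral mean and variance normalization). A minimal self-contained sketch of the same idea, with hypothetical batch shapes rather than the toolkit's actual pipeline, might look like:

```python
import numpy as np

def accumulate_cmvn_stats(batches):
    """Accumulate per-dimension sum, squared sum, and frame count."""
    mean_stat, var_stat, frame_count = None, None, 0
    for feats in batches:  # feats: (num_frames, feat_dim)
        if mean_stat is None:
            mean_stat = np.zeros(feats.shape[1])
            var_stat = np.zeros(feats.shape[1])
        mean_stat += feats.sum(axis=0)
        var_stat += (feats ** 2).sum(axis=0)
        frame_count += feats.shape[0]
    mean = mean_stat / frame_count
    std = np.sqrt(var_stat / frame_count - mean ** 2)  # E[x^2] - E[x]^2
    return mean, std

batches = [np.random.RandomState(0).randn(100, 13) for _ in range(3)]
mean, std = accumulate_cmvn_stats(batches)
normalized = (batches[0] - mean) / std  # apply CMVN to one batch
```

Accumulating sums rather than storing all frames keeps memory constant regardless of corpus size, which is why the real code logs progress per 1000 wavs instead of materializing everything.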

@ -98,4 +98,4 @@
## Text Filter
* Sensitive words (pornographic/violent, politically sensitive, illegal or prohibited content, etc.)

@ -14,4 +14,3 @@ We compare the training time with 1, 2, 4, 8 Tesla V100 GPUs (with a subset of L
| 8 | 6.95 X |
`utils/profile.sh` provides such a demo profiling tool; you can adapt it as needed.
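The speedup figures in the table are simply the single-GPU time divided by the N-GPU time. A quick sketch with made-up timings (not measurements from this repo) for computing speedup and parallel efficiency:

```python
def speedup_and_efficiency(t1, tn, n_gpus):
    """Speedup = T(1) / T(N); parallel efficiency = speedup / N."""
    s = t1 / tn
    return s, s / n_gpus

# hypothetical per-epoch timings in minutes
s, e = speedup_and_efficiency(t1=80.0, tn=11.5, n_gpus=8)
print(f'{s:.2f} X speedup, {e:.0%} efficiency')
```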

@ -48,4 +48,4 @@
## Zhuyin
* [Bopomofo](https://en.wikipedia.org/wiki/Bopomofo)
* [Zhuyin table](https://en.wikipedia.org/wiki/Zhuyin_table)

@ -18,4 +18,4 @@
### ASR Noise
* [asr-noises](https://github.com/speechio/asr-noises)

@ -58,4 +58,4 @@
### Grapheme To Phoneme
* syllable
* phoneme

@ -83,4 +83,4 @@ Please notice that the released language models only contain Chinese simplified
```
build/bin/build_binary ./result/people2014corpus_words.arpa ./result/people2014corpus_words.klm
```
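`build_binary` converts the ARPA file into KenLM's binary format for fast loading. The backoff scoring that such n-gram models implement can be sketched with a toy probability table in log10 scale (an illustration of the ARPA backoff scheme, not the actual KenLM API):

```python
# Toy ARPA-style tables (log10): bigrams, unigrams, and backoff weights.
bigram = {('<s>', '今天'): -0.4, ('今天', '天气'): -0.3, ('天气', '不错'): -0.5}
unigram = {'今天': -1.0, '天气': -1.2, '不错': -1.5}
backoff = {'<s>': -0.2, '今天': -0.4, '天气': -0.3, '不错': -0.5}

def score_word(history, word):
    """log10 P(word | history): use the bigram if present, else back off."""
    if (history, word) in bigram:
        return bigram[(history, word)]
    # apply the history's backoff weight, then fall back to the unigram
    return backoff.get(history, 0.0) + unigram.get(word, -99.0)

def score_sentence(words):
    total, history = 0.0, '<s>'
    for word in words:
        total += score_word(history, word)
        history = word
    return total

print(score_sentence(['今天', '天气', '不错']))  # -1.2: sum of the bigram scores
```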

@ -76,7 +76,7 @@ pip3 install textgrid
tg.read('file.TextGrid')  # 'file.TextGrid' is the filename
```
The `tg.tiers` attribute:
holds all the items in the file; the output of `print(tg.tiers)` looks like this:
```text
@ -86,7 +86,7 @@ pip3 install textgrid
Interval(1361.89250, 1362.01250, R),
Interval(1362.01250, 1362.13250, AY1),
Interval(1362.13250, 1362.16250, T),
...
]
)
@ -113,7 +113,7 @@ pip3 install textgrid
An Interval can be understood as a duration
```
2. Objects in the textgrid library
The **IntervalTier** object:
@ -148,7 +148,7 @@ pip3 install textgrid
strict --> returns a bool indicating whether it is strict TextGrid format
```
The **PointTier** object:
Methods
@ -174,7 +174,7 @@ pip3 install textgrid
name     returns the name
```
The **Point** object:
Supports comparison operators as well as addition and subtraction
@ -185,7 +185,7 @@ pip3 install textgrid
time:
```
The **Interval** object:
Supports comparison operators as well as addition and subtraction
@ -250,10 +250,9 @@ pip3 install textgrid
grids: --> returns the list of grids that were read
```
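The Point and Interval behaviors described above (ordering by time, duration arithmetic) can be illustrated with a minimal stand-in class; this is a sketch of the concepts, not the `textgrid` package's own implementation:

```python
from functools import total_ordering

@total_ordering
class Interval:
    """Minimal stand-in for a TextGrid interval: [minTime, maxTime] + mark."""
    def __init__(self, minTime, maxTime, mark=''):
        self.minTime, self.maxTime, self.mark = minTime, maxTime, mark

    def duration(self):
        return self.maxTime - self.minTime

    def __eq__(self, other):
        return (self.minTime, self.maxTime) == (other.minTime, other.maxTime)

    def __lt__(self, other):  # ordered by time, so intervals sort along the tier
        return (self.minTime, self.maxTime) < (other.minTime, other.maxTime)

a = Interval(1361.8925, 1362.0125, 'R')
b = Interval(1362.0125, 1362.1325, 'AY1')
assert a < b
print(round(a.duration(), 4))  # 0.12 seconds
```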
## Reference
* [Praat 语音学软件 (Wikipedia)](https://zh.wikipedia.org/wiki/Praat%E8%AF%AD%E9%9F%B3%E5%AD%A6%E8%BD%AF%E4%BB%B6)
* https://blog.csdn.net/duxin_csdn/article/details/88966295

@ -1,4 +1,3 @@
# Useful Tools
* [Regex visualization and common regular expressions](https://wangwl.net/static/projects/visualRegex/#)

@ -23,7 +23,7 @@ Therefore, procedures like stemming and lemmatization are not useful for Chinese
### Tokenization
**Tokenizing breaks up text data into shorter pre-set strings**, which help build context and meaning for the machine learning model.
These “tags” label the parts of speech. There are 24 part-of-speech tags and 4 proper-name category labels in the existing dictionary of the **`jieba`** package.
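`jieba` segments with a prefix dictionary plus an HMM for unseen words; the underlying idea can be illustrated with a naive forward maximum-matching tokenizer over a toy dictionary (an illustration only, not jieba's actual algorithm):

```python
def fmm_tokenize(text, vocab, max_len=4):
    """Forward maximum matching: greedily take the longest dictionary word."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # fall back to a single character for out-of-vocabulary spans
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens

vocab = {'今天', '天气', '不错'}  # toy dictionary
print(fmm_tokenize('今天天气不错', vocab))  # ['今天', '天气', '不错']
```

Real tokenizers add frequency-weighted path search on top of the dictionary, which resolves ambiguous overlaps that greedy matching gets wrong.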
@ -31,7 +31,7 @@ These “tags” label the part of speech. There are 24 part of speech tags and
### Stop Words
In NLP, **stop words are “meaningless” words** that make the data too noisy or ambiguous.
Instead of manually removing them, you can import the **`stopwordsiso`** package for a full list of Chinese stop words. More information can be found [here](https://pypi.org/project/stopwordsiso/). With it, we can easily write code to filter out any stop words in large text data.
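Given a stop-word set (here a tiny hand-picked one for illustration, rather than the full `stopwordsiso` list), filtering is a simple comprehension over the token stream:

```python
# tiny illustrative stop-word set; stopwordsiso provides a complete one
stop_words = {'的', '了', '是', '在', '和'}

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word set."""
    return [t for t in tokens if t not in stop_words]

print(remove_stop_words(['天气', '是', '不错', '的']))  # ['天气', '不错']
```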
@ -188,4 +188,4 @@ TN: 基于规则的方法
## Reference
* [Text Front End](https://slyne.github.io/%E5%85%AC%E5%BC%80%E8%AF%BE/2020/10/03/TTS1/)
* [Chinese Natural Language (Pre)processing: An Introduction](https://towardsdatascience.com/chinese-natural-language-pre-processing-an-introduction-995d16c2705f)
* [Beginners Guide to Sentiment Analysis for Simplified Chinese using SnowNLP](https://towardsdatascience.com/beginners-guide-to-sentiment-analysis-for-simplified-chinese-using-snownlp-ce88a8407efb)

@ -1,5 +1,6 @@
coverage
pre-commit
pybind11
resampy==0.2.2
scipy==1.2.1
sentencepiece
@ -7,7 +8,6 @@ snakeviz
SoundFile==0.9.0.post1
sox
tensorboardX
textgrid
typeguard
yacs
pybind11
textgrid
