speech text process docs (#607)

* add more speech doc

* fix doc path and mergify

* format doc
pull/610/head
Hui Zhang 3 years ago committed by GitHub
parent 7bbe1d66d2
commit a12b16787d

@@ -47,19 +47,19 @@ pull_request_rules:
         add: ["README"]
   - name: "auto add label=Documentation"
     conditions:
-      - files~=^docs/
+      - files~=^doc/
     actions:
       label:
         add: ["Documentation"]
   - name: "auto add label=CI"
     conditions:
-      - files~=^(.circleci/|ci/|.github/|.travis.yml)
+      - files~=^(.circleci/|ci/|.github/|.travis.yml|.travis|env.sh)
     actions:
       label:
         add: ["CI"]
   - name: "auto add label=Installation"
     conditions:
-      - files~=^(tools/|setup.py|setup.sh|env.sh|.travis)
+      - files~=^(tools/|setup.py|setup.sh)
     actions:
       label:
         add: ["Installation"]

Three image files updated (before/after previews omitted): 206 KiB, 47 KiB, and 108 KiB.

@@ -1,8 +1,9 @@
 # ASR PostProcess
-* Text Corrector
-* Text Filter
-* Add Punctuation
+1. [Text Segmentation](text_front_end#text-segmentation)
+2. Text Corrector
+3. Add Punctuation
+4. Text Filter
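The reordered list above describes the ASR post-processing pipeline: segment the raw transcript, correct it, restore punctuation, then filter. A minimal sketch of wiring such steps together follows; every step below is a placeholder, not an API from this repo.

```python
from typing import Callable, List

TextStep = Callable[[str], str]

def postprocess(text: str, steps: List[TextStep]) -> str:
    """Run ASR post-processing steps in order and return the final text."""
    for step in steps:
        text = step(text)
    return text

# Placeholder steps; in practice they would wrap a segmenter (e.g. DeepSegment),
# a corrector (e.g. pycorrector), a punctuation-restoration model, and a word filter.
steps: List[TextStep] = [
    lambda t: t,  # 1. text segmentation
    lambda t: t,  # 2. text correction
    lambda t: t,  # 3. add punctuation
    lambda t: t,  # 4. text filter
]
print(postprocess("raw asr output", steps))
```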
@@ -10,6 +11,7 @@
 * [pycorrector](https://github.com/shibing624/pycorrector)
   This project focuses on correcting errors caused by homophones, confusable sounds, visually similar characters, full Chinese pinyin input, and grammatical mistakes. PS: [a community walkthrough of the source code](https://zhuanlan.zhihu.com/p/138981644)
+* DeepCorrection [1](https://praneethbedapudi.medium.com/deepcorrection-1-sentence-segmentation-of-unpunctuated-text-a1dbc0db4e98) [2](https://praneethbedapudi.medium.com/deepcorrection2-automatic-punctuation-restoration-ac4a837d92d9) [3](https://praneethbedapudi.medium.com/deepcorrection-3-spell-correction-and-simple-grammar-correction-d033a52bc11d) [4](https://praneethbedapudi.medium.com/deepsegment-2-0-multilingual-text-segmentation-with-vector-alignment-fd76ce62194f)
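A minimal usage sketch for the pycorrector library listed above, assuming the classic `pycorrector.correct` interface from its earlier releases (newer versions expose a different, class-based API, so check the project's README):

```python
import pycorrector

# Correct a sentence with typical homophone / confusable-character errors.
corrected_sent, detail = pycorrector.correct('少先队员因该为老人让坐')
print(corrected_sent)  # expected: '少先队员应该为老人让座'
print(detail)          # [(wrong, right, begin_idx, end_idx), ...]
```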
@@ -88,12 +90,12 @@
-## Text Filter
-* Sensitive words (pornographic or violent, politically sensitive, illegal or prohibited content, etc.)
-## Add Punctuation
+## Add Punctuation
+* DeepCorrection [1](https://praneethbedapudi.medium.com/deepcorrection-1-sentence-segmentation-of-unpunctuated-text-a1dbc0db4e98) [2](https://praneethbedapudi.medium.com/deepcorrection2-automatic-punctuation-restoration-ac4a837d92d9) [3](https://praneethbedapudi.medium.com/deepcorrection-3-spell-correction-and-simple-grammar-correction-d033a52bc11d) [4](https://praneethbedapudi.medium.com/deepsegment-2-0-multilingual-text-segmentation-with-vector-alignment-fd76ce62194f)
+## Text Filter
+* Sensitive words (pornographic or violent, politically sensitive, illegal or prohibited content, etc.)
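The Text Filter step screens transcripts for sensitive words (pornographic or violent, politically sensitive, illegal content, and so on). A minimal keyword-masking sketch follows; the word list and function are illustrative, and production systems usually rely on curated lexicons with an Aho-Corasick automaton or a classifier.

```python
# Illustrative lexicon; a real deployment loads a curated sensitive-word list.
SENSITIVE_WORDS = {"badword", "forbidden"}

def filter_sensitive(text: str, mask: str = "*") -> str:
    """Replace every occurrence of a sensitive word with mask characters of equal length."""
    for word in SENSITIVE_WORDS:
        text = text.replace(word, mask * len(word))
    return text

print(filter_sensitive("this transcript contains a badword here"))
# -> 'this transcript contains a ******* here'
```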

@@ -0,0 +1,15 @@
+# Dataset
+## Text
+* [Tatoeba](https://tatoeba.org/cmn)
+  **Tatoeba is a collection of sentences and translations.** It's collaborative, open, free and even addictive. An open data initiative aimed at translation and speech recognition.
+## Speech
+* [Tatoeba](https://tatoeba.org/cmn)
+  **Tatoeba is a collection of sentences and translations.** It's collaborative, open, free and even addictive. An open data initiative aimed at translation and speech recognition.

@@ -86,5 +86,3 @@ Please notice that the released language models only contain Chinese simplified
 ```
 build/bin/build_binary ./result/people2014corpus_words.arps ./result/people2014corpus_words.klm
 ```
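Once `build_binary` has produced the `.klm` binary, the model can be loaded from Python with the `kenlm` package for scoring or rescoring ASR hypotheses. A minimal sketch, assuming the output path from the command above and a sentence already segmented into the word units the model was trained on:

```python
import kenlm

# Load the binary language model built by build_binary above.
model = kenlm.Model('./result/people2014corpus_words.klm')

# Score a whitespace-separated, word-segmented sentence (log10 probability).
sentence = '今天 天气 很 好'
print(model.score(sentence, bos=True, eos=True))

# Per-token breakdown: (log10 prob, n-gram length used, is_oov) for each word.
for prob, ngram_len, oov in model.full_scores(sentence):
    print(prob, ngram_len, oov)
```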

@@ -1,5 +1,16 @@
 # Text Front End
+## Text Segmentation
+There are various libraries, including some of the most popular ones like NLTK, Spacy, and Stanford CoreNLP, that provide excellent, easy-to-use functions for sentence segmentation.
+* https://github.com/bminixhofer/nnsplit
+* [DeepSegment](https://github.com/notAI-tech/deepsegment) [blog](http://bpraneeth.com/projects/deepsegment) [1](https://praneethbedapudi.medium.com/deepcorrection-1-sentence-segmentation-of-unpunctuated-text-a1dbc0db4e98) [2](https://praneethbedapudi.medium.com/deepcorrection2-automatic-punctuation-restoration-ac4a837d92d9) [3](https://praneethbedapudi.medium.com/deepcorrection-3-spell-correction-and-simple-grammar-correction-d033a52bc11d) [4](https://praneethbedapudi.medium.com/deepsegment-2-0-multilingual-text-segmentation-with-vector-alignment-fd76ce62194f)
 ## Text Normalization(文本正则)
 Text normalization mainly converts non-standard words (NSW), for example:
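As a concrete example of the sentence-segmentation libraries added above, here is a minimal NLTK sketch; DeepSegment or nnsplit are the better fit when the input has no punctuation at all, since NLTK's punkt tokenizer relies on punctuation cues.

```python
import nltk

nltk.download('punkt')  # one-time download of the Punkt sentence tokenizer
from nltk.tokenize import sent_tokenize

text = ("Sentence segmentation splits running text into sentences. "
        "NLTK, Spacy and Stanford CoreNLP all ship such a function. "
        "This sketch uses NLTK's punkt model.")
print(sent_tokenize(text))
```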

@@ -1,4 +1,4 @@
-PYTHON:= python3.7
+PYTHON:= python3.8
 .PHONY: all clean
 all: virtualenv
