add more speech doc

4 years ago · fafdeac321
parent 7bbe1d66d2
commit fafdeac321
7 changed files with 42 additions and 11 deletions
--- a/docs/src/asr_postprocess.md
+++ b/docs/src/asr_postprocess.md
@@ -1,8 +1,9 @@
 # ASR PostProcess

-* Text Corrector
-* Text Filter
-* Add Punctuation
+1. [Text Segmentation](text_front_end#text-segmentation)
+2. Text Corrector
+3. Add Punctuation
+4. Text Filter



@@ -10,6 +11,7 @@

 * [pycorrector](https://github.com/shibing624/pycorrector)
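  This project focuses on correcting errors caused by homophones, confusable sounds, visually similar characters, full Chinese pinyin spellings, and grammatical mistakes. PS: [a community walkthrough of the source code](https://zhuanlan.zhihu.com/p/138981644)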
+* DeepCorrection [1](https://praneethbedapudi.medium.com/deepcorrection-1-sentence-segmentation-of-unpunctuated-text-a1dbc0db4e98) [2](https://praneethbedapudi.medium.com/deepcorrection2-automatic-punctuation-restoration-ac4a837d92d9) [3](https://praneethbedapudi.medium.com/deepcorrection-3-spell-correction-and-simple-grammar-correction-d033a52bc11d)  [4](https://praneethbedapudi.medium.com/deepsegment-2-0-multilingual-text-segmentation-with-vector-alignment-fd76ce62194f)



@@ -88,12 +90,13 @@



-## Text Filter
+## Add Punctuation

-* Sensitive words (pornographic or violent, political, illegal or prohibited content, etc.)
+* DeepCorrection [1](https://praneethbedapudi.medium.com/deepcorrection-1-sentence-segmentation-of-unpunctuated-text-a1dbc0db4e98) [2](https://praneethbedapudi.medium.com/deepcorrection2-automatic-punctuation-restoration-ac4a837d92d9) [3](https://praneethbedapudi.medium.com/deepcorrection-3-spell-correction-and-simple-grammar-correction-d033a52bc11d)  [4](https://praneethbedapudi.medium.com/deepsegment-2-0-multilingual-text-segmentation-with-vector-alignment-fd76ce62194f)



+## Text Filter

+* Sensitive words (pornographic or violent, political, illegal or prohibited content, etc.)

-## Add Punctuation
--- a/docs/src/dataset.md
+++ b/docs/src/dataset.md
@@ -0,0 +1,16 @@
+# Dataset
+
+## Text
+
+* [Tatoeba](https://tatoeba.org/cmn)
+
+  **Tatoeba is a collection of sentences and translations.** It's collaborative, open, free and even addictive. An open data initiative aimed at translation and speech recognition.
+
+
+
+## Speech
+
+* [Tatoeba](https://tatoeba.org/cmn)
+
+  **Tatoeba is a collection of sentences and translations.** It's collaborative, open, free and even addictive. An open data initiative aimed at translation and speech recognition.
+
--- a/docs/src/reference.md
+++ b/docs/src/reference.md
@@ -1,3 +1,4 @@
 # Reference

 * [wenet](https://github.com/mobvoi/wenet)
+
--- a/docs/src/text_front_end.md
+++ b/docs/src/text_front_end.md
@@ -1,5 +1,16 @@
 # Text Front End

+
+
+## Text Segmentation
+
+Several popular libraries, including NLTK, spaCy, and Stanford CoreNLP, provide excellent, easy-to-use functions for sentence segmentation.
+
+* [nnsplit](https://github.com/bminixhofer/nnsplit)
+* [DeepSegment](https://github.com/notAI-tech/deepsegment)  [blog](http://bpraneeth.com/projects/deepsegment) [1](https://praneethbedapudi.medium.com/deepcorrection-1-sentence-segmentation-of-unpunctuated-text-a1dbc0db4e98) [2](https://praneethbedapudi.medium.com/deepcorrection2-automatic-punctuation-restoration-ac4a837d92d9) [3](https://praneethbedapudi.medium.com/deepcorrection-3-spell-correction-and-simple-grammar-correction-d033a52bc11d)  [4](https://praneethbedapudi.medium.com/deepsegment-2-0-multilingual-text-segmentation-with-vector-alignment-fd76ce62194f)
+
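To make the idea of sentence segmentation concrete, here is a minimal sketch that uses only the Python standard library. It is a rule-based baseline, not how NLTK, spaCy, nnsplit, or DeepSegment work internally; the function name `split_sentences` and the punctuation set are assumptions chosen for illustration.

```python
import re
from typing import List

# Assumed set of sentence-final punctuation (English and Chinese).
_END = r"[.!?。！？]"

# Split right after a run of sentence-final punctuation, swallowing any
# whitespace that follows. The negative lookahead keeps runs such as "..."
# or "?!" attached to the sentence they close.
_BOUNDARY = re.compile(rf"(?<={_END})(?!{_END})\s*")


def split_sentences(text: str) -> List[str]:
    """Hypothetical helper: split punctuated text into sentences."""
    parts = _BOUNDARY.split(text)
    # Drop empty pieces, e.g. the one produced after the final punctuation mark.
    return [p.strip() for p in parts if p.strip()]


if __name__ == "__main__":
    for sentence in split_sentences("Hello world. How are you? 今天天气很好。我们出去吧！"):
        print(sentence)
```

A rule like this depends entirely on punctuation already being present. It also splits abbreviations and decimals such as `3.14` incorrectly, and it returns raw ASR output, which is usually unpunctuated, as a single segment. That limitation is the gap the learned segmenters listed above aim to fill.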
+
+
 ## Text Normalization

 Text normalization mainly deals with converting non-standard words (NSW), for example:  
--- a/tools/Makefile
+++ b/tools/Makefile
@@ -1,4 +1,4 @@
-PYTHON:= python3.7
+PYTHON:= python3.8
 .PHONY: all clean

 all: virtualenv