speech text process docs (#607)
* add more speech doc
* fix doc path and mergify
* format doc
parent 7bbe1d66d2
commit a12b16787d
@ -0,0 +1,15 @@
# Dataset

## Text

* [Tatoeba](https://tatoeba.org/cmn)

**Tatoeba is a collection of sentences and translations.** It's collaborative, open, free and even addictive. It is an open data initiative aimed at translation and speech recognition.

## Speech

* [Tatoeba](https://tatoeba.org/cmn)

**Tatoeba is a collection of sentences and translations.** It's collaborative, open, free and even addictive. It is an open data initiative aimed at translation and speech recognition.

@ -1,5 +1,16 @@
# Text Front End

## Text Segmentation

Several libraries, including popular ones such as NLTK, spaCy, and Stanford CoreNLP, provide excellent, easy-to-use functions for sentence segmentation.

* [nnsplit](https://github.com/bminixhofer/nnsplit)
* [DeepSegment](https://github.com/notAI-tech/deepsegment) [blog](http://bpraneeth.com/projects/deepsegment) [1](https://praneethbedapudi.medium.com/deepcorrection-1-sentence-segmentation-of-unpunctuated-text-a1dbc0db4e98) [2](https://praneethbedapudi.medium.com/deepcorrection2-automatic-punctuation-restoration-ac4a837d92d9) [3](https://praneethbedapudi.medium.com/deepcorrection-3-spell-correction-and-simple-grammar-correction-d033a52bc11d) [4](https://praneethbedapudi.medium.com/deepsegment-2-0-multilingual-text-segmentation-with-vector-alignment-fd76ce62194f)
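
To make the task concrete, here is a minimal rule-based sketch of sentence segmentation that uses only Python's standard library. It is not how NLTK, spaCy, Stanford CoreNLP, nnsplit, or DeepSegment work internally. The function name `split_sentences` and the abbreviation list are assumptions made for illustration. The sketch also relies on punctuation being present, which is exactly the case that tools for unpunctuated text are meant to go beyond.

```python
import re
from typing import List

# Hypothetical, deliberately small abbreviation list (an assumption for this sketch).
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "e.g.", "i.e.", "etc."}

# Candidate boundary: whitespace that directly follows '.', '!' or '?'.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split on sentence-final punctuation, merging splits made after abbreviations."""
    sentences = []
    buffer = ""
    for piece in _BOUNDARY.split(text.strip()):
        if not piece:
            continue
        buffer = f"{buffer} {piece}" if buffer else piece
        # Hold the piece back if it ends with a known abbreviation.
        last_token = buffer.split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue
        sentences.append(buffer)
        buffer = ""
    if buffer:
        sentences.append(buffer)
    return sentences


if __name__ == "__main__":
    for sentence in split_sentences("Dr. Smith arrived. He was late! Why?"):
        print(sentence)
```

A fixed abbreviation list is the weak point of this approach: it misses unseen abbreviations and cannot handle text without punctuation, which is why trained segmenters are usually preferred in practice.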

## Text Normalization

Text normalization mainly converts non-standard words (NSW) into standard forms, for example:
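
One way to picture NSW conversion is the minimal sketch below, which spells out digit strings and a couple of symbols. It is an illustration under stated assumptions, not the normalization rules this project uses: it works on English for readability, and the names `number_to_words`, `normalize`, and the symbol table are hypothetical.

```python
import re

_ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty",
         "sixty", "seventy", "eighty", "ninety"]

# Hypothetical symbol table (an assumption for this sketch).
_SYMBOLS = {"%": "percent", "&": "and"}


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999."""
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return _TENS[tens] if ones == 0 else f"{_TENS[tens]} {_ONES[ones]}"
    hundreds, rest = divmod(n, 100)
    words = f"{_ONES[hundreds]} hundred"
    return words if rest == 0 else f"{words} {number_to_words(rest)}"


def normalize(text: str) -> str:
    """Replace digit strings and known symbols with spoken-form words."""
    def expand_number(match):
        digits = match.group(0)
        if len(digits) <= 3:
            return number_to_words(int(digits))
        # Longer digit strings are read digit by digit in this sketch.
        return " ".join(_ONES[int(d)] for d in digits)

    text = re.sub(r"[0-9]+", expand_number, text)
    for symbol, word in _SYMBOLS.items():
        text = text.replace(symbol, f" {word} ")
    # Collapse the extra whitespace introduced by the replacements.
    return " ".join(text.split())


if __name__ == "__main__":
    print(normalize("Room 101 is 50% full"))
```

A real front end needs context to choose the right reading. The same digits may be a year, a phone number, or a quantity, so rule sets are usually organized by NSW category rather than applied uniformly as here.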