pull/629/head
Hui Zhang 3 years ago
parent 30aba26693
commit 92381451fb

@ -11,7 +11,7 @@
## Features
See [feature list](doc/src/feature_list.md) for more information.
## Setup

@ -179,7 +179,8 @@ class FeatureNormalizer(object):
wav_number += batch_size
if wav_number % 1000 == 0:
    logger.info(
        f'process {wav_number} wavs, {all_number} frames.')
self.cmvn_info = {
    'mean_stat': list(all_mean_stat.tolist()),
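The loop above accumulates running statistics over all wavs for CMVN (cepstral mean and variance normalization). A minimal self-contained sketch of the same idea, with hypothetical batch shapes rather than the toolkit's actual pipeline, might look like:

```python
import numpy as np

def accumulate_cmvn_stats(batches):
    """Accumulate per-dimension sum, squared sum, and frame count."""
    mean_stat, var_stat, frame_count = None, None, 0
    for feats in batches:  # feats: (num_frames, feat_dim)
        if mean_stat is None:
            mean_stat = np.zeros(feats.shape[1])
            var_stat = np.zeros(feats.shape[1])
        mean_stat += feats.sum(axis=0)
        var_stat += (feats ** 2).sum(axis=0)
        frame_count += feats.shape[0]
    mean = mean_stat / frame_count
    std = np.sqrt(var_stat / frame_count - mean ** 2)  # E[x^2] - E[x]^2
    return mean, std

batches = [np.random.RandomState(0).randn(100, 13) for _ in range(3)]
mean, std = accumulate_cmvn_stats(batches)
normalized = (batches[0] - mean) / std  # apply CMVN to one batch
```

Accumulating sums rather than storing all frames keeps memory constant regardless of corpus size, which is why the real code logs progress per 1000 wavs instead of materializing everything.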

@ -98,4 +98,4 @@
## Text Filter
* Sensitive words (pornographic/violent, politically sensitive, illegal or prohibited content, etc.)

@ -14,4 +14,3 @@ We compare the training time with 1, 2, 4, 8 Tesla V100 GPUs (with a subset of L
| 8 | 6.95 X |
`utils/profile.sh` provides such a demo profiling tool; you can adapt it as needed.
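The speedup figures in the table are simply the single-GPU time divided by the N-GPU time. A quick sketch with made-up timings (not measurements from this repo) for computing speedup and parallel efficiency:

```python
def speedup_and_efficiency(t1, tn, n_gpus):
    """Speedup = T(1) / T(N); parallel efficiency = speedup / N."""
    s = t1 / tn
    return s, s / n_gpus

# hypothetical per-epoch timings in minutes
s, e = speedup_and_efficiency(t1=80.0, tn=11.5, n_gpus=8)
print(f'{s:.2f} X speedup, {e:.0%} efficiency')
```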

@ -48,4 +48,4 @@
## Zhuyin
* [Bopomofo](https://en.wikipedia.org/wiki/Bopomofo)
* [Zhuyin table](https://en.wikipedia.org/wiki/Zhuyin_table)

@ -18,4 +18,4 @@
### ASR Noise
* [asr-noises](https://github.com/speechio/asr-noises)

@ -58,4 +58,4 @@
### Grapheme To Phoneme
* syllable
* phoneme

@ -83,4 +83,4 @@ Please notice that the released language models only contain Chinese simplified
```
build/bin/build_binary ./result/people2014corpus_words.arpa ./result/people2014corpus_words.klm
```
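`build_binary` converts the ARPA file into KenLM's binary format for fast loading. The backoff scoring that such n-gram models implement can be sketched with a toy probability table in log10 scale (an illustration of the ARPA backoff scheme, not the actual KenLM API):

```python
# Toy ARPA-style tables (log10): bigrams, unigrams, and backoff weights.
bigram = {('<s>', '今天'): -0.4, ('今天', '天气'): -0.3, ('天气', '不错'): -0.5}
unigram = {'今天': -1.0, '天气': -1.2, '不错': -1.5}
backoff = {'<s>': -0.2, '今天': -0.4, '天气': -0.3, '不错': -0.5}

def score_word(history, word):
    """log10 P(word | history): use the bigram if present, else back off."""
    if (history, word) in bigram:
        return bigram[(history, word)]
    # apply the history's backoff weight, then fall back to the unigram
    return backoff.get(history, 0.0) + unigram.get(word, -99.0)

def score_sentence(words):
    total, history = 0.0, '<s>'
    for word in words:
        total += score_word(history, word)
        history = word
    return total

print(score_sentence(['今天', '天气', '不错']))  # -1.2: sum of the bigram scores
```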

@ -76,7 +76,7 @@ pip3 install textgrid
tg.read('file.TextGrid')  # 'file.TextGrid' is the filename
```
The `tg.tiers` attribute:
holds all the items in the file; the output of `print(tg.tiers)` looks like this:
```text
@ -86,7 +86,7 @@ pip3 install textgrid
Interval(1361.89250, 1362.01250, R),
Interval(1362.01250, 1362.13250, AY1),
Interval(1362.13250, 1362.16250, T),
...
]
)
@ -113,7 +113,7 @@ pip3 install textgrid
An Interval can be understood as a duration
```
2. Objects in the textgrid library
The **IntervalTier** object:
@ -148,7 +148,7 @@ pip3 install textgrid
strict --> returns a bool indicating whether it is strict TextGrid format
```
The **PointTier** object:
Methods
@ -174,7 +174,7 @@ pip3 install textgrid
name     returns the name
```
The **Point** object:
Supports comparison operators as well as addition and subtraction
@ -185,7 +185,7 @@ pip3 install textgrid
time:
```
The **Interval** object:
Supports comparison operators as well as addition and subtraction
@ -250,10 +250,9 @@ pip3 install textgrid
grids: --> returns the list of grids that were read
```
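The Point and Interval behaviors described above (ordering by time, duration arithmetic) can be illustrated with a minimal stand-in class; this is a sketch of the concepts, not the `textgrid` package's own implementation:

```python
from functools import total_ordering

@total_ordering
class Interval:
    """Minimal stand-in for a TextGrid interval: [minTime, maxTime] + mark."""
    def __init__(self, minTime, maxTime, mark=''):
        self.minTime, self.maxTime, self.mark = minTime, maxTime, mark

    def duration(self):
        return self.maxTime - self.minTime

    def __eq__(self, other):
        return (self.minTime, self.maxTime) == (other.minTime, other.maxTime)

    def __lt__(self, other):  # ordered by time, so intervals sort along the tier
        return (self.minTime, self.maxTime) < (other.minTime, other.maxTime)

a = Interval(1361.8925, 1362.0125, 'R')
b = Interval(1362.0125, 1362.1325, 'AY1')
assert a < b
print(round(a.duration(), 4))  # 0.12 seconds
```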
## Reference
* [Praat 语音学软件 (Wikipedia)](https://zh.wikipedia.org/wiki/Praat%E8%AF%AD%E9%9F%B3%E5%AD%A6%E8%BD%AF%E4%BB%B6)
* https://blog.csdn.net/duxin_csdn/article/details/88966295

@ -1,4 +1,3 @@
# Useful Tools
* [Regex visualization and common regular expressions](https://wangwl.net/static/projects/visualRegex/#)

@ -23,7 +23,7 @@ Therefore, procedures like stemming and lemmatization are not useful for Chinese
### Tokenization
**Tokenizing breaks up text data into shorter pre-set strings**, which help build context and meaning for the machine learning model.
These “tags” label the parts of speech. There are 24 part-of-speech tags and 4 proper-name category labels in the existing dictionary of the **`jieba`** package.
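`jieba` segments with a prefix dictionary plus an HMM for unseen words; the underlying idea can be illustrated with a naive forward maximum-matching tokenizer over a toy dictionary (an illustration only, not jieba's actual algorithm):

```python
def fmm_tokenize(text, vocab, max_len=4):
    """Forward maximum matching: greedily take the longest dictionary word."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # fall back to a single character for out-of-vocabulary spans
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens

vocab = {'今天', '天气', '不错'}  # toy dictionary
print(fmm_tokenize('今天天气不错', vocab))  # ['今天', '天气', '不错']
```

Real tokenizers add frequency-weighted path search on top of the dictionary, which resolves ambiguous overlaps that greedy matching gets wrong.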
@ -31,7 +31,7 @@ These “tags” label the part of speech. There are 24 part of speech tags and
### Stop Words
In NLP, **stop words are “meaningless” words** that make the data too noisy or ambiguous.
Instead of manually removing them, you can import the **`stopwordsiso`** package for a full list of Chinese stop words. More information can be found [here](https://pypi.org/project/stopwordsiso/). With it, we can easily write code to filter out any stop words in large text data.
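Given a stop-word set (here a tiny hand-picked one for illustration, rather than the full `stopwordsiso` list), filtering is a simple comprehension over the token stream:

```python
# tiny illustrative stop-word set; stopwordsiso provides a complete one
stop_words = {'的', '了', '是', '在', '和'}

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word set."""
    return [t for t in tokens if t not in stop_words]

print(remove_stop_words(['天气', '是', '不错', '的']))  # ['天气', '不错']
```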
@ -188,4 +188,4 @@ TN: 基于规则的方法
## Reference
* [Text Front End](https://slyne.github.io/%E5%85%AC%E5%BC%80%E8%AF%BE/2020/10/03/TTS1/)
* [Chinese Natural Language (Pre)processing: An Introduction](https://towardsdatascience.com/chinese-natural-language-pre-processing-an-introduction-995d16c2705f)
* [Beginners Guide to Sentiment Analysis for Simplified Chinese using SnowNLP](https://towardsdatascience.com/beginners-guide-to-sentiment-analysis-for-simplified-chinese-using-snownlp-ce88a8407efb)

@ -1,5 +1,6 @@
coverage
pre-commit
pybind11
resampy==0.2.2
scipy==1.2.1
sentencepiece
@ -7,7 +8,6 @@ snakeviz
SoundFile==0.9.0.post1
sox
tensorboardX
textgrid
typeguard
yacs
pybind11
textgrid
