From d983f8d8b3f66d24be0b3028a68dd46d6aa3cbd5 Mon Sep 17 00:00:00 2001
From: Jackwaterveg <87408988+Jackwaterveg@users.noreply.github.com>
Date: Tue, 7 Sep 2021 17:15:36 +0800
Subject: [PATCH] Create modle_arcitecture.md

---
 doc/src/modle_arcitecture.md | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
 create mode 100644 doc/src/modle_arcitecture.md

diff --git a/doc/src/modle_arcitecture.md b/doc/src/modle_arcitecture.md
new file mode 100644
index 00000000..47b75f0d
--- /dev/null
+++ b/doc/src/modle_arcitecture.md
@@ -0,0 +1,17 @@
+# Model Architecture
+
+The implemented architecture of the Deepspeech2 online model is based on the [Deepspeech2 model](https://arxiv.org/pdf/1512.02595.pdf), with some changes.
+The architecture is shown in the following figure: ![image](../image/ds2onlineModel.png)
+The model is mainly composed of a 2D convolution subsampling layer and unidirectional RNN layers. To describe the implementation in detail, the model is introduced in five parts:
+1. Feature Extraction.
+2. 2D Convolution subsampling layer.
+3. RNN layer with only forward direction.
+4. Softmax Layer.
+5. CTC Decoder.
+
+
+# Feature Extraction
+
+Three methods of feature extraction are implemented: linear, fbank, and mfcc.
+For a single utterance $x^i$ sampled from the training set $S$,
+$S = \{(x^1,y^1),(x^2,y^2),\dots,(x^m,y^m)\}$, where $y^i$ is the label corresponding to $x^i$.