Update modle_arcitecture.md

pull/1091/head
Jackwaterveg 4 years ago committed by GitHub
parent f35ee4053a
commit d905555c3f

@ -1,17 +1,25 @@
# Model Architecture
The implemented architecture of the Deepspeech2 online model is based on the [Deepspeech2 model](https://arxiv.org/pdf/1512.02595.pdf) with some changes.
The model is mainly composed of a 2D convolution subsampling layer and single-direction RNN layers.
To illustrate the model implementation clearly, five parts are described in detail.
1. Feature Extraction.
2. 2D Convolution subsampling layer.
3. RNN layer with only forward direction.
4. Softmax Layer.
5. CTC Decoder.
The architecture of the model is shown in Fig.1.
<p align="center">
<img src="../images/ds2onlineModel.png" width=800>
<br/>Fig.1 The architecture of the Deepspeech2 online model
</p>
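
The last two stages in the list above (softmax and CTC decoding) can be illustrated with a minimal NumPy sketch. This is not the project's implementation: the function names `softmax` and `ctc_greedy_decode` are hypothetical, greedy (best-path) decoding is only one possible CTC decoding strategy, and using index 0 as the blank label is an assumption.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax over the vocabulary axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def ctc_greedy_decode(probs, blank=0):
    """Best-path CTC decoding for one utterance.

    probs: array of shape (num_frames, vocab_size).
    blank: index of the CTC blank label (assumed to be 0 here).
    """
    best_path = probs.argmax(axis=-1)
    decoded = []
    prev = None
    for idx in best_path:
        idx = int(idx)
        # Collapse consecutive repeats, then drop blanks.
        if idx != prev and idx != blank:
            decoded.append(idx)
        prev = idx
    return decoded


# Toy logits whose per-frame argmax follows the path [1, 1, 0, 1, 2, 2, 0].
path = [1, 1, 0, 1, 2, 2, 0]
logits = np.eye(3)[path] * 5.0
probs = softmax(logits)
tokens = ctc_greedy_decode(probs)
print(tokens)
```

The blank between the two runs of label 1 is what lets the decoder emit the same label twice in a row; without it, the repeats would be merged into one.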
# Feature Extraction
Three methods of feature extraction are implemented: linear, fbank, and mfcc.
For a single utterance $x^i$ sampled from the training set $S$,
$S = \{(x^1,y^1),(x^2,y^2),\ldots,(x^m,y^m)\}$, where $y^i$ is the label corresponding to $x^i$.
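As a rough illustration of the linear feature type mentioned above, the sketch below computes a log power spectrogram with NumPy. The function name `linear_spectrogram`, the 16 kHz sample rate, the 25 ms window, the 10 ms stride, and the Hann window are all assumptions chosen for the example and are not taken from the project's configuration.

```python
import numpy as np


def linear_spectrogram(signal, sample_rate=16000, window_ms=25.0,
                       stride_ms=10.0, eps=1e-10):
    """Log power spectrogram of a 1-D signal.

    Returns an array of shape (num_frames, window_size // 2 + 1).
    All default parameter values are illustrative assumptions.
    """
    window_size = int(sample_rate * window_ms / 1000)
    stride = int(sample_rate * stride_ms / 1000)
    if len(signal) < window_size:
        raise ValueError("signal is shorter than one analysis window")

    num_frames = 1 + (len(signal) - window_size) // stride
    window = np.hanning(window_size)

    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack([
        signal[i * stride:i * stride + window_size] * window
        for i in range(num_frames)
    ])

    # Magnitude of the real FFT per frame, then log power.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude ** 2 + eps)


# One second of a 440 Hz tone as a stand-in for an utterance x^i.
t = np.arange(16000) / 16000.0
utterance = np.sin(2 * np.pi * 440.0 * t)
features = linear_spectrogram(utterance)
print(features.shape)
```

The fbank and mfcc features build on the same framing and FFT steps, adding a mel filter bank and, for mfcc, a discrete cosine transform on top.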
# Backbone
The backbone is composed of a 2D convolution subsampling layer.
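To make the effect of convolution subsampling concrete, the sketch below applies the standard convolution output-size formula along the time and feature axes. The helper names, the choice of two layers, kernel size 3, stride 2, and no padding are hypothetical values for illustration only; the document does not state the actual layer configuration.

```python
def conv_out_len(length, kernel=3, stride=2, padding=0):
    """Output length of a convolution along one axis."""
    return (length + 2 * padding - kernel) // stride + 1


def subsampled_shape(num_frames, feat_dim, num_layers=2,
                     kernel=3, stride=2, padding=0):
    """Shape (frames, features) after stacked 2D convolutions.

    The layer count and hyperparameters are assumptions, not the
    project's settings.
    """
    for _ in range(num_layers):
        num_frames = conv_out_len(num_frames, kernel, stride, padding)
        feat_dim = conv_out_len(feat_dim, kernel, stride, padding)
    return num_frames, feat_dim


# Example: 98 input frames with 161 feature bins each.
out_frames, out_dim = subsampled_shape(98, 161)
print(out_frames, out_dim)
```

Under these assumed settings the time axis shrinks by roughly a factor of four, which reduces the number of steps the following RNN layers have to process.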
