Add. The feed-forward neural network in GPT-2

1 year ago · 87cf7f340e
parent 39aacf2796
commit 87cf7f340e
1 changed files with 27 additions and 0 deletions
--- a/人人都能看懂的Transformer/第七章——前馈神经网络.md
+++ b/人人都能看懂的Transformer/第七章——前馈神经网络.md
@@ -2,7 +2,34 @@
 <img src="../assets/image-20240424204837275.png" alt="Feed-forward neural network" style="zoom:50%;" />
 ### Preface
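 If you have played with [A Neural Network Playground](https://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=4,2&seed=0.53882&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false), you should already have a basic feel for neural networks. In most cases, results tend to improve as you add layers and neurons, i.e. the two are positively correlated, but this also means spending more compute and other resources. We keep the discussion of neural networks fairly simple here, because the lower-level theory cannot be explained properly in a short space; a slightly deeper understanding is enough.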
 ### The feed-forward neural network in GPT-2
 The [source code](https://github.com/openai/gpt-2/blob/master/src/model.py) is shown below. Follow the link if you want to read the full file.
 ~~~python
 def conv1d(x, scope, nf, *, w_init_stdev=0.02):
    with tf.variable_scope(scope):
        *start, nx = shape_list(x)
        w = tf.get_variable('w', [1, nx, nf], initializer=tf.random_normal_initializer(stddev=w_init_stdev))  # weight w, updated during training
        b = tf.get_variable('b', [nf], initializer=tf.constant_initializer(0))  # bias term b, updated during training
        c = tf.reshape(tf.matmul(tf.reshape(x, [-1, nx]), tf.reshape(w, [-1, nf]))+b, start+[nf])
        return c
 def mlp(x, scope, n_state, *, hparams):
    with tf.variable_scope(scope):
        nx = x.shape[-1].value
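        h = gelu(conv1d(x, 'c_fc', n_state))  # first layer: a linear transformation followed by a GELU activation
        h2 = conv1d(h, 'c_proj', nx)  # second layer: another linear transformation that maps the hidden dimension back to the original dimension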
        return h2
 ~~~
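 As you can see, the code above is just two linear transformations with a GELU activation in between; there are no other hidden layers.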
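
To make the shapes concrete, here is a minimal NumPy sketch of the same two-layer structure. It is an illustration under stated assumptions, not the GPT-2 implementation: the helper `linear`, the `params` dictionary and its keys, and the tiny sizes are all made up for this example. The 4x hidden width follows how `model.py` calls `mlp` (with `n_state = nx*4`), and `gelu` uses the tanh approximation found in that file.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def linear(x, w, b):
    # Hypothetical helper mirroring conv1d: flatten the leading dimensions,
    # multiply by the weight matrix, add the bias, then restore the shape.
    *start, nx = x.shape
    out = x.reshape(-1, nx) @ w + b
    return out.reshape(*start, w.shape[1])

def mlp(x, params):
    h = gelu(linear(x, params["c_fc_w"], params["c_fc_b"]))   # expand: n_embd -> n_state
    return linear(h, params["c_proj_w"], params["c_proj_b"])  # project back: n_state -> n_embd

rng = np.random.default_rng(0)
n_embd = 8             # illustrative only; the smallest GPT-2 model uses 768
n_state = 4 * n_embd   # hidden width is 4x the embedding width
params = {
    "c_fc_w": rng.normal(0.0, 0.02, size=(n_embd, n_state)),
    "c_fc_b": np.zeros(n_state),
    "c_proj_w": rng.normal(0.0, 0.02, size=(n_state, n_embd)),
    "c_proj_b": np.zeros(n_embd),
}

x = rng.normal(size=(2, 5, n_embd))  # [batch, sequence, n_embd]
y = mlp(x, params)
print(x.shape, "->", y.shape)
```

Because the weights act only on the last dimension, every position in the sequence is transformed independently with the same parameters: feeding a single position through `mlp` gives the same result as slicing that position out of `y`.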
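 The role of the FFNN in the Transformer is to introduce additional non-linearity and increase the model's expressive power. Multi-head attention can capture long-range dependencies in the input sequence, but for a given set of attention weights its output is only a weighted sum of linearly projected value vectors; the softmax is non-linear, yet it only decides how the vectors are mixed. By applying a non-linear transformation to each position after the attention mechanism, the FFNN lets the model learn more complex feature representations.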
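
A small NumPy experiment can illustrate why the activation matters. This is a sketch with made-up sizes and random weights (`d`, `hidden`, `w1`, `w2` are assumptions for the example, and biases are omitted for brevity): without an activation, two stacked linear layers are equivalent to a single matrix, whereas with GELU in between they are not.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

rng = np.random.default_rng(0)
d, hidden = 4, 16                      # illustrative sizes
w1 = rng.normal(size=(d, hidden))
w2 = rng.normal(size=(hidden, d))
x = rng.normal(size=(3, d))

# Two linear layers with no activation collapse into one matrix, w1 @ w2.
no_activation = (x @ w1) @ w2
collapsed = x @ (w1 @ w2)
linear_collapses = np.allclose(no_activation, collapsed)

# With GELU in between, the result no longer matches the single-matrix map.
with_gelu = gelu(x @ w1) @ w2
gelu_differs = not np.allclose(with_gelu, collapsed)

# A linear map satisfies f(2x) == 2 * f(x); check whether the GELU network does.
scales_linearly = np.allclose(gelu((2 * x) @ w1) @ w2, 2 * with_gelu)

print(linear_collapses, gelu_differs, scales_linearly)
```

The first check shows that extra linear layers alone add no expressive power, which is why the activation between `c_fc` and `c_proj` is the essential ingredient.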