|
|
|
|
|
|
|
|
|
|
|
|
|
Our goal is to train a model that predicts the output `y` from the input `x`. The model is a simple univariate linear regression: `y_pred = w * x`.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Initialize the weight `w` to 0.5 and the learning rate `lr` to 0.1. We will simulate 3 iterations of weight updates by writing code.
|
|
|
|
|
|
|
|
|
|
#### Iteration 1:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
3. **Backpropagation**: compute the gradient of the loss with respect to the weight `w`
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
dloss/dw = 2 * (y_pred - y) * x = 2 * (0.5 - 2) * 1 = -3.0
|
|
|
|
|
```
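The gradient formula above follows from the chain rule: since `loss = (y_pred - y)^2` and `y_pred = w * x`, we get `dloss/dw = 2 * (y_pred - y) * x`. A minimal Python sketch (variable names are illustrative) that checks the analytic gradient against a finite-difference estimate:

```python
x, y, w = 1.0, 2.0, 0.5  # the single training sample and the initial weight

def loss(w):
    return (w * x - y) ** 2

# Analytic gradient from the chain rule: 2 * (y_pred - y) * x
analytic = 2 * (w * x - y) * x

# Finite-difference approximation of dloss/dw
eps = 1e-6
numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)

print(analytic)  # -3.0, matching the value computed by hand
```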
|
|
|
|
|
|
|
|
|
|
> We need this gradient from backpropagation in order to update the weight.
|
|
|
|
|
|
|
|
|
4. **Update the weight**:
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
w = w - lr * dloss_dw = 0.5 - 0.1 * (-3.0) = 0.8
|
|
|
|
|
```
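The four steps of iteration 1 can be sketched in a few lines of Python (names such as `dloss_dw` follow the notation above):

```python
x, y = 1.0, 2.0   # single training sample
w, lr = 0.5, 0.1  # initial weight and learning rate

y_pred = w * x                   # 1. forward pass: 0.5
loss = (y_pred - y) ** 2         # 2. loss: 2.25
dloss_dw = 2 * (y_pred - y) * x  # 3. backpropagation: -3.0
w = w - lr * dloss_dw            # 4. gradient-descent update: 0.8

print(y_pred, loss, dloss_dw, w)  # 0.5 2.25 -3.0 0.8
```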
|
|
|
|
|
|
|
|
|
|
#### Iteration 2:
|
|
|
|
|
|
|
|
|
1. **Forward pass**:
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
y_pred = w * x = 0.8 * 1 = 0.8
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
2. **Compute the loss**:
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
loss = (y_pred - y)^2 = (0.8 - 2)^2 = 1.44
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
> The loss has dropped from 2.25 to 1.44, but that is not enough yet; our goal is to get it arbitrarily close to 0.
|
|
|
|
|
|
|
|
|
|
3. **Backpropagation**:
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
dloss/dw = 2 * (y_pred - y) * x = 2 * (0.8 - 2) * 1 = -2.4
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
4. **Update the weight**:
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
w = w - lr * dloss_dw = 0.8 - 0.1 * -2.4 = 1.04
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
#### Iteration 3:
|
|
|
|
|
|
|
|
|
1. **Forward pass**:
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
y_pred = w * x = 1.04 * 1 = 1.04
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
> The prediction has gone from 0.5 to 0.8 to 1.04, steadily approaching the correct value of 2.
|
|
|
|
|
|
|
|
|
|
2. **Compute the loss**:
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
loss = (y_pred - y)^2 = (1.04 - 2)^2 = 0.9216
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
> The loss has dropped yet again; in other words, as long as we repeat this loop, the final loss can be driven arbitrarily close to 0.
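This claim is easy to check in code (a sketch): with `lr = 0.1` the error shrinks by a constant factor each pass, so running many more iterations drives the loss arbitrarily close to 0.

```python
x, y, w, lr = 1.0, 2.0, 0.5, 0.1

# Repeat the same four steps many times instead of just three
for _ in range(100):
    y_pred = w * x
    loss = (y_pred - y) ** 2
    w -= lr * 2 * (y_pred - y) * x

print(loss < 1e-6)  # True: the loss is now vanishingly small
```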
|
|
|
|
|
|
|
|
|
3. **Backpropagation**:
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
dloss/dw = 2 * (y_pred - y) * x = 2 * (1.04 - 2) * 1 = -1.92
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
4. **Update the weight**:
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
w = w - lr * dloss_dw = 1.04 - 0.1 * -1.92 = 1.232
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
In this very simple example, we can see the weight `w` increasing after each iteration (until the loss gets arbitrarily close to 0), which shrinks the gap between the prediction `y_pred` and the true value `y`. In practice we would compute the loss and gradients over all samples, and would likely use more complex network architectures and optimization algorithms, but this example shows the basic principle behind neural network weight updates.
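The whole walkthrough above can be condensed into a single training loop (a sketch that reproduces the per-iteration numbers computed by hand):

```python
x, y = 1.0, 2.0   # single training sample
w, lr = 0.5, 0.1  # initial weight and learning rate

for step in range(3):
    y_pred = w * x                   # forward pass
    loss = (y_pred - y) ** 2         # squared-error loss
    dloss_dw = 2 * (y_pred - y) * x  # gradient of loss w.r.t. w
    w = w - lr * dloss_dw            # gradient-descent update
    print(f"iter {step + 1}: loss={loss:.4f}, w={w:.4f}")

# iter 1: loss=2.2500, w=0.8000
# iter 2: loss=1.4400, w=1.0400
# iter 3: loss=0.9216, w=1.2320
```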
|
|
|
|
|
|
|
|
|
|
**A simple neural network is equivalent to linear regression**. Readers who want to dig deeper can read [Linear Regression Principles](https://github.com/ben1234560/AiLearning-Theory-Applying/blob/53ad238b5b7dbb5c39520401de2f10208825e4f9/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0%E7%AE%97%E6%B3%95%E5%8E%9F%E7%90%86%E5%8F%8A%E6%8E%A8%E5%AF%BC/%E7%AC%AC%E4%B8%80%E7%AB%A0%E2%80%94%E2%80%94%E7%BA%BF%E6%80%A7%E5%9B%9E%E5%BD%92%E5%8E%9F%E7%90%86.md)
|
|
|
|
|
|
|
|
|
|