diff --git a/NLP通用框架BERT项目实战/assets/1609746327202.png b/NLP通用框架BERT项目实战/assets/1609746327202.png
new file mode 100644
index 0000000..5c6efa2
Binary files /dev/null and b/NLP通用框架BERT项目实战/assets/1609746327202.png differ
diff --git a/NLP通用框架BERT项目实战/assets/1609746644923.png b/NLP通用框架BERT项目实战/assets/1609746644923.png
new file mode 100644
index 0000000..692c3fa
Binary files /dev/null and b/NLP通用框架BERT项目实战/assets/1609746644923.png differ
diff --git a/NLP通用框架BERT项目实战/assets/1609746664324.png b/NLP通用框架BERT项目实战/assets/1609746664324.png
new file mode 100644
index 0000000..11ed442
Binary files /dev/null and b/NLP通用框架BERT项目实战/assets/1609746664324.png differ
diff --git a/NLP通用框架BERT项目实战/assets/1609746694985.png b/NLP通用框架BERT项目实战/assets/1609746694985.png
new file mode 100644
index 0000000..e51763f
Binary files /dev/null and b/NLP通用框架BERT项目实战/assets/1609746694985.png differ
diff --git a/NLP通用框架BERT项目实战/第一章——NLP通用框架BERT原理解读.md b/NLP通用框架BERT项目实战/第一章——NLP通用框架BERT原理解读.md
index 8e9bce9..b7380a1 100644
--- a/NLP通用框架BERT项目实战/第一章——NLP通用框架BERT原理解读.md
+++ b/NLP通用框架BERT项目实战/第一章——NLP通用框架BERT原理解读.md
@@ -120,3 +120,27 @@ The Multi-Head architecture diagram is as follows
 
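 > Because both the input and the output are vectors, more layers can be stacked; every layer performs the same computation, only repeated across multiple layers.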
 
+
+
+#### Positional Encoding and Multi-Layer Stacking
+
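+> Positional information: in self-attention, each token's output is a weighted sum over the entire sequence, so the position at which a token appears has no effect on the result; it would make no difference where the token is placed. That does not match real language, where word order matters, so we want the model to have some extra awareness of position.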
+
+![1609746327202](assets/1609746327202.png)
+
+> Positional encoding: periodic sine and cosine signals are used as the positional information.
+
+**Add and Normalize**
+
+![1609746644923](assets/1609746644923.png)
+
+- Normalization: ![1609746664324](assets/1609746664324.png)
+
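+  > Batch Normalization: takes one row of the figure, i.e. the same feature across all samples in the batch, and normalizes it to mean 0 and standard deviation 1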
+  >
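+  > Layer Normalization: takes one column of the figure, i.e. all features of a single sample, and normalizes it to mean 0 and standard deviation 1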
+
+- Connection: the basic residual connection ![1609746694985](assets/1609746694985.png)
+
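+  > Residual connection: the sublayer's output is added to its input, giving X + Sublayer(X). If the sublayer's transformation improves the result, the model makes use of it; if it does not, the sublayer can learn an output close to zero so that the original X passes through largely unchanged. As layers are stacked, an added layer can therefore fall back to roughly the identity, which makes it unlikely that a deeper stack ends up worse than a shallower one.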
+