From 2ab6d7012e0cf20706daf5ec156c26c5f6b73eb1 Mon Sep 17 00:00:00 2001
From: "ben.guo" <909336740@qq.com>
Date: Thu, 25 Apr 2024 18:09:34 +0800
Subject: [PATCH] Add. Add closing remarks for Chapter 1
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .DS_Store                                      | Bin 10244 -> 10244 bytes
 .../第一章——Transformer网络架构.md            |   4 ++++
 2 files changed, 4 insertions(+)

diff --git a/.DS_Store b/.DS_Store
index 1fbb0fd6d87b3e272a2580ceb629eb602a30aac8..7b781ddd5004bce3724fe480dfc01c151db7fc39 100644
GIT binary patch
delta 20
ccmZn(XbISmDlmDMaK_|!q7IvR1;2{{09s!NN&o-=

delta 24
gcmZn(XbISmD!|A-dAD#hBgf>&qTZW%1;2{{0BK(cPXGV_

diff --git a/人人都能看懂的Transformer/第一章——Transformer网络架构.md b/人人都能看懂的Transformer/第一章——Transformer网络架构.md
index 52b60df..f30c44f 100644
--- a/人人都能看懂的Transformer/第一章——Transformer网络架构.md
+++ b/人人都能看懂的Transformer/第一章——Transformer网络架构.md
@@ -182,3 +182,7 @@ The Add & Norm step can be understood as element-wise addition at the same positions, followed by layer normalization
 > WHY: Normalizing to the 0–1 range makes values easier to compare and process; it converts attention scores into a probability distribution.
 
 Put simply, the scores produced earlier are converted into the 0–1 range before being output.
+
+
+
+With that, you now have an overall picture of the entire Transformer — we can call ourselves people who know the Transformer 🎉🎉🎉
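The hunk above touches two operations the chapter explains: Add & Norm (element-wise residual addition followed by layer normalization) and softmax (mapping attention scores to a 0–1 probability distribution). A minimal NumPy sketch of both, assuming standard definitions (the function names are illustrative, not from the repository):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each position's feature vector to zero mean, unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def add_and_norm(x, sublayer_out):
    # "Add & Norm": element-wise addition at the same positions,
    # then layer normalization over the feature dimension.
    return layer_norm(x + sublayer_out)

def softmax(scores):
    # Convert attention scores into a probability distribution:
    # every value lands in [0, 1] and each row sums to 1.
    # Subtracting the row max is a standard numerical-stability trick.
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```

Normalizing over the last axis only (per position, not per batch) is what distinguishes layer normalization from batch normalization, which is why it suits variable-length token sequences.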