Repository contents:

- code/
- README.md
- 第一章——Transformer网络架构.md (Chapter 1: The Transformer network architecture)
- 第二章——文字向量化.md (Chapter 2: Text vectorization)
- 第三章——位置编码.md (Chapter 3: Positional encoding)
- 第四章——多头注意力机制——QK矩阵相乘.md (Chapter 4: Multi-head attention, QK matrix multiplication)
- 第五章——多头注意力机制——全流程.md (Chapter 5: Multi-head attention, the full pipeline)
- 第六章——数值缩放.md (Chapter 6: Numerical scaling)
- 第七章——前馈神经网络.md (Chapter 7: The feed-forward network)
- 第八章——最后的输出.md (Chapter 8: The final output)
- 训练和推理的区别(选修).md (The difference between training and inference, optional)
README
A deep dive into the Transformer (in the large-model setting), with diagrams, code, and more, aiming to make it fully digestible for everyone.
If anything is unclear, feel free to open an issue or email me. 😀 Enjoy!
Attention Is All You Need (paper link)
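
As a small taste of the kind of code the chapters build toward, here is a minimal sketch of scaled dot-product attention, the core operation of the Transformer described in the paper above. The function name, shapes, and NumPy implementation are illustrative assumptions for this README, not the repo's own code.

```python
# Minimal illustrative sketch of scaled dot-product attention.
# Shapes and names are assumptions for demonstration only.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K: (seq_len, d_k); V: (seq_len, d_v). Returns (seq_len, d_v)."""
    d_k = Q.shape[-1]
    # Similarity between every query and every key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax turns the scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted mix of the value vectors
    return weights @ V

# Tiny usage example: 4 tokens with 8-dimensional embeddings (self-attention)
x = np.random.randn(4, 8)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```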