@@ -0,0 +1,86 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 使用LSTM进行情感分析"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 深度学习在自然语言处理中的应用\n",
|
|
|
|
|
|
|
|
"自然语言处理是教会机器如何去处理或者读懂人类语言的系统,主要应用领域:\n",
|
|
|
|
|
|
|
|
"\n",
|
|
|
|
|
|
|
|
"* 对话系统 - 聊天机器人(小冰)\n",
|
|
|
|
|
|
|
|
"* 情感分析 - 对一段文本进行情感识别(我们现在做)\n",
|
|
|
|
|
|
|
|
"* 图文映射 - CNN和RNN的融合\n",
|
|
|
|
|
|
|
|
"* 机器翻译 - 将一种语言翻译成另一种语言\n",
|
|
|
|
|
|
|
|
"* 语音识别 - 将语音识别成文字,如王者荣耀"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 词向量模型\n",
|
|
|
|
|
|
|
|
"计算机只认识数字!\n",
|
|
|
|
|
|
|
|
"<img src=\"assets/20210112092212.png\" width=\"100%\">\n",
|
|
|
|
|
|
|
|
"我们可以将一句话中的每个词都转换成一个向量\n",
|
|
|
|
|
|
|
|
"<img src=\"assets/20210112092241.png\" width=\"100%\">\n",
|
|
|
|
|
|
|
|
"它们的向量维度是一致的"
]
},
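{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is a minimal sketch of the lookup-table idea just described. The vocabulary, the dimensionality, and the randomly initialized vectors are all made up for illustration; a trained model would supply real values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Toy vocabulary: each word gets an integer id (made-up example)\n",
"vocab = {\"i\": 0, \"love\": 1, \"this\": 2, \"movie\": 3}\n",
"embedding_dim = 5  # every word vector shares this dimensionality\n",
"\n",
"# Lookup table with one row per word, randomly initialized here\n",
"rng = np.random.default_rng(0)\n",
"embedding_matrix = rng.normal(size=(len(vocab), embedding_dim))\n",
"\n",
"# Convert a sentence into a sequence of word vectors by row lookup\n",
"sentence = [\"i\", \"love\", \"this\", \"movie\"]\n",
"ids = [vocab[word] for word in sentence]\n",
"vectors = embedding_matrix[ids]\n",
"print(vectors.shape)  # (4, 5): four words, five dimensions each"
]
},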
{
"cell_type": "markdown",
"metadata": {},
"source": [
"词向量是具有空间一样的,并不是简单的映射!例如,我们希望单词“love”和“adore”这两个词在向量空间中是有一定的相关性的,因为他们有类似的定义,他们都在类似的上下文中使用。单词的向量表示也被称之为词嵌入。\n",
|
|
|
|
|
|
|
|
"<img src=\"assets/20210112095444.png\" width=\"50%\">\n",
|
|
|
|
|
|
|
|
"word2vec构建的词向量正如上图,相同含义的词在高维空间上是接近的,而不同含义的词差别很远。"
]
},
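{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make \"close in the vector space\" concrete, a common measure is cosine similarity. The three vectors below are invented for illustration; with real word2vec vectors, \"love\" and \"adore\" would score far higher against each other than against an unrelated word."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def cosine_similarity(a, b):\n",
"    # Cosine of the angle between a and b; 1.0 means same direction\n",
"    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))\n",
"\n",
"# Invented 3-d vectors purely for illustration\n",
"love = np.array([0.9, 0.8, 0.1])\n",
"adore = np.array([0.8, 0.9, 0.2])\n",
"table = np.array([0.1, 0.0, 0.9])\n",
"\n",
"print(cosine_similarity(love, adore))  # high: similar meaning\n",
"print(cosine_similarity(love, table))  # low: unrelated meaning"
]
},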
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Word2Vec\n",
"为了去得到这些词嵌入,我们使用一个非常厉害的模型\"Word2vec\"。简单的说,这个模型根据上下文的语境来推断出毎个词的词向量。如果两个个词在上下文的语境中,可以被互相替换,那么这两个词的距离就非常近。在自然语言中,上下文的语境对分析词语的意义是非常重要的。比如,之前我们提到的\"adore\"和Tove\"这两个词,我们观察如下上下文的语境。\n",
|
|
|
|
|
|
|
|
"<img src=\"assets/20210112100552.png\" width=\"50%\">\n",
|
|
|
|
|
|
|
|
"从句子中我们可以看到,这两个词通常在句子中是表现积极的,而且-般比名词或者名词组合要好。这也说明了,这两个词可以被互相替换,他们的意思是非常相近的。对于句子的语法结构分析,上下文语境也是非常重要的。所有,这个模型的作用就是从一大堆句子(以 Wikipedia为例)中为毎个独一无二的单词进行建模,并且输出一个唯一的向量。word2vec模型的输出被称为一个嵌入矩阵\n",
|
|
|
|
|
|
|
|
"<img src=\"assets/20210112100616.png\" width=\"70%\">\n",
|
|
|
|
|
|
|
|
"这个嵌入矩阵包含训练集中每个词的一个向量。传统来讲,这个嵌入矩阵中的词向量数据会很大。"
]
},
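{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the training step described above, the cell below fits word2vec on a tiny made-up corpus using the gensim library (assumed installed, version >= 4.0). A real embedding matrix would be trained on a much larger corpus such as Wikipedia."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from gensim.models import Word2Vec\n",
"\n",
"# Tiny made-up corpus; real training uses millions of sentences\n",
"sentences = [\n",
"    [\"i\", \"love\", \"this\", \"movie\"],\n",
"    [\"i\", \"adore\", \"this\", \"movie\"],\n",
"]\n",
"\n",
"# vector_size is the embedding dimensionality (gensim >= 4.0 API)\n",
"model = Word2Vec(sentences, vector_size=10, window=2, min_count=1, seed=0)\n",
"\n",
"print(model.wv[\"love\"])  # the learned vector for \"love\"\n",
"print(model.wv.similarity(\"love\", \"adore\"))  # cosine similarity of two words\n",
"print(model.wv.vectors.shape)  # the embedding matrix: one row per word"
]
},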
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}