|
|
|
@@ -36,10 +36,61 @@
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "code",
|
|
|
|
|
"execution_count": null,
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"## 皮尔逊相关系数"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"### 连续变量的相关分析\n",
|
|
|
|
|
"<ul>\n",
|
|
|
|
|
" <li>连续变量即数据变量,它的取值之间可以比较大小,可以用加减法计算出差异的大小。\n",
|
|
|
|
|
" <li>如“年龄”、“收入”、“成绩\"等变量当两个变量都是正态连续变量,而且两者之间呈线性关系时,通常用 Pearson相关系数来衡量。"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"### Pearson.相关系数\n",
|
|
|
|
|
"**协方差:**\n",
|
|
|
|
|
"协方差是一个反映两个随机变量相关程度的指标,如果一个变量跟随着另一个变量同时变大或者变小,那么这两个变量的协方差就是正值\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
"cov(X, Y) = \\frac{\\sum_n^i=1(X_i-\\overline{X})(Y_i-\\overline{Y})}{n-1}\n",
|
|
|
|
|
"$$"
|
|
|
|
|
]
|
|
|
|
|
},
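The sample covariance formula in the cell above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `sample_covariance` and the data in `x` and `y` are hypothetical and chosen only for illustration.

```python
import numpy as np


def sample_covariance(x, y):
    """Sample covariance with the n-1 denominator (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape or x.size < 2:
        raise ValueError("x and y must be 1-D arrays of equal length >= 2")
    # Sum of products of deviations from the means, divided by n - 1.
    return float(np.sum((x - x.mean()) * (y - y.mean())) / (x.size - 1))


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

cov_xy = sample_covariance(x, y)
# np.cov also uses the n-1 denominator by default; entry [0, 1] is cov(x, y).
cov_numpy = np.cov(x, y)[0, 1]
```

Both values should agree, since `np.cov` returns the full covariance matrix and its off-diagonal entry is the covariance between the two inputs.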
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"虽然协方差能反映两个随机变量的相关程度(协方差大于0的时候表示两者正相关,小于0的时候表示两者负相关),但是协方差值的大小并不能很好地度量两个随机变量的关联程度\n",
|
|
|
|
|
"<br><br>在二维空间中分布着一些数据,我们想知道数据点坐标X轴和Y轴的相关程度,如果X与Y的相关程度较小但是数据分布的比较离散,这样会导致求出的协方差值较大,用这个值来度量相关程度是不合理的\n",
|
|
|
|
|
"<img src=\"assets/20201120215604.png\" width=\"50%\">\n",
|
|
|
|
|
"为了更好的度量两个随机变量的相关程度, 引入Pearson相关系数,其在协方差的基础上除了两个随机变量的标准差"
|
|
|
|
|
]
|
|
|
|
|
},
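The scale dependence of covariance described in the cell above can be illustrated with a small experiment. This is a sketch under stated assumptions: the data are synthetic, and the seed, sample size, slope of 0.3, scaling factor of 100, and the helper `cov_and_corr` are all arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, arbitrary choice
x = rng.normal(size=500)
y = 0.3 * x + rng.normal(size=500)      # weak linear relationship plus noise


def cov_and_corr(a, b):
    """Return (sample covariance, covariance divided by both standard deviations)."""
    cov = np.cov(a, b)[0, 1]
    corr = cov / (np.std(a, ddof=1) * np.std(b, ddof=1))
    return cov, corr


cov_small, corr_small = cov_and_corr(x, y)
# Spreading the same points out by a factor of 100 on both axes
# multiplies the covariance by 100 * 100 but leaves the ratio unchanged.
cov_large, corr_large = cov_and_corr(100 * x, 100 * y)
```

The relationship between the two variables is identical in both cases, yet the covariance differs by a factor of 10,000, which is why it is normalized by the standard deviations.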
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"**Pearson相关系数**\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
"PX,Y = \\frac{cov(X,Y)}{σXσY} = \\frac{E[(X-μX)(Y-μY)]}{σXσY} \n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
"pearson是一个介于-1和1之间的值,当两个变量的线性关系增强时,相关系数趋于1或-1;当一个变量增大,另一个变量也增大时,表明它们之间是正相关的,相关系数大于0;如果一个变量增大,另一个变量却减小,表明它们之间是负相关的,相关系数小于0;如果相关系数等于0,表明它们之间不存在线性相关关系\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"<img src=\"assets/20201120220149.png\" width=\"50%\">\n",
|
|
|
|
|
"np.corrcoef(a)可结算行与行之间的相关系数,np.corrcoe"
|
|
|
|
|
]
|
|
|
|
|
},
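The row-wise behavior of `np.corrcoef` mentioned in the cell above can be compared against a direct evaluation of the Pearson formula. This is a minimal sketch, assuming NumPy; the helper `pearson_r` and the array `a` are hypothetical and exist only for illustration.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson coefficient from deviations about the mean (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    # The n-1 factors of the covariance and standard deviations cancel out.
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))


# Each row is treated as one variable; the values are made up.
a = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],   # increases exactly linearly with row 0
    [5.0, 4.0, 3.0, 2.0, 1.0],    # decreases exactly linearly with row 0
])

r = np.corrcoef(a)                 # 3x3 matrix of row-by-row coefficients
r_01 = pearson_r(a[0], a[1])       # should match r[0, 1]
r_02 = pearson_r(a[0], a[2])       # should match r[0, 2]
```

Here `r[i, j]` holds the coefficient between rows `i` and `j`, so the diagonal is 1 and the matrix is symmetric.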
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"outputs": [],
|
|
|
|
|
"source": []
|
|
|
|
|
}
|
|
|
|
|
],
|
|
|
|
|