You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

879 lines
45 KiB

This file contains ambiguous Unicode characters!

This file contains ambiguous Unicode characters that may be confused with others in your current locale. If your use case is intentional and legitimate, you can safely ignore this warning. Use the Escape button to highlight these characters.

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 相关分析\n",
"**相关分析**\n",
"<ul>\n",
" <li>衡量事物之间或称变量之间线性相关程度的强弱,并用适当的统计指标表示出来的过程。\n",
" <li>比如,家庭收入和支出、一个人所受教育程度与其收入、子女身高和父母身高等"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**相关系数**\n",
"<ul>\n",
" <li>衡量变量之间相关程度的一个量值。\n",
" <li>相关系数r的数值范围是在-1到+1之间。\n",
" <li>相关系数r的正负号表示变化方向。\"+\"号表示变化方向一致,即正相关:“-”号表示变化方向相反,即负相关。\n",
" <li>r的绝对值表示变量之间的密切程度(即强度)。绝对值越接近1表示两个变量之间关系越密切越接近0表示两个变量之间关系越不密切。\n",
" <li>相关系数的值,仅仅是一个比值。它不是由相等单位度量而来(即不等距),也不是百分比,因此,不能直接作加、减、乘、除运算。\n",
" <li>相关系数只能描述两个变量之间的变化方向及密切程度,并不能揭示两者之间的内在本质联系,即存在相关的两个变量,不一定存在因果关系。\n",
"</ul>\n",
"<img src=\"assets/20201119224526.png\" width=\"50%\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"左上图是线性相关最低的,右上图是线性相关很明显的;\n",
"左下图是线性相关一般的,右下图有一定的相关但不是线性相关;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 皮尔逊相关系数"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 连续变量的相关分析\n",
"<ul>\n",
" <li>连续变量即数据变量,它的取值之间可以比较大小,可以用加减法计算出差异的大小。\n",
" <li>如“年龄”、“收入”、“成绩\"等变量当两个变量都是正态连续变量,而且两者之间呈线性关系时,通常用 Pearson相关系数来衡量。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pearson.相关系数\n",
"**协方差:**\n",
"协方差是一个反映两个随机变量相关程度的指标,如果一个变量跟随着另一个变量同时变大或者变小,那么这两个变量的协方差就是正值\n",
"$$\n",
"cov(X, Y) = \\frac{\\sum_n^{i=1}(X_i-\\overline{X})(Y_i-\\overline{Y})}{n-1}\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"虽然协方差能反映两个随机变量的相关程度(协方差大于0的时候表示两者正相关小于0的时候表示两者负相关),但是协方差值的大小并不能很好地度量两个随机变量的关联程度\n",
"<br><br>在二维空间中分布着一些数据我们想知道数据点坐标X轴和Y轴的相关程度如果X与Y的相关程度较小但是数据分布的比较离散这样会导致求出的协方差值较大用这个值来度量相关程度是不合理的\n",
"<img src=\"assets/20201120215604.png\" width=\"50%\">\n",
"为了更好的度量两个随机变量的相关程度, 引入Pearson相关系数其在协方差的基础上除了两个随机变量的标准差"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Pearson相关系数**\n",
"$$\n",
"P_{X,Y} = \\frac{cov(X,Y)}{σXσY} = \\frac{E[(X-μX)(Y-μY)]}{σXσY} \n",
"$$\n",
"pearson是一个介于-1和1之间的值当两个变量的线性关系增强时相关系数趋于1或-1当一个变量增大另一个变量也增大时表明它们之间是正相关的相关系数大于0如果一个变量增大另一个变量却减小表明它们之间是负相关的相关系数小于0如果相关系数等于0表明它们之间不存在线性相关关系\n",
"\n",
"<img src=\"assets/20201120220149.png\" width=\"50%\">\n",
"np.corrcoef(a)可结算行与行之间的相关系数np.corrcoef(a,rowvar=0)用于计算各列之间的相关系数"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 计算与检验"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[10, 10, 8, 9, 7],\n",
" [ 4, 5, 4, 3, 3],\n",
" [ 3, 3, 1, 1, 1]])"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"matrix_t = np.array([[10,10,8,9,7],\n",
" [4,5,4,3,3],\n",
" [3,3,1,1,1]])\n",
"matrix_t"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[1. , 0.64168895, 0.84016805],\n",
" [0.64168895, 1. , 0.76376262],\n",
" [0.84016805, 0.76376262, 1. ]])"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.corrcoef(matrix_t) # 计算行相关系数,对角线是自己与自己比较"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[1. , 0.98898224, 0.9526832 , 0.9939441 , 0.97986371],\n",
" [0.98898224, 1. , 0.98718399, 0.99926008, 0.99862543],\n",
" [0.9526832 , 0.98718399, 1. , 0.98031562, 0.99419163],\n",
" [0.9939441 , 0.99926008, 0.98031562, 1. , 0.99587059],\n",
" [0.97986371, 0.99862543, 0.99419163, 0.99587059, 1. ]])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.corrcoef(matrix_t, rowvar=0) # 计算列相关系数,对角线是自己与自己比较"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"计算伦敦市月平均气温与降水量\n",
"<img src=\"assets/20201120221030.png\" width=\"50%\">\n",
"计算伦敦市月平均气温t与降水量p之间的相关系数\n",
"$$\n",
"r_tp = \\frac{\\sum_{i=1}^{12}(t_i-\\overline{t})(p_i-\\overline{p})}\n",
"{\\sqrt{\\sum^{12}_{t=1}(t_i-\\overline{t})^2}\\sqrt{\\sum^{12}_{i=1}(p_i-\\overline{p})}}\n",
"=\n",
"\\frac{-300.91}{\\sqrt{250.55}\\sqrt{1508.34}}\n",
"$$\n",
"$$\n",
"= \\frac{-300.91}{15.83×38.84} = -0.4895\n",
"$$\n",
"计算结果表明伦敦市的月平均气温t与降水量p呈负相关即异向相关"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 相关系数的显著性检验\n",
"假设\n",
"<ul>\n",
" <li>H0p=0\n",
" <li>H1p≠0\n",
"</ul>\n",
"统计量\n",
"$$\n",
"t = \\frac{r\\sqrt{n-2}}{\\sqrt{1-r^2}}\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"10个学生初一数学分数与初二数学分数的相关系数为087问从总体上来说初一与初二数学分数是否存在相关?\n",
"<img src=\"assets/20201120222349.png\" width=\"50%\">\n",
"计算检验统计量:\n",
"$$\n",
"t = \\frac{r\\sqrt{n-2}}{\\sqrt{1-r^2}} \n",
"= \\frac{0.78\\sqrt{10-2}}{\\sqrt{1-0.78^2}}\n",
"= 3.524\n",
"$$\n",
"\n",
"$$\n",
"t = 3.524 > 3.355 = t_{(8)0.01}\n",
"$$\n",
"所以,总体来说初一和初二的成绩存在正相关。"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"correlation: 0.9891763198690562\n",
"pvalue: 5.926875946481138e-08\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAD4CAYAAAD8Zh1EAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+17YcXAAATHUlEQVR4nO3df4zcd33n8efrbF9ZKO1eyZbG64BbFbmnJk2crNJwkSJIuDpAlLgBdKlKSxA931XpEarK6FydOIFO4pBPLdwhEaXkrqZQfjQYnxsBJi2kV/5I0Dp2MMGxmuulxOtcvSQ4kGObs837/thxzlnP7MzYs56Zr58PabXf+X4/nnnFSV47+/l+vt9JVSFJGn//aNgBJEmDYaFLUkNY6JLUEBa6JDWEhS5JDbF6WC980UUX1fr164f18pI0lvbu3fvdqppqd2xohb5+/XpmZ2eH9fKSNJaS/F2nY065SFJDWOiS1BAWuiQ1hIUuSQ1hoUtSQ1joktQQQ1u2KEkXil375ti+5xBHji2wdnKCrZs2sHnj9MBfx3fokrSCdu2bY9vOA8wdW6CAuWML/O5n9/Pvdh0Y+GtZ6JK0grbvOcTC8ZMv2lfApx78Drv2zQ30tSx0SVpBR44ttN1fLJb9IFnokrSC1k5OdDzWqezPloUuSSto66YNpMOx5cr+bFjokrSCNm+c5tevedUZpT6xZhVbN20Y6Gv1tGwxyRPAD4CTwImqmllyPMBHgDcBPwRur6qHB5pU0gXlfC31Ox/+w+bLmHn1T634P08/69BfX1Xf7XDsjcBrWl+/DHys9V2S+nZqqd+p1SFzxxbYtnNxmd+4lvrmjdMrnn1QUy63AJ+oRQ8Ck0kuHtBzS7rAtFvqt3D85MBXhTRNr4VewFeS7E2ypc3xaeDJ0x4fbu17kSRbkswmmZ2fn+8/raQLQqfVH4NeFdI0vRb6tVV1JYtTK3ckuW7J8XYnceuMHVV3V9VMVc1MTbX9BCVJ6rj6Y9CrQpqmp0KvqiOt70eBLwBXLxlyGLjktMfrgCODCCjpwrN10wYm1qx60b6VWBXSNF0LPcnLkrz81DbwK8C3lgzbDfxmFl0DPFtVTw08raQLwuaN03zw1suYnpwgwPTkBB+89bKxPSF6vvSyyuWVwBcWVyayGvjTqvpykn8NUFV3AV9kccni4ywuW3znysSVNMoGudTwfKwKaZquhV5Vfwtc3mb/XadtF3DHYKNJGidNXGo4brxSVNJAuNRw+Cx0SQPhUsPhs9AlDYRLDYfPQpc0EC41HD4/U1TSQJw68dmUG2qNIwtd0sC41HC4nHKRpIaw0CWpISx0SWoIC12SGsJCl6SGsNAlqSEsdElqCAtdkhrCQpekhrDQJakhLHRJaggLXZIawkKXpIboudCTrEqyL8l9bY7dnmQ+yf7W128NNqYkqZt+bp97J3AQ+IkOxz9bVb9z7pEkSWejp3foSdYBbwY+vrJxJElnq9cplw8D7wV+tMyYtyT5ZpJ7k1zSbkCSLUlmk8zOz8/3m1WStIyuhZ7kJuBoVe1dZtifA+ur6peAvwB2tBtUVXdX1UxVzUxNTZ1VYElSe728Q78WuDnJE8BngOuTfPL0AVX1dFU933r4R8BVA00pSeqqa6FX1baqWldV64HbgK9W1dtPH5Pk4tMe3sziyVNJ0nl01h8SneQDwGxV7QbeneRm4ATwDHD7YOJJknqVqhrKC8/MzNTs7OxQXluSxlWSvVU10+6YV4pKUkOc9ZSLpAvLrn1zbN9ziCPHFlg7OcHWTRvYvHF62LF0GgtdUle79s2xbecBFo6fBGDu2ALbdh4AsNRHiFMukrravufQC2V+ysLxk2zfc2hIidSOhS6pqyPHFvrar+Gw0CV1tXZyoq/9Gg4LXVJXWzdtYGLNqhftm1iziq2bNgwpkdrxpKikrk6d+HSVy2iz0CX1ZPPGaQt8xDnlIkkNYaFLUkNY6JLUEBa6JDWEJ0WlEec9VNQrC10aYd5DRf1wykUaYd5DRf2w0KUR5j1U1A8LXRph3kNF/bDQpRHmPVTUD0+KSiPMe6ioHz0XepJVwCwwV1U3LTn2Y8AngKuAp4F/UVVPDDCndMHyHirqVT9TLncCBzscexfwvar6eeAPgQ+dazBJUn96KvQk64A3Ax/vMOQWYEdr+17ghiQ593iSpF71+g79w8B7gR91OD4NPAlQVSeAZ4FXLB2UZEuS2SSz8/PzZxFXktRJ10JPchNwtKr2Ljeszb46Y0fV3VU1U1UzU1NTfcSUJHXTyzv0a4GbkzwBfAa4Psknl4w5DFwCkGQ18JPAMwPMKUnqomuhV9W2qlpXVeuB24CvVtXblwzbDbyjtf3W1pgz3qFLklbOWa9DT/IBYLaqdgP3AH+S5HEW35nfNqB8kqQe9VXoVfUA8EBr+32n7f8H4G2DDCZJ6o+X/ktSQ1joktQQFrokNYSFLkkNYaFLUkNY6JLUEBa6JDWEhS5JDWGhS1JDWOiS1BAWuiQ1hIUuSQ1hoUtSQ1joktQQFrokNYSFLkkNYaFLUkNY6JLUEBa6JDVE10JP8pIk30jySJJHk7y/zZjbk8wn2d/6+q2ViStJ6qSXD4l+Hri+qp5Lsgb4epIvVdWDS8Z9tqp+Z/ARJUm96FroVVXAc62Ha1pftZKhJEn962kOPcmqJPuBo8D9VfVQm2FvSfLNJPcmuaTD82xJMptkdn5+/hxiS5KW6qnQq+pkVV0BrAOuTnLpkiF/Dqyvql8C/gLY0eF57q6qmaqamZqaOpfckqQl+lrlUlXHgAeAG5fsf7qqnm89/CPgqoGkkyT1rJdVLlNJJlvbE8AbgMeWjLn4tIc3AwcHGVKS1F0vq1wuBnYkWcXiD4DPVdV9ST4AzFbVbuDdSW4GTgDPALevVGBJUntZXMRy/s3MzNTs7OxQXluSxlWSvVU10+6YV4pKUkNY6JLUEBa6JDWEhS5JDWGhS1JDWOiS1BAWuiQ1hIUuSQ1hoUtSQ1joktQQFrokNUQvN+eSRsqufXNs33OII8cWWDs5wdZNG9i8cXrYsaShs9A1Vnbtm2PbzgMsHD8JwNyxBbbtPABgqeuC55SLxsr2PYdeKPNTFo6fZPueQ0NKJI0OC11j5cixhb72SxcSC11jZe3kRF/7pQuJha6xsnXTBibWrHrRvok1q9i6acOQEkmjw5OiGiunTny6ykU6k4WusbN547QFLrXRdcolyUuSfCPJI0keTfL+NmN+LMlnkzye5KEk61cirCSps17m0J8Hrq+qy4ErgBuTXLNkzLuA71XVzwN/CHxosDElSd10LfRa9Fzr4ZrWVy0Zdguwo7V9L3BDkgwspSSpq55WuSRZlWQ/cBS4v6oeWjJkGngSoKpOAM8CrxhkUEnS8noq9Ko6WVVXAOuAq5NcumRIu3fjS9/Fk2RLktkks/Pz8/2nlSR11Nc69Ko6BjwA3Ljk0GHgEoAkq4GfBJ5p8+fvrqqZqpqZmpo6q8CSpPZ6WeUylWSytT0BvAF4bMmw3cA7WttvBb5aVWe8Q5ckrZxe1qFfDOxIsorFHwCfq6r7knwAmK2q3cA9wJ8keZzFd+a3rVhiSVJbXQu9qr4JbGyz/32nbf8D8LbBRpMk9cN7uUhSQ1joktQQFrokNYSFLkkNYaFLUkNY6JLUEBa6JDWEhS5JDWGhS1JDWOiS1BAWuiQ1hIUuSQ1hoUtSQ1joktQQFrokNYSFLkkNYaFLUkNY6JLUEBa6JDWEhS5JDdG10JNckuRrSQ4meTTJnW3GvC7Js0n2t77e1+65JEkrZ3UPY04Av1dVDyd5ObA3yf1V9e0l4/66qm4afERJUi+6vkOvqqeq6uHW9g+Ag8D0SgeTJPWnrzn0JOuBjcBDbQ6/NskjSb6U5Bc7/PktSWaTzM7Pz/cdVpLUWc+FnuTHgc8D76mq7y85/DDw6qq6HPgvwK52z1FVd1fVTFXNTE1NnW1mSVIbPRV6kjUslvmnqmrn0uNV9f2qeq61/UVgTZKLBppUkrSsXla5BLgHOFhVf9BhzM+0xpHk6tbzPj3IoJKk5fWyyuVa4DeAA0n2t/b9PvAqgKq6C3gr8NtJTgALwG1VVSuQV5LUQddCr6qvA+ky5qPARwcVSpLUP68UlaSGsNAlqSEsdElqCAtdkhrCQpekhrDQJakhLHRJaggLXZIawkKXpIaw0CWpISx0SWoIC12SGsJCl6SG6OX2uRqgXfvm2L7nEEeOLbB2coKtmzaweaMf0Srp3Fno59GufXNs23mAheMnAZg7tsC2nQcALHVJ58wpl/No+55DL5T5KQvHT7J9z6EhJZLUJBb6eXTk2EJf+yWpHxb6ebR2cqKv/ZLUDwv9PNq6aQMTa1a9aN/EmlVs3bRhSIkkNUnXQk9ySZKvJTmY5NEkd7YZkyT/OcnjSb6Z5MqViTveNm+c5oO3Xsb05AQBpicn+OCtl3lCVNJA9LLK5QTwe1X1cJKXA3uT3F9V3z5tzBuB17S+fhn4WOu7lti8cdoCl7Qiur5Dr6qnqurh1vYPgIPA0ka6BfhELXoQmExy8cDTSpI66msOPcl6YCPw0JJD08CTpz0+zJmlT5ItSWaTzM7Pz/eXVJK0rJ4LPcmPA58H3lNV3196uM0fqTN2VN1dVTNVNTM1NdVfUknSsnoq9CRrWCzzT1XVzjZDDgOXnPZ4HXDk3ONJknrVyyqXAPcAB6vqDzoM2w38Zmu1yzXAs1X11ABzSpK66GWVy7XAbwAHkuxv7ft94FUAVXUX8EXgTcDjwA+Bdw4+qiRpOV0Lvaq+Tvs58tPHFHDHoEJJkvrnlaKS1BAWuiQ1hIUuSQ1hoUtSQ1joktQQY/kRdH4upySdaewK3c/llKT2xm7Kxc/llKT2xq7Q/VxOSWpv7Ardz+WUpPbGrtBf/wtTZ9yHwM/llKQxK/Rd++b4/N65F91oPcBbrvJj3SRprAq93QnRAr72mJ9+JEljVeieEJWkzsaq0D0hKkmdjVWhb920gYk1q160zxOikrRorK4UPXXi08v+JelMY1XosFjqFrgknWmsplwkSZ1Z6JLUEF0LPcl/TXI0ybc6HH9dkmeT7G99vW/wMSVJ3fQyh/7HwEeBTywz5q+r6qaBJJIknZWu79Cr6n8Az5yHLJKkczCoOfTXJnkkyZeS/GKnQUm2JJlNMjs/7+X6kjRIqarug5L1wH1VdWmbYz8B/KiqnkvyJuAjVfWaHp5zHvi7ZYZcBHy3a7jRMC5ZzTl445LVnIM3rKyvrqqpdgfOudDbjH0CmKmqc/oHTTJbVTPn8hzny7hkNefgjUtWcw7eKGY95ymXJD+TJK3tq1vP+fS5Pq8kqT9dV7kk+TTwOuCiJIeBfw+sAaiqu4C3Ar+d5ASwANxWvbztlyQNVNdCr6pf63L8oywuaxy0u1fgOVfKuGQ15+CNS1ZzDt7IZe1pDl2SNPq89F+SGsJCl6SGGLlCT/KSJN9oXaj0aJL3DzvTcpKsSrIvyX3DztJJkieSHGjda2d22HmWk2Qyyb1JHktyMMlrh51pqSQbTrt30f4k30/ynmHnaifJ77b+P/pWkk8necmwM3WS5M5WzkdH6e+z3f2skvxUkvuT/E3r+z8ZZsZTRq7QgeeB66vqcuAK4MYk1ww503LuBA4OO0QPXl9VV4zautk2PgJ8uap+AbicEfy7rapDrb/LK4CrgB8CXxhyrDMkmQbezeJ1IZcCq4DbhpuqvSSXAv8SuJrFf+83Jel6geJ58sfAjUv2/VvgL1sXUf5l6/HQjVyh16LnWg/XtL5G8sxtknXAm4GPDztLE7SuOr4OuAegqv5vVR0bbqqubgD+Z1Utd9XzMK0GJpKsBl4KHBlynk7+KfBgVf2wqk4AfwX86pAzAR3vZ3ULsKO1vQPYfF5DdTByhQ4vTGPsB44C91fVQ8PO1MGHgfcCPxp2kC4K+EqSvUm2DDvMMn4OmAf+W2sa6+NJXjbsUF3cBnx62CHaqao54D8B3wGeAp6tqq8MN1VH3wKuS/KKJC8F3gRcMuRMy3llVT0F0Pr+00POA4xooVfVydavs+uAq1u/jo2UJDcBR6tq77Cz9ODaqroSeCNwR5Lrhh2og9XAlcDHqmoj8H8YkV9l20nyj4GbgT8bdpZ2WvO6twA/C6wFXpbk7cNN1V5VHQQ+BNwPfBl4BDgx1FBjaCQL/ZTWr9sPcOb81Si4Fri5de+azwDXJ/nkcCO1V1VHWt+PsjjXe/VwE3V0GDh82m9k97JY8KPqjcDDVfX3ww7SwRuA/1VV81V1HNgJ/LMhZ+qoqu6pqiur6joWpzj+ZtiZlvH3SS4GaH0/OuQ8wAgWepKpJJOt7QkW/6N8bLipzlRV26pqXVWtZ/HX7q9W1ci9+0nysiQvP7UN/AqLv96OnKr638CTSTa0dt0AfHuIkbr5NUZ0uqXlO8A1SV7aut/SDYzgSeZTkvx06/urgFsZ7b/b3cA7WtvvAP77ELO8oJdPLDrfLgZ2JFnF4g+cz1XVyC4JHAOvBL7Qun/aauBPq+rLw420rH8DfKo1nfG3wDuHnKet1jzvPwf+1bCzdFJVDyW5F3iYxemLfYzg5eqn+XySVwDHgTuq6nvDDgQd72f1H4HPJXkXiz843za8hP+fl/5LUkOM3JSLJOnsWOiS1BAWuiQ1hIUuSQ1hoUtSQ1joktQQFrokNcT/A25n/syy3KbXAAAAAElFTkSuQmCC\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"from scipy import stats\n",
"import matplotlib.pyplot as plt\n",
"x = [10.35, 6.24,3.18,8.46,3.21,7.65,4.32,8.66,9.12,10.31]\n",
"y = [5.1, 3.15,1.67,4.33,1.76,4.11,2.11,4.88,4.99,5.12]\n",
"correlation, pvalue = stats.stats.pearsonr(x,y)\n",
"print('correlation:', correlation) # 相关系数高\n",
"print('pvalue:', pvalue)\n",
"plt.scatter(x,y)\n",
"plt.show() # 类斜线"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 斯皮尔曼等级变量的相关分析\n",
"当测量得到的数据不是等距或等比数据,而是具有等级顺序的数据;或者得到的数据是等距或等比数据,但其所来自的总体分布不是正态的,不满足求皮尔森相关系数(积差相关)的要求。这时就要运用等级相关系数。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"小实验两个基因A、B他们的表达量关系是B=2A在8个样本中的表达了如下\n",
"<img src=\"assets/20201120223504.png\" width=\"70%\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"计算得出,他们的皮尔森相关系数=1P-vlaue=0从以上可以直观看出如果两个基因的表达量呈线性关系则具有显著的皮尔森相关性\n",
"<br><br>\n",
"以上是两个基因呈线性关系的结果。如果两者呈非线性关系,例如幂函数关系(曲线关系),那又如何呢?\n",
"<br><br>两个基因A、D他们的关系是D=A^10在8个样本中的表达量值如下\n",
"<img src=\"assets/20201120223836.png\" width=\"50%\">\n",
"<img src=\"assets/20201120223916.png\" width=\"70%\">"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"correlation: 0.7659287963138055\n",
"pvalue: 0.026696497208768055\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAX0AAAEDCAYAAADZUdTgAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+17YcXAAAUP0lEQVR4nO3df5BdZ33f8fcHWWBhftiJttSWbORkXBWX2hbZEaXugAlgyYTaTptOpPyok8KoaTFJmhln7KYTt2Y6ZapJmrZDAMUohjTIDcZ21Q4gOzXUtMGNV8ixsUEghBOvlEYbFEEIGmyJb//YI7iSV9q7u1e6d/W8XzN39p7nPOfc7zMef/boOc/dk6pCktSGFwy7AEnSmWPoS1JDDH1JaoihL0kNMfQlqSGGviQ1ZGRDP8nWJAeSfL6Pvpck+VSSXUkeT/LWM1GjJC02Ixv6wF3A+j77/ivg96pqDbAB+M3TVZQkLWYjG/pV9TBwsLctyQ8m+WSSnUk+k+RvHusOvKx7/3Jg/xksVZIWjXOGXcAcbQF+rqq+nOS1TF/R/zDwr4EHkrwLOA948/BKlKTRtWhCP8lLgL8LfDTJseYXdT83AndV1a8leR3wO0leXVXfGUKpkjSyFk3oMz0Vdaiqrpph39vp5v+r6rNJzgWWAwfOYH2SNPJGdk7/RFX1DeCrSf4RQKZd2e3+E+BNXfurgHOBqaEUKkkjLKP6VzaTbAOuYfqK/c+A24GHgPcBFwJLgbur6o4klwO/BbyE6Zu6v1xVDwyjbkkaZSMb+pKkwVs00zuSpIUbyRu5y5cvr1WrVg27DElaNHbu3PnnVTU2W7+RDP1Vq1YxMTEx7DIkadFI8sf99HN6R5IaYuhLUkMMfUlqiKEvSQ0x9CWpISO5ekeSWnL/rn1s3rGb/YcOc9H5y7hl3WpuXLPitHyWoS9JQ3T/rn3cdu8THH7uKAD7Dh3mtnufADgtwe/0jiQN0eYdu78b+Mccfu4om3fsPi2fZ+hL0hDtP3R4Tu0LZehL0hBddP6yObUvlKEvSUN0y7rVLFu65Li2ZUuXcMu61afl87yRK0lDdOxmrat3JKkRN65ZcdpC/kSzhn6SrcDbgANV9eoZ9t8C/GTP+V4FjFXVwSRPA38JHAWOVNX4oAqXJM1dP3P6d9E9dHwmVbW5qq7qHlh+G/C/qupgT5c3dvsNfEkasllDv6oeBg7O1q+zEdi2oIokSafNwFbvJHkx0/8i+FhPcwEPJNmZZNMsx29KMpFkYmpqalBlSZJ6DHLJ5t8H/s8JUztXV9VrgOuAdyZ5/ckOrqotVTVeVeNjY7M+8UuSNA+DDP0NnDC1U1X7u58HgPuAtQP8PEnSHA0k9JO8HHgD8N962s5L8tJj74Frgc8P4vMkSfPTz5LNbcA1wPIkk8DtwFKAqnp/1+1HgQeq6q96Dn0FcF+SY5/zkar65OBKlyTN1ayhX1Ub++hzF9NLO3vb9gJXzrcwSdLg+bd3JKkhhr4kNcTQl6SGGPqS1BBDX5IaYuhLUkMMfUlqiKEvSQ0x9CWpIYa+JDXE0Jekhhj6ktQQQ1+SGmLoS1JDDH1JaoihL0kNMfQlqSGGviQ1ZNbQT7I1yYEkMz7UPMk1Sb6e5LHu9as9+9Yn2Z1kT5JbB1m4JGnu+rnSvwtYP0ufz1TVVd3rDoAkS4D3AtcBlwMbk1y+kGIlSQsza+hX1cPAwXmcey2wp6r2VtWzwN3ADfM4jyRpQAY1p/+6JH+U5BNJ/lbXtgJ4pqfPZNc2oySbkkwkmZiamhpQWZKkXoMI/c8Br6yqK4H/DNzftWeGvnWyk1TVlqoar6rxsbGxAZQlSTrRgkO/qr5RVd/s3n8cWJpkOdNX9hf3dF0J7F/o50mS5m/BoZ/krydJ935td86vAY8ClyW5NMkLgQ3A9oV+niRp/s6ZrUOSbcA1wPIkk8DtwFKAqno/8GPAP0tyBDgMbKiqAo4kuRnYASwBtlbVk6dlFJKkvmQ6n0fL+Ph4TUxMDLsMSVo0kuysqvHZ+vmNXElqiKEvSQ0x9CWpIYa+JDXE0Jekhhj6ktQQQ1+SGmLoS1JDDH1JaoihL0kNMfQlqSGGviQ1xNCXpIYY+pLUEENfkhpi6EtSQwx9SWqIoS9JDZk19JNsTXIgyedPsv8nkzzevf4gyZU9+55O8kSSx5L4/ENJGrJ+rvTvAtafYv9XgTdU1RXAu4EtJ+x/Y1Vd1c+zGyVJp9c5s3WoqoeTrDrF/j/o2XwEWLnwsiRJp8Og5/TfDnyiZ7uAB5LsTLLpVAcm2ZRkIsnE1NTUgMuSJEEfV/r9SvJGpkP/7/U0X11V+5P8NeDBJF+sqodnOr6qttBNDY2Pj9eg6pIkfc9ArvSTXAHcCdxQVV871l5V+7ufB4D7gLWD+DxJ0vwsOPSTXALcC/x0VX2pp/28JC899h64FphxBZAk6cyYdXonyTbgGmB5kkngdmApQFW9H/hV4PuB30wCcKRbqfMK4L6u7RzgI1X1ydMwBklSn/pZvbNxlv3vAN4xQ/te4MrnHyFJGha/kStJDTH0Jakhhr4kNcTQl6SGGPqS1BBDX5IaYuhLUkMMfUlqiKEvSQ0x9CWpIYa+JDXE0Jekhhj6ktQQQ1+SGmLoS1JDDH1JaoihL0kNMfQlqSF9hX6SrUkOJJnxweaZ9p+S7EnyeJLX9Oy7KcmXu9dNgypckjR3/V7p3wWsP8X+64DLutcm4H0ASb6P6QepvxZYC9ye5IL5FitJWpi+Qr+qHgYOnqLLDcCHa9ojwPlJLgTWAQ9W1cGq+gvgQU79y0OSdBoNak5/BfBMz/Zk13ay9udJsinJRJKJqampAZUlSeo1qNDPDG11ivbnN1ZtqarxqhofGxsbUFmSpF6DCv1J4OKe7ZXA/lO0S5KGYFChvx34x90qnr8DfL2q/hTYAVyb5ILuBu61XZskaQjO6adTkm3ANcDyJJNMr8hZClBV7wc+DrwV2AN8C/jZbt/BJO8GHu1OdUdVneqGsCTpNOor9Ktq4yz7C3jnSfZtBbbOvTRJ0qD5jVxJaoihL0kNMfQlqSGGviQ1xNCXpIYY+pLUEENfkhpi6EtSQwx9SWqIoS9JDTH0Jakhhr4kNcTQl6SGGPqS1BBDX5IaYuhLUkMMfUlqiKEvSQ3pK/STrE+yO8meJLfOsP8/JHmse30pyaGefUd79m0fZPGSpLmZ9Rm5SZYA7wXeAkwCjybZXlVPHetTVf+ip/+7gDU9pzhcVVcNrmRJ0nz1c6W/FthTVXur6lngbuCGU/TfCGwbRHGSpMHqJ/RXAM/0bE92bc+T5JXApcBDPc3nJplI8kiSG+ddqSRpwWad3gEyQ1udpO8G4J6qOtrTdklV7U/yA8BDSZ6oqq8870OSTcAmgEsuuaSPsiRJc9XPlf4kcHHP9kpg/0n6buCEqZ2q2t/93At8muPn+3v7bamq8aoaHxsb66MsSdJc9RP6jwKXJbk0yQuZDvbnrcJJshq4APhsT9sFSV7UvV8OXA08deKxkqQzY9bpnao6kuRmYAewBNhaVU8muQOYqKpjvwA2AndXVe/Uz6uADyT5DtO/YN7Tu+pHknRm5fiMHg3j4+M1MTEx7DIkadFIsrOqxmfr5zdyJakhhr4kNcTQl6SGGPqS1BBDX5IaYuhLUkMMfUlqiKEvSQ0x9CWpIYa+JDXE0Jekhhj6ktQQQ1+SGmLoS1JDDH1JaoihL0kNMfQlqSGGviQ1xNCXpIb0FfpJ1ifZnWRPkltn2P8zSaaSPNa93tGz76YkX+5eNw2yeEnS3JwzW4ckS4D3Am8BJoFHk2yvqqdO6Ppfq+rmE479PuB2YBwoYGd37F8MpHpJ0pz0c6W/FthTVXur6lngbuCGPs+/Dniwqg52Qf8gsH5+pUqSFqqf0F8BPNOzPdm1negfJnk8yT1JLp7jsSTZlGQiycTU1FQfZUmS5qqf0M8MbXXC9n8HVlXVFcDvAx+aw7HTjVVbqmq8qsbHxsb6KEuSNFf9hP4kcHHP9kpgf2+HqvpaVX272/wt4If6PVaSdOb0E/qPApcluTTJC4ENwPbeDkku7Nm8HvhC934HcG2SC5JcAFzbtUmShmDW1TtVdSTJzUyH9RJga1U9meQOYKKqtgM/n+R64AhwEPiZ7tiDSd7N9C8OgDuq6uBpGIckqQ+pmnGKfajGx8drYmJi2GVI0qKRZGdVjc/Wz2/kSlJDDH1JaoihL0kNMfQlqSGGviQ1xNCXpIYY+pLUEENfkhpi6EtSQwx9SWqIoS9JDTH0Jakhhr4kNcTQl6SGGPqS1BBDX5IaYuhLUkNmfVyiJJ1p9+/ax+Ydu9l/6DAXnb+MW9at5sY1K4Zd1lmhryv9JOuT7E6yJ8mtM+z/pSRPJXk8yf9M8sqefUeTPNa9tp94rCT1un/XPm679wn2HTpMAfsOHea2e5/g/l37hl3aWWHW0E+yBHgvcB1wObAxyeUndNsFjFfVFcA9wL/v2Xe4qq7qXtcPqG5JZ6nNO3Zz+Lmjx7Udfu4om3fsHlJFZ5d+rvTXAnuqam9VPQvcDdzQ26GqPlVV3+o2HwFWDrZMSa3Yf+jwnNo1N/2E/grgmZ7tya7tZN4OfKJn+9wkE0keSXLjyQ5KsqnrNzE1NdVHWZLORhedv2xO7ZqbfkI/M7TVjB2TnwLGgc09zZdU1TjwE8BvJPnBmY6tqi1VNV5V42NjY32UJelsdMu61SxbuuS4tmVLl3DLutVDqujs0s/qnUng4p7tlcD+EzsleTPwK8Abqurbx9qran/3c2+STwNrgK8soGZJZ7Fjq3RcvXN69BP6jwKXJbkU2AdsYPqq/buSrAE+AKyvqgM97RcA36qqbydZDlzN8Td5Jel5blyzwpA/TWYN/ao6kuRmYAewBNhaVU8muQOYqKrtTE/nvAT4aBKAP+lW6rwK+ECS7zA9lfSeqnrqNI1FkjSLVM04PT9U4+PjNTExMewyJGnRSLKzu396Sv4ZBklqiKEvSQ0x9CWpIYa+JDXE0Jekhhj6ktQQQ1+SGmLoS1JDDH1JaoihL0kNMfQlqSGGviQ1xNCXpIYY+pLUEENfkhpi6EtSQwx9SWqIoS9JDennwegkWQ/8R6afkXtnVb3nhP0vAj4M/BDwNeDHq+rpbt9twNuBo8DPV9WOgVXf4/5d+9i8Yzf7Dx3movOXccu61QDPa+v3Ycszne9seVDzqI9tIfWN+tikYZv1GblJlgBfAt4CTAKPAht7H3Ce5J8DV1TVzyXZAPxoVf14ksuBbcBa4CLg94G/UVVHT/WZc31G7v279nHbvU9w+LnvnXbpCwKB545+b3zLli7h3/2Dvz1rCMx0vn6PHXWjPraF1DfqY5NOp0E+I3ctsKeq9lbVs8DdwA0n9LkB+FD3/h7gTUnStd9dVd+uqq8Ce7rzDdTmHbuP+x8d4Lnv1HGBD3D4uaNs3rF7Xufr99hRN+pjW0h9oz42aRT0E/orgGd6tie7thn7VNUR4OvA9/d5LABJNiWZSDIxNTXVX/Wd/YcOD7TvyfrM5XNG1aiPbSH1jfrYpFHQT+hnhrYT54RO1qefY6cbq7ZU1XhVjY+NjfVR1vdcdP6ygfY9WZ+5fM6oGvWxLaS+UR+bNAr6Cf1J4OKe7ZXA/pP1SXIO8HLgYJ/HLtgt61azbOmS49qWviAsXXL875xlS5d89wbvXM/X77GjbtTHtpD6Rn1s0ijoZ/XOo8BlSS4F9gEbgJ84oc924Cbgs8CPAQ9VVSXZDnwkya8zfSP3MuAPB1X8Mcdu0g1q9c7Jznc23Awc9bEtpL5RH5s0CmZdvQOQ5K3AbzC9ZHNrVf3bJHcAE1W1Pcm5wO8Aa5i+wt9QVXu7Y38F+CfAEeAXq+oTs33eXFfvSFLr+l2901fon2mGviTNzSCXbEqSzhKGviQ1xNCXpIYY+pLUkJG8kZtkCvjjYdfRWQ78+bCLGADHMTrOhjGA4xg1q6vqpbN16uuvbJ5pVTW3r+SeRkkm+rkjPuocx+g4G8YAjmPUJOlryaPTO5LUEENfkhpi6M9uy7ALGBDHMTrOhjGA4xg1fY1jJG/kSpJOD6/0Jakhhr4kNcTQP4kkW5McSPL5YdcyX0kuTvKpJF9I8mSSXxh2TfOR5Nwkf5jkj7px/Jth17QQSZYk2ZXkfwy7lvlK8nSSJ5I81u9SwVGT5Pwk9yT5Yvf/yOuGXdNcJVnd/Tc49vpGkl885THO6c8syeuBbwIfrqpXD7ue+UhyIXBhVX0uyUuBncCNvQ+1Xwy65y2fV1XfTLIU+N/AL1TVI0MubV6S/BIwDrysqt427HrmI8nTwHhVLdovNSX5EPCZqrozyQuBF1fVoWHXNV9JljD9zJPXVtVJv9zqlf5JVNXDTD8bYNGqqj+tqs917/8S+AIneUbxKKtp3+w2l3avRXm1kmQl8CPAncOupWVJXga8HvggQFU9u5gDv/Mm4CunCnww9JuRZBXTD7n5v8OtZH66KZHHgAPAg1W1KMfB9MOIfhn4zrALWaACHkiyM8mmYRczDz8ATAG/3U213ZnkvGEXtUAbgG2zdTL0G5DkJcDHmH5y2TeGXc98VNXRqrqK6ecsr02y6KbckrwNOFBVO4ddywBcXVWvAa4D3tlNhy4m5wCvAd5XVWuAvwJuHW5J89dNT10PfHS2vob+Wa6bA/8Y8LtVde+w61mo7p/gnwbWD7mU+bgauL6bD78b+OEk/2W4Jc1PVe3vfh4A7gPWDreiOZsEJnv+xXgP078EFqvrgM9V1Z/N1tHQP4t1N0A/CHyhqn592PXMV5KxJOd375cBbwa+ONyq5q6qbquqlVW1iul/ij9UVT815LLmLMl53cIAuimRa4FFtcqtqv4f8EyS1V3Tm4BFtcDhBBvpY2oHRvSvbI6CJNuAa4DlSSaB26vqg8Otas6uBn4aeKKbDwf4l1X18SHWNB8XAh/qVie8APi9qlq0yx3PAq8A7pu+puAc4CNV9cnhljQv7wJ+t5sa2Qv87JDrmZckLwbeAvzTvvq7ZFOS2uH0jiQ1xNCXpIYY+pLUEENfkhpi6EtSQwx9SWqIoS9JDfn/hHmkFDE/AicAAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"x = [0.6,0.7,1,2.1,2.9,3.2,5.5,6.7]\n",
"y = np.power(x, 10)\n",
"correlation, pvalue = stats.stats.pearsonr(x, y)\n",
"print('correlation:', correlation) # 相关系数高\n",
"print('pvalue:', pvalue)\n",
"plt.scatter(x,y)\n",
"plt.show() # 类斜线"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"可以看到,基因A、D相关系数无论数值还是显著性都下降了。皮尔森相关系数是一种线性相关系数因此如果两个变量呈线性关系的时候具有最大的显著性。对于非线性关系(例如A、D的幂函数关系),则其对相关性的检测功效会下降\n",
"\n",
"<br>这时我们可以考虑另外一个相关系数计算方法:斯皮尔曼等级相关。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**斯皮尔曼等级相关**\n",
"当两个变量值以等级次序排列或以等级次序表示时,两个相应总体并不一定呈正态分布,样本容量也不一定大于30,表示这两变量之间的相关,称为 Spearman等级相关。\n",
"\n",
"<br>简单点说,就是无论两个变量的数据如何变化,符合什么样的分布,我们只关心每个数值在变量内的排列顺序。如果两个变量的对应值,在各组内的排序顺位是相同或类似的,则具有显著的相关性。\n",
"$$\n",
"p = 1-\\frac{6\\sum{d^2_i}}{n^2-n}\n",
"$$\n",
"<ul>\n",
" <li>n 为等级个数\n",
" <li>d 为二列成对变量的等级差数"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"将上表转换成排序等级\n",
"<img src=\"assets/20201120224505.png\" width=\"50%\">\n",
"\n",
"利用斯皮尔曼等级相关计算A、D基因表达量的相关性结果是:r=1, p-value=4.96e-05\n",
"\n",
"这里斯皮尔曼等级相关的显著性显然高于皮尔森相关。这是因为虽然两个基因的表达量是非线性关系,但两个基因表达量在所有样本中的排列顺序是完全相同的,因为具有极显著的斯皮尔曼等级相关性。"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"correlation: 0.9999999999999999\n",
"pvalue: 6.646897422032013e-64\n"
]
}
],
"source": [
"x = [10.35,6.24,3.18,8.46,3.21,7.65,4.32,8.66,9.12,10.31]\n",
"y = [5.13,3.15,1.67,4.33,1.76,4.11,2.11,4.88,4.99,5.12]\n",
"correlation, pvalue = stats.stats.spearmanr(x, y)\n",
"print('correlation:', correlation) # 相关系数高\n",
"print('pvalue:', pvalue)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[10. 4. 1. 6. 2. 5. 3. 7. 8. 9.] [10. 4. 1. 6. 2. 5. 3. 7. 8. 9.]\n",
"correlation: 0.9999999999999999\n",
"pvalue: 6.646897422032013e-64\n"
]
}
],
"source": [
"import scipy\n",
"x = [10.35,6.24,3.18,8.46,3.21,7.65,4.32,8.66,9.12,10.31]\n",
"y = [5.13,3.15,1.67,4.33,1.76,4.11,2.11,4.88,4.99,5.12]\n",
"x = scipy.stats.stats.rankdata(x) # 原本的数据转换成等级数据\n",
"y = scipy.stats.stats.rankdata(y)\n",
"print(x, y) # 转换成等级后,可以看到等级是极其相似的。\n",
"# x内3.18是最小的所以是1y内1.67是最小的所以是1其它值同样\n",
"\n",
"correlation, pvalue = stats.stats.spearmanr(x, y)\n",
"print('correlation:', correlation) # 相关系数高\n",
"print('pvalue:', pvalue)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**斯皮尔曼实例**\n",
"10名高三学生学习潜在能力测验与自学能力测验成绩如下表所示问两者相关情况如何\n",
"<img src=\"assets/20201120225540.png\" width=\"50%\">\n",
"计算等级相关系数\n",
"$$\n",
"r_R = 1-\\frac{2\\sum D^2}{n(n^2-1)}\n",
"= 1-\\frac{6×18}{10(10^2-1)}\n",
"=0.891\n",
"$$\n",
"**等级相关系数的显著性检验**\n",
"与积差相关系数检验的方法相同\n",
"10个学生学习潜在能力与自学能力测验成绩相关系数为0.891,问总体上说,两者是否存在相关?\n",
"\n",
"计算检验统计量的值:\n",
"$$\n",
"t = \\frac{r\\sqrt{n-2}}{\\sqrt{1-r^2}} \n",
"= \\frac{0.891\\sqrt{10-2}}{\\sqrt{1-0.891^2}}\n",
"= 5.551\n",
"$$\n",
"\n",
"$$\n",
"5.551 > t_{(8)0.01} = 3.355\n",
"$$\n",
"所以学生的学习潜在学习能力与自学能力有较强的正相关。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 肯德尔和谐系数( Kendall)\n",
"当多个(两个以上)变量值以等级次序排列或以等级次序表示,描述这几个变量之间的一性程度的量,称为肯徳尔和谐系数。\n",
"<br>它常用来表示几个评定者对同组学生成绩用等级先后评定多次之间的一致性程度。\n",
"<br>\n",
"**同一评价者无相同等级评定时,计算公式:**\n",
"$$\n",
"W = \\frac{S}{\\frac{1}{12}K^2(N^3-N)}\n",
"$$\n",
"<ul>\n",
" <li>N 被评的对象数;\n",
" <li>K 评分者人数或评分所依据的标准数;\n",
" <li>S 每个被评对象所评等级之和Ri与所有这些和的平均数的离差平方和\n",
"$$\n",
"S=\\sum^n_{i=1}(R_i-\\overline{R_i})^2\n",
"= \\sum^n_{i=1}R^2_i-\\frac{1}{n}(\\sum^n_{i=1}R_i)^2\n",
"$$\n",
" 当评分者意见完成一致时S取得最大值和谐系数是实际求得的S与其最大可能取值的比值故0≤W≤1."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**同一评价者有相同等级评定时,计算公式**\n",
"<br>\n",
"$$\n",
"W = \\frac{S}{\\frac{1}{12}[K^2(N^3-N)-K\\sum^K_{i=1}T_i]}\n",
"$$\n",
"$$\n",
"T_i = \\sum^{mi}_{i=1}(n^3_{ij}-n_{ij})\n",
"$$\n",
"<ul>\n",
" <li>mi为第i个评价者的评定结果中有重复等级的个数。\n",
" <li>nij为第i个评价者的评定结果中第j个重复等级的相同等级数。\n",
" <li>对于评定结果无相同等级的评价者,T=0,因此只须对评定结果有相同等级的评价者计算Ti。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**实例1:同一评价者无相同等级评定时**\n",
"某校开展学生小论文比赛请6位教师对入选的6篇论文评定得奖等级结果如下表所示试计算6位教师评定结果的 kanda和谐系数\n",
"\n",
"|论文编号 | 论文1 | 论文2 | 论文3 | 论文4 | 论文5 | 论文6 ||\n",
"| :----: | :----: | :----: | :----: | :----: | :----: | :----: |:----: |\n",
"| 老师1 | 3 | 1 | 2 | 5 | 4 | 6 | |\n",
"| 老师2 | 2 | 1 | 3 | 4 | 5 | 6 | |\n",
"| 老师3 | 3 | 2 | 1 | 5 | 4 | 6 | |\n",
"| 老师4 | 4 | 1 | 2 | 6 | 3 | 5 | |\n",
"| 老师5 | 3 | 1 | 2 | 6 | 4 | 5 | |\n",
"| 老师6 | 4 | 2 | 1 | 5 | 3 | 6 | |\n",
"| R |19 | 8 | 11 | 31 | 23 |34| ∑K = 126 |\n",
"| R^2 | 361 | 64| 121 |961|529|1156|R^2 = 3192|\n",
"\n",
"由于每个评分老师对6篇论文的评定都无相同的等级\n",
"$$\n",
"S=\\sum^6_{i=1}-\\frac{1}{6}(\\sum^6_{i=1}R_i)^2\n",
"= 3192- \\frac{1}{6} × 126^2 = 546\n",
"$$\n",
"$$\n",
"W = \\frac{S}{\\frac{1}{12}K^2(N^3-N)} = \\frac{546}{\\frac{1}{12}6^2(6^3-6)}\n",
"=\\frac{546}{630}=0.87\n",
"$$\n",
"由W=0.87表明6位老师的评定结果有较大的一致性"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**实例1:同一评价者有相同等级评定时**\n",
"3名专家对6篇心理学论文的评分经等级转换下表所示,试计算专家评定结果的肯德尔和谐系数。\n",
"\n",
"|论文编号 | A | B | C | D | E | F ||\n",
"| :----: | :----: | :----: | :----: | :----: | :----: | :----: |:----: |\n",
"| 甲 | 1 | 4 | 2.5 | 5 | 6 | 2.5 | |\n",
"| 乙 | 2 | 3 | 1 | 5 | 6 | 4 | |\n",
"| 丙 | 1.5 | 3 | 1.5 | 4 | 5.5 | 5.5 | |\n",
"| R |4.5 | 10 | 5 | 14 | 17.5 |12| 63 |\n",
"| R^2 | 20.25 | 100| 25 |196|306.25|144|791.5|\n",
"\n",
"由于甲、丙对6篇论文有相同等级的评定\n",
"甲T = 23-2=6\n",
"丙T = (23 - 2)+(23 - 2) = 12\n",
"$$\n",
"S=\\sum^6_{i=1}-\\frac{1}{6}(\\sum^6_{i=1}R_i)^2\n",
"=791.5-\\frac{1}{6} × 63^2 = 130.00\n",
"$$\n",
"$$\n",
"W = \\frac{S}{\\frac{1}{12}[K^2(N^3-N)-K\\sum^K_{i=1}T_i]}\n",
"=\\frac{130}{\\frac{1}{12}[3^2(6^3-6)-3×(6+12)]}\n",
"=\\frac{130}{153} = 0.849\n",
"$$\n",
"由W=0.849可看出专家评定结果有较大的一致性"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**肯德尔和谐系数的显著性检验**\n",
"评分者人数(k)在3-20之间被评者(N)在3-7之间时可查《肯德尔和谐系数(w)显著性临界值表》检验W是否达到显著性水平。若实际计算的S值大于k、N相同的表内临界值则W达到显著水平\n",
"\n",
"当K=6N=6查表得检验水平分别为α=0.01α=0.05的临界值各为S0.01=282.4,S0.05=221.4均小于实算的S=546故W达到显著水平认为6位教师对6篇论文的评定相当一致\n",
"\n",
"当被评者n>7时则可用如下的X2统计量对W是否达到显著水平作检验。"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tau 0.6\n",
"p_value 0.23333333333333334\n"
]
}
],
"source": [
"x1 = [10,9,8,7,6]\n",
"x2 = [10,8,9,6,7]\n",
"\n",
"tau, p_value = stats.kendalltau(x1,x2)\n",
"print('tau', tau)\n",
"print('p_value', p_value)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 质量相关分析\n",
"质量相关是指一个变量为质,另一个变量为量,这两个变量之间的相关。如智商、学科分数、身高、体重等是表现为量的变量,男与女、优与劣、及格与不及格等是表现为质的变量。\n",
"\n",
"质与量的相关主要包括二列相关、点二列相关、多系列相关。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 二列相关\n",
"当两个变量都是正态连续变量。其中一个变量被人为地划分成二分变量(如按一定标推将属于正态连续变量的学科考试分数划分成及格与不及格,录取与未录取,把某一体育项目测验结果划分成通过与未通过,达标与末达标,把健康状况划分成好与差,等等),表示这两个变量之间的相关,称为二列相关\n",
"\n",
"**二列相关的使用条件:**\n",
"<ul>\n",
" <li>两个变量都是连续变量,且总体呈正态分布,或总体接近正态分布,至少是单峰对称分布。\n",
" <li>两个变量之间是线性关系。\n",
" <li>二分变量是人为划分的,其分界点应尽量靠近中值。\n",
" <li>样本容量应当大于80。\n",
"</ul>\n",
"$$\n",
"R = \\frac{\\overline{X}_p-\\overline{X}_q}{σ} × \\frac{pq}{Y}\n",
"$$\n",
"\n",
"$$p 表示二分变量中某一类别频数的比率$$\n",
"$$q 表示二分变量中另一类别频数的比率$$\n",
"$$\\overline{X}_p 表示与二分变量中p类别相对应的连续变量的平均数$$\n",
"$$\\overline{X}_q 表示与二分变量中q类别相对应的连续变量的平均数$$\n",
"$$σ 表示连续变量的标准差$$\n",
"$$Y 表示正态曲线下与p相对应的纵线高度$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**二列相关实例:**\n",
"<br>10名考生成绩如下包括总分和一道问答题试求该问答题的区分度6分以上为通过包括6分\n",
"<img src=\"assets/20201121183005.png\" width=\"50%\">\n",
"问答题,被人为的分成两类,通过和不通过,应求二列相关。\n",
"<br>\n",
"$$\n",
"当p=0.6时查正态分布表得到x=0.25\n",
"$$\n",
"$$\n",
"当x=0.25时代入标准正态密度函数Y=\\frac{1}{\\sqrt{2π}}e^{-\\frac{x^2}{x}}\n",
"得到Y=0.3866\n",
"$$\n",
"$$\n",
"\\overline{X}_p = 67.33, \\overline{X}_q=61.25,σ=6.12\n",
"$$\n",
"则可以通过公式计算得到二列相关系数:\n",
"$$\n",
"R=\\frac{\\overline{X}_p-\\overline{X}_q}{σ}×\\frac{pq}{Y}\n",
"=\\frac{67.33-61.25}{6.12}×\\frac{0.6×0.4}{0.3866} ≈0.62\n",
"$$\n",
"区分度较高"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 点二列相关\n",
"当两个变量其中一个是正态连续性变量,另一个是真正的二分名义变量(例如,男与女,已婚和未婚,色肓与非色盲,生与死,等等),这时,表示这两个变量之间的相关,称为点二列相关。\n",
"$$\n",
"R = \\frac{\\overline{X}_p-\\overline{X}_q}{σ} × \\sqrt{pq}\n",
"$$\n",
"\n",
"$$p表示二分变量中某一类别频数的比率$$\n",
"$$q 表示二分变量中另一类别频数的比率$$\n",
"$$\\overline{X}_p 表示与二分变量中p类别相对应的连续变量的平均数$$\n",
"$$\\overline{X}_q 表示与二分变量中q类别相对应的连续变量的平均数$$\n",
"$$σ 表示连续变量的标准差$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**点二列相关实例:**\n",
"<br>有50道选择题,每题2分,有20人的总成绩和第五题的情况,第五题与总分的相关程度如亻\n",
"<img src=\"assets/20201121183858.png\" width=\"50%\">\n",
"p(答对学生的比例) = 10/20=0.5q=1-p=0.5\n",
"$$\n",
"\\overline{X}_p=88.4, \\overline{X}_q=74.8, σ=8.66\n",
"$$\n",
"$$\n",
"R = \\frac{\\overline{X}_p-\\overline{X}_q}{σ} × \\sqrt{pq}\n",
"= \\frac{88.4-74.8}{8.66}\\sqrt{0.5×0.8} = 0.785\n",
"$$\n",
"相关系数较高,第五题的情况与总分有一致性(区分度较高)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"PointbiserialrResult(correlation=0.7849870641173371, pvalue=4.145927973490392e-05)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#拿上面的实例对了就是1错了是0x是第5题的选答情况y是分数\n",
"x = [1,0,0,0,0,0,0,1,1,1,1,0,1,1,1,1,1,0,0,0]\n",
"y = [84,82,76,60,72,74,76,84,88,90,78,80,92,94,96,88,90,78,76,74]\n",
"stats.pointbiserialr(x,y) #可以看到相关系数值是0.7849,和上面的计算结果一致"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 品质相关分析\n",
"两个变量都是按质划分成几种类别,表示这两个变量之间的相关称为品质相关。\n",
"\n",
"如,一个变量按性别分成男与女,另一个变量按学科成绩分成及格与不及格;又如,一个变量按学校类别分成重点及非重点,另一个变量按学科成绩分成优、良、中、差,等等"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**列联相关系数**\n",
"\n",
" 当两个变量均被分成两个以上类别,或其中一个变量被分成两个以上类别这两个变量之问的相关程度可用列联相关系数( contingency coefficient)来测度。如行政人员、现任教师、学生家长与对现有考试制度持赞同、不置可否、反对意见有无相关。\n",
" \n",
" 假设变量x被分成a个类别,y被分成b个类别,而且a和b至少有一个大于2,这时变量x与变量y的列联相关系数记为0记m。为观察数据属于变量x的第1类别(=1,2,…,a)、变量y的第类b)的频数。记m为观察数据属于变量x的第i类别i=12...a、变量y的第j类别j=12...b的频数。记\n",
"$$\n",
"a_i = \\sum^b_{i=1}m(i=1,2,...,m)\n",
"$$\n",
"$$\n",
"b_i = \\sum^a_{i=1}m(j=1,2,...,m)\n",
"$$\n",
"$$\n",
" 构造X^2 = N(\\sum \\sum \\frac{m^2}{a_ib_j}-1),其中N= \\sum \\sum m这样得到列联相关系数\n",
"$$\n",
"$$\n",
"C的计算公式C = \\sqrt{\\frac{x^2}{N+x^2}}\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**例子:**\n",
"2531名学生和教室进行了抽样调查计算调查对象和态度之间的列联相关系数并进行统计显著检验\n",
"<img src=\"assets/20201121200013.png\" width=\"50%\">\n",
"解根据公式计算X^2\n",
"$$\n",
"X^2 = 2531(\\frac{446^2}{981*977}\\frac{212^2}{730*977}+...+\\frac{177^2}{820*764})\n",
"≈130.02\n",
"$$\n",
"$$\n",
"C=\\sqrt{\\frac{X^2}{N+X^2}}=\\sqrt{\\frac{130.2}{2531+130.2}}≈0.221\n",
"$$\n",
"$$\n",
"查X^2分布表得到临界值X^2_{0.01}(4)=12.277\n",
"$$\n",
"$$\n",
"X^2=130.02>12.277所以求得的列联系数C=0.221具有统计显著意义。\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"还有等于2的是用另外一套公式"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 偏相关分析\n",
"在多要素所构成的地理系统中,先不考虑其它要素的影响,而单独研究两个要素之间的相互关系的密切程度,这称为偏相关。用以度量偏相关程度的统计量,称为偏相关系数\n",
"\n",
"在分析变量x1和x2之间的净相关时,当控制了变量x3的线性作用后,x1和x2之间的一阶偏相关系数定义为\n",
"$$\n",
"r_{12.3} = \\frac{r_{12}-r_{13}r_{23}}{\\sqrt{(1-r_{13}^2)(1-r_{23}^2)}}\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"对于某四个地理要素x1,x2,X3,×4的23个样本数据,经过计算得到了如下的单相关系数矩阵:\n",
"<img src=\"assets/20201121202149.png\" width=\"50%\">\n",
"计算可得部分偏相关系数\n",
"<img src=\"assets/20201121202207.png\" width=\"50%\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**偏相关系数的性质**\n",
"<ul>\n",
" <li>偏相关系数分布的范围在-1到1之间\n",
" <li>偏相关系数的绝对值越大,表示其偏相关程度越大\n",
" <li>偏相关系数的绝对值必小于或最多等于由同一系列资料所求得的复相关系数,即R1*23≥|r12*3|\n",
"</ul>\n",
"\n",
"**偏相关系数的显著性检验**\n",
"\n",
"$$\n",
"t=\\frac{r\\sqrt{r-k-2}}{\\sqrt{1-r^2}},服从t(n-k-2)分布\n",
"$$\n",
"<ul>\n",
"<li>n 是样本容量\n",
"<li>k 是剔除了的变量数\n",
"<li>r 是偏相关系数\n",
"</ul>\n",
"当有3个要素时,有三个偏相关系数,称为一级偏相关系数\n",
"\n",
"当有4个要素时,则有六个偏相关系数,则称他们为二级偏相关系数"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 复相关系数\n",
"<ul>\n",
"<li>反映几个要素与某一个要素之间的复相关程度。复相关系数介于0到1之间。\n",
"<li>复相关系数越大,则表明要素(变量)之间的相关程度越密切。复相关系数为1,表示完全相关:复相关系数为0,表示完全无关。\n",
"<li>复相关系数必大于或至少等于单相关系数的绝对值。\n",
"</ul>\n",
"\n",
"测定一个变量y当有两个自变量时\n",
"$$\n",
"R_{y.12}=\\sqrt{1-(1-r^2_{y1})(1-r^2_{y2.1})}\n",
"$$\n",
"当有三个自变量时:\n",
"$$\n",
"R_{y.123}=\\sqrt{1-(1-r^2_{y1})(1-r^2_{y2.1})(1-r^2_{y3.12})}\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**实例:**\n",
"\n",
"在上例中,若以x4为因变量,x1,x2,x3为自变量,试计算x4与x1,x2,x3之间的复相关系数\n",
"$$\n",
"R_{4.123}=\\sqrt{1-(1-r^2_{41})(1-r^2_{42.1})(1-r^2_{43.12})}\n",
"$$\n",
"$$\n",
"=\\sqrt{1-(1-0.579^2)(1-0.956^2)(1-0.337^2)} = 0.974\n",
"$$"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}