Add partial correlation and multiple correlation

pull/2/head
benjas 4 years ago
parent cc15d00762
commit 5bb7ad3c0e

Binary file not shown (image, 58 KiB).

Binary file not shown (image, 45 KiB).

Binary file not shown (image, 33 KiB).

@@ -698,6 +698,154 @@
"stats.pointbiserialr(x,y) #可以看到相关系数值是0.7849,和上面的计算结果一致"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 品质相关分析\n",
"两个变量都是按质划分成几种类别,表示这两个变量之间的相关称为品质相关。\n",
"\n",
"如,一个变量按性别分成男与女,另一个变量按学科成绩分成及格与不及格;又如,一个变量按学校类别分成重点及非重点,另一个变量按学科成绩分成优、良、中、差,等等"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**列联相关系数**\n",
"\n",
" 当两个变量均被分成两个以上类别,或其中一个变量被分成两个以上类别这两个变量之问的相关程度可用列联相关系数( contingency coefficient)来测度。如行政人员、现任教师、学生家长与对现有考试制度持赞同、不置可否、反对意见有无相关。\n",
" \n",
" 假设变量x被分成a个类别,y被分成b个类别,而且a和b至少有一个大于2,这时变量x与变量y的列联相关系数记为0记m。为观察数据属于变量x的第1类别(=1,2,…,a)、变量y的第类b)的频数。记m为观察数据属于变量x的第i类别i=12...a、变量y的第j类别j=12...b的频数。记\n",
"$$\n",
"a_i = \\sum^b_{i=1}m(i=1,2,...,m)\n",
"$$\n",
"$$\n",
"b_i = \\sum^a_{i=1}m(j=1,2,...,m)\n",
"$$\n",
"$$\n",
" 构造X^2 = N(\\sum \\sum \\frac{m^2}{a_ib_j}-1),其中N= \\sum \\sum m这样得到列联相关系数\n",
"$$\n",
"$$\n",
"C的计算公式C = \\sqrt{\\frac{x^2}{N+x^2}}\n",
"$$"
]
},
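The formula can be sketched in plain Python as follows. This is a minimal illustration and not code from the original notebook. The function name `contingency_coefficient` is hypothetical. The sketch assumes the table is a list of rows of observed counts m_ij, with no empty row or column. The shortcut N(ΣΣ m_ij² / (a_i·b_j) − 1) is algebraically the same as Pearson's Σ(O − E)²/E with expected counts E = a_i·b_j / N.

```python
import math


# Hypothetical helper for illustration, not taken from the original notebook.
def contingency_coefficient(table):
    """Return (chi2, C) for an a x b table of observed counts m_ij.

    Assumes every row and column total is non-zero.
    chi2 = N * (sum_ij m_ij**2 / (a_i * b_j) - 1), C = sqrt(chi2 / (N + chi2)).
    """
    a = [sum(row) for row in table]          # row totals a_i
    b = [sum(col) for col in zip(*table)]    # column totals b_j
    n = sum(a)                               # grand total N
    s = sum(m_ij ** 2 / (a[i] * b[j])
            for i, row in enumerate(table)
            for j, m_ij in enumerate(row))
    chi2 = max(n * (s - 1.0), 0.0)           # clamp tiny negative rounding error
    return chi2, math.sqrt(chi2 / (n + chi2))


# Usage with a small made-up 2 x 3 table (not data from the text).
chi2, C = contingency_coefficient([[30, 10, 10], [10, 20, 20]])
print(f"chi2 = {chi2:.2f}, C = {C:.3f}")
```

As a cross-check, `scipy.stats.chi2_contingency(table, correction=False)` computes the same Pearson statistic. `correction=False` disables Yates' continuity correction, which SciPy otherwise applies only to tables with one degree of freedom.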
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**例子:**\n",
"2531名学生和教室进行了抽样调查计算调查对象和态度之间的列联相关系数并进行统计显著检验\n",
"<img src=\"assets/20201121200013.png\" width=\"50%\">\n",
"解根据公式计算X^2\n",
"$$\n",
"X^2 = 2531(\\frac{446^2}{981*977}\\frac{212^2}{730*977}+...+\\frac{177^2}{820*764})\n",
"≈130.02\n",
"$$\n",
"$$\n",
"C=\\sqrt{\\frac{X^2}{N+X^2}}=\\sqrt{\\frac{130.2}{2531+130.2}}≈0.221\n",
"$$\n",
"$$\n",
"查X^2分布表得到临界值X^2_{0.01}(4)=12.277\n",
"$$\n",
"$$\n",
"X^2=130.02>12.277所以求得的列联系数C=0.221具有统计显著意义。\n",
"$$"
]
},
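The arithmetic of this example can be checked with a few lines of Python. The sketch takes χ² = 130.02 and N = 2531 from the text. It gets the 1% critical value and the p-value from `scipy.stats.chi2` rather than a printed table. The 4 degrees of freedom are the ones used in the text: (3 − 1) × (3 − 1) for the 3 × 3 table implied by the three row totals 981, 730 and 820.

```python
import math

from scipy import stats

N = 2531             # number of respondents in the example
chi2_value = 130.02  # chi-square statistic computed above
dof = 4              # (3 - 1) * (3 - 1), as used in the text

C = math.sqrt(chi2_value / (N + chi2_value))
critical = stats.chi2.ppf(0.99, dof)      # upper 1% point of chi-square(4)
p_value = stats.chi2.sf(chi2_value, dof)  # P(chi-square(4) > 130.02)

print(f"C = {C:.3f}")                       # C = 0.221
print(f"critical value = {critical:.3f}")   # critical value = 13.277
print(f"p = {p_value:.1e}")
```

With 4 degrees of freedom, the upper 1% point of the χ² distribution is about 13.277. χ² = 130.02 therefore lies far inside the rejection region.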
{
"cell_type": "markdown",
"metadata": {},
"source": [
"还有等于2的是用另外一套公式"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 偏相关分析\n",
"在多要素所构成的地理系统中,先不考虑其它要素的影响,而单独研究两个要素之间的相互关系的密切程度,这称为偏相关。用以度量偏相关程度的统计量,称为偏相关系数\n",
"\n",
"在分析变量x1和x2之间的净相关时,当控制了变量x3的线性作用后,x1和x2之间的一阶偏相关系数定义为\n",
"$$\n",
"r_{12.3} = \\frac{r_{12}-r_{13}r_{23}}{\\sqrt{(1-r_{13}^2)(1-r_{23}^2)}}\n",
"$$"
]
},
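Here is a minimal sketch of the first-order partial correlation in Python with NumPy. It assumes x1, x2 and x3 are 1-D arrays of equal length.

- `partial_corr` applies the formula above to the simple correlations.
- `residual_corr` is a hypothetical helper for illustration. It computes the same quantity directly: it regresses x1 and x2 on x3 and correlates the residuals, which is what "controlling for the linear effect of x3" means.

The data are synthetic, made up only for this check.

```python
import math

import numpy as np


# Hypothetical helpers for illustration, not taken from the original notebook.
def partial_corr(r12, r13, r23):
    """First-order partial correlation r_12.3 computed from simple correlations."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


def residual_corr(x1, x2, x3):
    """Correlate x1 and x2 after removing the linear effect of x3 from both."""
    design = np.column_stack([np.ones_like(x3), x3])           # intercept + x3
    e1 = x1 - design @ np.linalg.lstsq(design, x1, rcond=None)[0]
    e2 = x2 - design @ np.linalg.lstsq(design, x2, rcond=None)[0]
    return np.corrcoef(e1, e2)[0, 1]


# Synthetic data, made up for this check: x1 and x2 both depend on x3.
rng = np.random.default_rng(0)
x3 = rng.normal(size=200)
x1 = 0.8 * x3 + rng.normal(size=200)
x2 = 0.5 * x3 + 0.3 * x1 + rng.normal(size=200)

r = np.corrcoef([x1, x2, x3])        # 3 x 3 matrix of simple correlations
from_formula = partial_corr(r[0, 1], r[0, 2], r[1, 2])
from_residuals = residual_corr(x1, x2, x3)
print(from_formula, from_residuals)
```

The two numbers agree to rounding error. For sample correlations, the formula and the residual approach are algebraically identical.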
{
"cell_type": "markdown",
"metadata": {},
"source": [
"对于某四个地理要素x1,x2,X3,×4的23个样本数据,经过计算得到了如下的单相关系数矩阵:\n",
"<img src=\"assets/20201121202149.png\" width=\"50%\">\n",
"计算可得部分偏相关系数\n",
"<img src=\"assets/20201121202207.png\" width=\"50%\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**偏相关系数的性质**\n",
"<ul>\n",
" <li>偏相关系数分布的范围在-1到1之间\n",
" <li>偏相关系数的绝对值越大,表示其偏相关程度越大\n",
" <li>偏相关系数的绝对值必小于或最多等于由同一系列资料所求得的复相关系数,即R1*23≥|r12*3|\n",
"</ul>\n",
"\n",
"**偏相关系数的显著性检验**\n",
"\n",
"$$\n",
"t=\\frac{r\\sqrt{r-k-2}}{\\sqrt{1-r^2}},服从t(n-k-2)分布\n",
"$$\n",
"<ul>\n",
"<li>n 是样本容量\n",
"<li>k 是剔除了的变量数\n",
"<li>r 是偏相关系数\n",
"</ul>\n",
"当有3个要素时,有三个偏相关系数,称为一级偏相关系数\n",
"\n",
"当有4个要素时,则有六个偏相关系数,则称他们为二级偏相关系数"
]
},
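The significance test can be sketched with `scipy.stats.t`. The helper name `partial_corr_t_test` is hypothetical. The sketch assumes a two-sided alternative, because the text does not say which alternative to use. The input r = 0.5 is a made-up value. n = 23 and k = 2 match the sample size and the number of controlled variables for a second-order coefficient in the four-element example.

```python
import math

from scipy import stats


# Hypothetical helper for illustration, not taken from the original notebook.
def partial_corr_t_test(r, n, k):
    """t test for a partial correlation coefficient.

    r: partial correlation, n: sample size, k: number of controlled variables.
    Returns t = r * sqrt(n - k - 2) / sqrt(1 - r**2) and a two-sided p-value
    (two-sided is an assumption) from the t distribution with n - k - 2 df.
    """
    df = n - k - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    p = 2 * stats.t.sf(abs(t), df)
    return t, p


# Hypothetical input: r = 0.5 for a second-order coefficient (k = 2), n = 23.
t_stat, p = partial_corr_t_test(0.5, 23, 2)
print(f"t = {t_stat:.3f}, df = {23 - 2 - 2}, p = {p:.3f}")
```

Here t is about 2.52 on 19 degrees of freedom, which gives a two-sided p-value between 0.01 and 0.05.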
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 复相关系数\n",
"<ul>\n",
"<li>反映几个要素与某一个要素之间的复相关程度。复相关系数介于0到1之间。\n",
"<li>复相关系数越大,则表明要素(变量)之间的相关程度越密切。复相关系数为1,表示完全相关:复相关系数为0,表示完全无关。\n",
"<li>复相关系数必大于或至少等于单相关系数的绝对值。\n",
"</ul>\n",
"\n",
"测定一个变量y当有两个自变量时\n",
"$$\n",
"R_{y.12}=\\sqrt{1-(1-r^2_{y1})(1-r^2_{y2.1})}\n",
"$$\n",
"当有三个自变量时:\n",
"$$\n",
"R_{y.123}=\\sqrt{1-(1-r^2_{y1})(1-r^2_{y2.1})(1-r^2_{y3.12})}\n",
"$$"
]
},
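The chain formula can be sketched in Python. `multiple_corr` is a hypothetical name. It takes the sequence r_y1, r_y2.1, r_y3.12, ... and multiplies the (1 − r²) factors. As a check on synthetic, made-up data, the result for two independent variables is compared with the correlation between y and its least-squares fit on x1 and x2. That correlation is the usual regression definition of the multiple correlation coefficient, the square root of R².

```python
import math

import numpy as np


# Hypothetical helpers for illustration, not taken from the original notebook.
def multiple_corr(r_chain):
    """Multiple correlation from [r_y1, r_y2.1, r_y3.12, ...].

    R = sqrt(1 - (1 - r_y1**2)(1 - r_y2.1**2)(1 - r_y3.12**2)...)
    """
    prod = 1.0
    for r in r_chain:
        prod *= 1 - r ** 2
    return math.sqrt(1 - prod)


def partial_corr(r12, r13, r23):
    """First-order partial correlation r_12.3 from simple correlations."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


# Synthetic data, made up for this check: y depends on x1 and x2.
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = 0.4 * x1 + rng.normal(size=100)
y = 1.0 + 0.7 * x1 - 0.5 * x2 + rng.normal(size=100)

r = np.corrcoef([y, x1, x2])                       # variable order: y, x1, x2
r_y1 = r[0, 1]
r_y2_1 = partial_corr(r[0, 2], r[0, 1], r[2, 1])   # y and x2, controlling x1
R_chain = multiple_corr([r_y1, r_y2_1])

# Regression definition: correlation between y and its least-squares fit.
design = np.column_stack([np.ones(100), x1, x2])
y_hat = design @ np.linalg.lstsq(design, y, rcond=None)[0]
R_fit = np.corrcoef(y, y_hat)[0, 1]
print(R_chain, R_fit)
```

The two values agree to rounding error. The chain form is convenient when the simple and partial correlations have already been tabulated.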
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**实例:**\n",
"\n",
"在上例中,若以x4为因变量,x1,x2,x3为自变量,试计算x4与x1,x2,x3之间的复相关系数\n",
"$$\n",
"R_{4.123}=\\sqrt{1-(1-r^2_{41})(1-r^2_{42.1})(1-r^2_{43.12})}\n",
"$$\n",
"$$\n",
"=\\sqrt{1-(1-0.579^2)(1-0.956^2)(1-0.337^2)} = 0.974\n",
"$$"
]
},
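The arithmetic of this example can be reproduced directly from the three correlations quoted in the formula above. This is a quick check, not part of the original notebook:

```python
import math

# r_41, r_42.1 and r_43.12 as given in the example
r_chain = [0.579, 0.956, 0.337]

prod = 1.0
for r in r_chain:
    prod *= 1 - r ** 2      # (1 - r_41^2)(1 - r_42.1^2)(1 - r_43.12^2)

R_4_123 = math.sqrt(1 - prod)
print(f"R_4.123 = {R_4_123:.3f}")   # R_4.123 = 0.974
```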
{
"cell_type": "code",
"execution_count": null,
