diff --git a/notebook_必备数学基础/相关分析/assets/20201121183005.png b/notebook_必备数学基础/相关分析/assets/20201121183005.png
new file mode 100644
index 0000000..2902c37
Binary files /dev/null and b/notebook_必备数学基础/相关分析/assets/20201121183005.png differ
diff --git a/notebook_必备数学基础/相关分析/assets/20201121183858.png b/notebook_必备数学基础/相关分析/assets/20201121183858.png
new file mode 100644
index 0000000..db9e339
Binary files /dev/null and b/notebook_必备数学基础/相关分析/assets/20201121183858.png differ
diff --git a/notebook_必备数学基础/相关分析/相关分析.ipynb b/notebook_必备数学基础/相关分析/相关分析.ipynb
index c44148c..82a1ece 100644
--- a/notebook_必备数学基础/相关分析/相关分析.ipynb
+++ b/notebook_必备数学基础/相关分析/相关分析.ipynb
@@ -181,7 +181,7 @@
     "\\frac{-300.91}{\\sqrt{250.55}\\sqrt{1508.34}}\n",
     "$$\n",
     "$$\n",
-    "= \\frac{-300.91}{15.83*38.84} = -0.4895\n",
+    "= \\frac{-300.91}{15.83×38.84} = -0.4895\n",
     "$$\n",
     "The result shows that London's monthly mean temperature (t) and precipitation (p) are negatively correlated, i.e. they vary in opposite directions."
    ]
@@ -421,7 +421,7 @@
     "Compute the rank correlation coefficient\n",
     "$$\n",
     "r_R = 1-\\frac{6\\sum D^2}{n(n^2-1)}\n",
-    "= 1-\\frac{6*18}{10(10^2-1)}\n",
+    "= 1-\\frac{6×18}{10(10^2-1)}\n",
     "=0.891\n",
     "$$\n",
     "**Significance test for the rank correlation coefficient**\n",
@@ -503,7 +503,7 @@
     "Since no grader assigned tied ranks to any of the 6 papers:\n",
     "$$\n",
     "S=\\sum^6_{i=1}R_i^2-\\frac{1}{6}(\\sum^6_{i=1}R_i)^2\n",
-    "= 3192- \\frac{1}{6} * 126^2 = 546\n",
+    "= 3192- \\frac{1}{6} × 126^2 = 546\n",
     "$$\n",
     "$$\n",
     "W = \\frac{S}{\\frac{1}{12}K^2(N^3-N)} = \\frac{546}{\\frac{1}{12}6^2(6^3-6)}\n",
@@ -532,11 +532,11 @@
     "Rater C: $T = (2^3 - 2)+(2^3 - 2) = 12$\n",
     "$$\n",
     "S=\\sum^6_{i=1}R_i^2-\\frac{1}{6}(\\sum^6_{i=1}R_i)^2\n",
-    "=791.5-\\frac{1}{6} * 63^2 = 130.00\n",
+    "=791.5-\\frac{1}{6} × 63^2 = 130.00\n",
     "$$\n",
     "$$\n",
     "W = \\frac{S}{\\frac{1}{12}[K^2(N^3-N)-K\\sum^K_{i=1}T_i]}\n",
-    "=\\frac{130}{\\frac{1}{12}[3^2(6^3-6)-3*(6+12)]}\n",
+    "=\\frac{130}{\\frac{1}{12}[3^2(6^3-6)-3×(6+12)]}\n",
     "=\\frac{130}{153} = 0.849\n",
     "$$\n",
     "W = 0.849 indicates a high degree of agreement among the experts' ratings."
@@ -577,6 +577,127 @@
     "print('p_value', p_value)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Qualitative-quantitative correlation analysis\n",
+    "Qualitative-quantitative correlation is the correlation between two variables where one is qualitative and the other is quantitative. IQ, subject scores, height, and weight are quantitative variables; male/female, good/poor, and pass/fail are qualitative variables.\n",
+    "\n",
+    "The main types are biserial correlation, point-biserial correlation, and polyserial correlation."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Biserial correlation\n",
+    "When both variables are normally distributed continuous variables but one of them has been artificially dichotomized (for example, exam scores split by some criterion into pass/fail or admitted/not admitted, a sports test result split into passed/not passed or meets/does not meet the standard, health status split into good/poor), the correlation between the two variables is called biserial correlation.\n",
+    "\n",
+    "**Conditions for using biserial correlation:**\n",
+    "\n",
+    "$$\n",
+    "R = \\frac{\\overline{X}_p-\\overline{X}_q}{σ} × \\frac{pq}{Y}\n",
+    "$$\n",
+    "\n",
+    "- $p$: the proportion of observations in one category of the dichotomous variable\n",
+    "- $q$: the proportion of observations in the other category of the dichotomous variable\n",
+    "- $\\overline{X}_p$: the mean of the continuous variable over the observations in category p\n",
+    "- $\\overline{X}_q$: the mean of the continuous variable over the observations in category q\n",
+    "- $σ$: the standard deviation of the continuous variable\n",
+    "- $Y$: the height (ordinate) of the standard normal curve at the point corresponding to p"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Biserial correlation example:**\n",
+    "
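The scores of 10 examinees are given below, comprising the total score and the score on one essay question. Find the discrimination of the essay question (a score of 6 or above counts as a pass).\n",
+    "\n",
+    "Because the essay question score is artificially split into two classes, pass and fail, biserial correlation is the appropriate measure.\n",
+    "\n",
+    "$$\n",
+    "p=0.6 \\Rightarrow x=0.25 \\quad \\text{(from the standard normal table)}\n",
+    "$$\n",
+    "$$\n",
+    "\\text{Substituting } x=0.25 \\text{ into the standard normal density } Y=\\frac{1}{\\sqrt{2π}}e^{-\\frac{x^2}{2}}\n",
+    "\\text{ gives } Y=0.3866\n",
+    "$$\n",
+    "$$\n",
+    "\\overline{X}_p = 67.33, \\overline{X}_q=61.25, σ=6.12\n",
+    "$$\n",
+    "The biserial correlation coefficient then follows from the formula:\n",
+    "$$\n",
+    "R=\\frac{\\overline{X}_p-\\overline{X}_q}{σ}×\\frac{pq}{Y}\n",
+    "=\\frac{67.33-61.25}{6.12}×\\frac{0.6×0.4}{0.3866} ≈0.62\n",
+    "$$\n",
+    "The question discriminates fairly well."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Point-biserial correlation\n",
+    "When one variable is a normally distributed continuous variable and the other is a genuinely dichotomous nominal variable (for example male/female, married/unmarried, color-blind/not color-blind, alive/dead), the correlation between the two variables is called point-biserial correlation.\n",
+    "$$\n",
+    "R = \\frac{\\overline{X}_p-\\overline{X}_q}{σ} × \\sqrt{pq}\n",
+    "$$\n",
+    "\n",
+    "- $p$: the proportion of observations in one category of the dichotomous variable\n",
+    "- $q$: the proportion of observations in the other category of the dichotomous variable\n",
+    "- $\\overline{X}_p$: the mean of the continuous variable over the observations in category p\n",
+    "- $\\overline{X}_q$: the mean of the continuous variable over the observations in category q\n",
+    "- $σ$: the standard deviation of the continuous variable"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Point-biserial correlation example:**\n",
+    "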
A test has 50 multiple-choice questions worth 2 points each. Given the total scores of 20 examinees and whether each answered question 5 correctly, how strongly is question 5 correlated with the total score?\n",
+    "\n",
+    "p (the proportion of students who answered correctly) = 10/20 = 0.5, q = 1 - p = 0.5\n",
+    "$$\n",
+    "\\overline{X}_p=88.4, \\overline{X}_q=74.8, σ=8.66\n",
+    "$$\n",
+    "$$\n",
+    "R = \\frac{\\overline{X}_p-\\overline{X}_q}{σ} × \\sqrt{pq}\n",
+    "= \\frac{88.4-74.8}{8.66}×\\sqrt{0.5×0.5} = 0.785\n",
+    "$$\n",
+    "The coefficient is fairly high: performance on question 5 is consistent with the total score (the question discriminates well)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "PointbiserialrResult(correlation=0.7849870641173371, pvalue=4.145927973490392e-05)"
+      ]
+     },
+     "execution_count": 7,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Using the example above: 1 = correct, 0 = wrong; x is the response to question 5, y is the total score\n",
+    "x = [1,0,0,0,0,0,0,1,1,1,1,0,1,1,1,1,1,0,0,0]\n",
+    "y = [84,82,76,60,72,74,76,84,88,90,78,80,92,94,96,88,90,78,76,74]\n",
+    "stats.pointbiserialr(x,y) # the correlation coefficient is 0.7849, matching the hand calculation above"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,