Add ANOVA overview and calculation method

pull/2/head
benjas 5 years ago
parent fed080b133
commit 4253f5313c

@ -0,0 +1,272 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Analysis of Variance (ANOVA)"
]
},
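{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### ANOVA overview\n",
"ANOVA tests whether the means of several populations are equal by analyzing the error in the observed data.\n",
"<img src=\"assets/1606031210722.png\" width=\"50%\">\n",
"In the figure below, all samples fall within one similar normal distribution.\n",
"<img src=\"assets/1606031429856.png\" width=\"30%\">\n",
"In the figure below, every sample is normally distributed, but the samples do not share the same distribution.\n",
"<img src=\"assets/1606031463635.png\" width=\"30%\">"
]
},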
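{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Example:**\n",
"\n",
"To evaluate service quality across several industries, a consumer association sampled a number of companies from each of four industries. The table below shows how many complaints consumers filed against the 23 sampled companies over the past year.\n",
"<img src=\"assets/20201122155647.png\" width=\"50%\">\n",
"**The task:**\n",
"\n",
"Determine whether service quality differs significantly among the four industries, that is, whether “industry” has a significant effect on “number of complaints”.\n",
"\n",
"If the means are equal, “industry” has no effect on the number of complaints and service quality does not differ significantly among the industries. If the means are not all equal, “industry” does affect the number of complaints and service quality differs significantly."
]
},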
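{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Key concepts:**\n",
"<ul>\n",
" <li>Factor: the object under test. To analyze whether industry affects the number of complaints, industry is the factor being tested\n",
" <li>Level or treatment: a particular manifestation of the factor; each distinct value of the independent variable is a level of the factor\n",
" <li>Observation: a sample value obtained at a given factor level; the complaint counts recorded in each industry are the observations\n",
" <li>Experiment: only one factor is involved here, so this is a single-factor, four-level experiment\n",
" <li>Population: each level of the factor can be regarded as a population; retail, tourism, airlines, and home-appliance manufacturing can be regarded as four populations\n",
" <li>Sample data: the complaint counts can be regarded as sample data drawn from these four populations\n",
"</ul>\n",
"\n",
"**Scatter plot observations**\n",
"<img src=\"assets/20201122164855.png\" width=\"50%\">\n",
"<ul>\n",
" <li>The number of complaints differs noticeably between industries\n",
" <li>Even within one industry, the number of complaints differs noticeably between companies\n",
" <li>Home-appliance manufacturing receives relatively many complaints, and airlines relatively few\n",
" <li>There appears to be some relationship between industry and the number of complaints\n",
"</ul>\n",
"\n",
"**However**\n",
"<ul>\n",
" <li>The scatter plot alone is not sufficient evidence that complaint counts differ significantly between industries\n",
" <li>The differences could also be caused by sampling randomness\n",
" <li>A more rigorous method is needed to test whether the differences are significant, namely analysis of variance\n",
" <li>It is called analysis of variance because, although the quantity of interest is the mean, deciding whether means differ relies on variances\n",
"</ul>\n",
"\n",
"### Basic idea:\n",
"<ul>\n",
" <li>Compare two kinds of error to test whether the means are equal\n",
" <li>The comparison is based on a ratio of variances\n",
" <li>If the systematic (treatment) error differs significantly from the random error, the means are not all equal; otherwise the means can be regarded as equal\n",
"</ul>\n",
"\n",
"### Random error:\n",
"<ul>\n",
" <li>Differences among the observations of a sample within the same factor level (population)\n",
" <li>For example, different companies in the same industry receive different numbers of complaints\n",
" <li>These differences can be attributed to random factors and are called random error\n",
"</ul>\n",
"\n",
"### Systematic error:\n",
"<ul>\n",
" <li>Differences among observations across different factor levels (different populations)\n",
" <li>For example, the differences in complaint counts between industries\n",
" <li>These differences may be caused by sampling randomness or by the industry itself; error of the latter kind arises from systematic factors and is called systematic error\n",
"</ul>\n",
"\n",
"### Within-group variance:\n",
"<ul>\n",
" <li>The variance of the sample data within one factor level (one population)\n",
" <li>For example, the variance of complaint counts in retail\n",
" <li>Within-group variance contains only random error\n",
"</ul>\n",
"\n",
"### Between-group variance:\n",
"<ul>\n",
" <li>The variance among the samples from different factor levels (different populations)\n",
" <li>For example, the variance among the complaint counts of the four industries\n",
" <li>Between-group variance contains both random error and systematic error\n",
"</ul>\n",
"\n",
"### Comparing the variances:\n",
"<ul>\n",
" <li>If industry has no effect on the number of complaints, the between-group error contains only random error and no systematic error. The averaged between-group and within-group errors should then be close, and their ratio should be close to 1\n",
" <li>If industry does affect the number of complaints, the between-group error contains systematic error in addition to random error. The averaged between-group error is then larger than the averaged within-group error, and their ratio is greater than 1\n",
" <li>Once this ratio is large enough, the levels can be said to differ significantly, meaning the independent variable affects the dependent variable\n",
" <li>Deciding whether industry significantly affects the number of complaints therefore amounts to testing what mainly causes the differences in complaint counts. If the differences are mainly systematic error, industry has a significant effect on the number of complaints\n",
"</ul>"
]
},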
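{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ANOVA calculation method\n",
"\n",
"### Assumptions of ANOVA:\n",
"\n",
"**Each population should be normally distributed**\n",
"<ul>\n",
" <li>For each level of the factor, the observations are a simple random sample from a normally distributed population\n",
" <li>For example, the number of complaints in each industry must be normally distributed\n",
"</ul>\n",
"\n",
"**The populations must have equal variances**\n",
"<ul>\n",
" <li>Each group of observations is drawn from a population with the same variance\n",
" <li>For example, the variances of the complaint counts of the four industries are all equal\n",
"</ul>\n",
"\n",
"**The observations are independent**\n",
"<ul>\n",
" <li>For example, the complaint counts of each industry are independent of those of the other industries\n",
"</ul>\n",
"\n",
"**Under these assumptions, deciding whether industry significantly affects the number of complaints amounts to testing whether four normal populations with equal variance have equal means**\n",
"\n",
"**If the null hypothesis holds, i.e. H0: μ1=μ2=μ3=μ4**\n",
"<br>Equal mean complaint counts across the four industries means every sample comes from the same normal population with mean μ and variance σ^2\n",
"<img src=\"assets/20201122170827.png\" width=\"50%\">\n",
"**If the alternative hypothesis holds, i.e. H1: μ1, μ2, μ3, μ4 are not all equal**\n",
"<br>At least one population mean differs, so the four samples come from normal populations whose means are not all the same\n",
"<img src=\"assets/20201122170905.png\" width=\"50%\">"
]
},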
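{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a compact restatement of the three assumptions above, the one-way model can be sketched as follows. The symbols $x_{ij}$ (the j-th observation at level i), $\\mu_i$, $\\varepsilon_{ij}$, $k$ (number of levels) and $n_i$ (sample size at level i) are standard textbook notation assumed here, not taken from the figures in this notebook.\n",
"\n",
"$$x_{ij} = \\mu_i + \\varepsilon_{ij}, \\qquad \\varepsilon_{ij} \\sim N(0, \\sigma^2) \\text{ independently}, \\qquad i = 1, \\ldots, k, \\; j = 1, \\ldots, n_i$$\n",
"\n",
"Under $H_0$ all $\\mu_i$ equal a common $\\mu$, so every observation is drawn from the same $N(\\mu, \\sigma^2)$."
]
},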
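{
"cell_type": "markdown",
"metadata": {},
"source": [
"### One-way ANOVA\n",
"The model has one independent variable (the factor) and one observed variable; it is a significance test of differences in the mean of the observed variable across the levels of a single factor.\n",
"<img src=\"assets/20201122171118.png\" width=\"50%\">\n",
"**State the hypotheses**\n",
"<br>H0: μ1=μ2=...=μk, the independent variable has no significant effect on the dependent variable\n",
"<br>H1: μ1, μ2, ..., μk are not all equal, the independent variable has a significant effect on the dependent variable\n",
"\n",
"Rejecting the null hypothesis only shows that at least two population means differ; it does not mean that all the means differ from one another\n",
"\n",
"#### Quantities needed for the test statistic\n",
"<ul>\n",
" <li>Mean of each level\n",
" <li>Grand mean of all observations\n",
" <li>Sums of squared errors\n",
" <li>Mean squares (MS)\n",
"</ul>\n",
"\n",
"**Mean of each level:**\n",
"\n",
"Suppose a simple random sample of size ni is drawn from the i-th population. The sample mean of the i-th population is the sum of all observations in that sample divided by the number of observations\n",
"<img src=\"assets/20201122171813.png\" width=\"10%\"> (i=1,2,...,k)\n",
"\n",
"where ni is the number of sample observations from the i-th population\n",
" and xij is the j-th observation from the i-th population\n",
" \n",
"**Grand mean of all observations:**\n",
"\n",
"The sum of all observations divided by the total number of observations\n",
"<img src=\"assets/20201122172021.png\" width=\"20%\">\n",
"where n=n1+n2+...+nk\n",
"<img src=\"assets/20201122172125.png\" width=\"50%\">"
]
},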
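{
"cell_type": "markdown",
"metadata": {},
"source": [
"The formulas for the two means appear only as images above. Assuming they follow the standard one-way ANOVA definitions, they can be written out as below, where the symbols $\\bar{x}_i$ (mean of level i) and $\\bar{\\bar{x}}$ (grand mean) are notation chosen for this sketch.\n",
"\n",
"$$\\bar{x}_i = \\frac{1}{n_i} \\sum_{j=1}^{n_i} x_{ij}, \\qquad i = 1, 2, \\ldots, k$$\n",
"\n",
"$$\\bar{\\bar{x}} = \\frac{1}{n} \\sum_{i=1}^{k} \\sum_{j=1}^{n_i} x_{ij} = \\frac{1}{n} \\sum_{i=1}^{k} n_i \\bar{x}_i, \\qquad n = n_1 + n_2 + \\cdots + n_k$$\n",
"\n",
"The second form shows that the grand mean is the average of the level means weighted by sample size, so it equals the plain average of the level means only when all $n_i$ are equal."
]
},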
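{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Total sum of squares (SST)**\n",
"\n",
"The sum of squared deviations of all observations from the grand mean; it reflects the dispersion of all observations\n",
"<img src=\"assets/20201122172417.png\" width=\"20%\">\n",
"\n",
"\n",
"**Sum of squares for the factor (SSA)**\n",
"\n",
"The sum of squared deviations of the group means from the grand mean; it reflects how much the sample means of the populations differ. Also called the between-group sum of squares, it contains both random error and systematic error\n",
"<img src=\"assets/20201122172514.png\" width=\"30%\">\n",
"\n",
"**Sum of squares for error (SSE)**\n",
"\n",
"The sum of squared deviations of the observations in each level (group) from their group mean; it reflects the dispersion of the observations within each sample. Also called the within-group sum of squares, it reflects the size of the random error\n",
"<img src=\"assets/20201122172804.png\" width=\"20%\">\n",
"\n",
"**Relationship among the sums of squares**\n",
"\n",
"The relationship among the total sum of squares (SST), the error sum of squares (SSE), and the factor sum of squares (SSA)\n",
"<img src=\"assets/20201122172952.png\" width=\"40%\">\n",
"\n",
"**SST reflects the total error in all the data; SSE reflects the size of the random error; SSA reflects the size of the random error plus the systematic error**\n",
"\n",
"If the null hypothesis holds, there is no systematic error, so the mean square obtained by dividing the between-group sum of squares SSA by its degrees of freedom should not differ much from the mean square obtained by dividing the within-group sum of squares SSE by its degrees of freedom. If the between-group mean square is significantly larger than the within-group mean square, the differences among the levels (populations) include systematic error as well as random error. Deciding whether the factor levels affect the observations therefore amounts to comparing the between-group variance with the within-group variance\n",
"\n",
"**Mean square (MS)**\n",
"\n",
"The size of each sum of squares depends on the number of observations. To remove this dependence, each sum of squares is averaged; the result is the mean square, also called the variance, computed by dividing the sum of squares by its degrees of freedom\n",
"\n",
"**Degrees of freedom**\n",
"<ul>\n",
" <li>SST has n-1 degrees of freedom, where n is the total number of observations\n",
" <li>SSA has k-1 degrees of freedom, where k is the number of factor levels (populations)\n",
" <li>SSE has n-k degrees of freedom\n",
"</ul>\n",
"\n",
"**F statistic**\n",
"\n",
"Comparing MSA (the between-group variance, i.e. the mean square of SSA) with MSE (the within-group variance, i.e. the mean square of SSE) gives the required test statistic F\n",
"<img src=\"assets/20201122173416.png\" width=\"20%\">\n",
"<img src=\"assets/20201122173432.png\" width=\"20%\">\n",
"**F distribution**\n",
"<img src=\"assets/20201122173507.png\" width=\"30%\">\n",
"\n",
"For the given significance level α, look up the critical value Fα in the F distribution table with numerator degrees of freedom df1=k-1 and denominator degrees of freedom df2=n-k\n",
"<ul>\n",
" <li>If F &gt; Fα, reject the null hypothesis H0: the differences among the means are significant, and the factor under test has a significant effect on the observations\n",
" <li>If F &lt; Fα, do not reject H0: the factor under test cannot be considered to have a significant effect on the observations\n",
"</ul>\n",
"\n",
"**ANOVA table:**\n",
"<img src=\"assets/20201122173755.png\" width=\"50%\">"
]
},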
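{
"cell_type": "markdown",
"metadata": {},
"source": [
"The sums of squares and the F statistic are shown only as images above. Assuming the standard one-way ANOVA definitions, with $\\bar{x}_i$ denoting the mean of level i and $\\bar{\\bar{x}}$ the grand mean (notation assumed for this sketch), they read:\n",
"\n",
"$$SST = \\sum_{i=1}^{k} \\sum_{j=1}^{n_i} (x_{ij} - \\bar{\\bar{x}})^2$$\n",
"\n",
"$$SSA = \\sum_{i=1}^{k} n_i (\\bar{x}_i - \\bar{\\bar{x}})^2$$\n",
"\n",
"$$SSE = \\sum_{i=1}^{k} \\sum_{j=1}^{n_i} (x_{ij} - \\bar{x}_i)^2$$\n",
"\n",
"$$SST = SSA + SSE, \\qquad n - 1 = (k - 1) + (n - k)$$\n",
"\n",
"$$MSA = \\frac{SSA}{k-1}, \\qquad MSE = \\frac{SSE}{n-k}, \\qquad F = \\frac{MSA}{MSE} \\sim F(k-1,\\, n-k) \\text{ under } H_0$$\n",
"\n",
"The decomposition holds because the cross term $2 \\sum_i (\\bar{x}_i - \\bar{\\bar{x}}) \\sum_j (x_{ij} - \\bar{x}_i)$ vanishes: deviations from a group mean sum to zero within each group."
]
},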
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -1203,3 +1203,28 @@ https://scikit-learn.org/stable/auto_examples/
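##### Basic principles of the t-test
### Correlation analysis
Notebook updated; markdown to be updated
### Analysis of variance (ANOVA)
#### ANOVA overview
ANOVA tests whether the means of several populations are equal by analyzing the error in the observed data.
![1606031210722](assets/1606031210722.png)
In the figure below, all samples fall within one similar normal distribution.
![1606031429856](assets/1606031429856.png)
In the figure below, every sample is normally distributed, but the samples do not share the same distribution.
![1606031463635](assets/1606031463635.png)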
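As a rough illustration of how the error in the observed data is used to compare means, the sketch below computes the between-group and within-group sums of squares and their mean-square ratio F by hand, using only the Python standard library. The data, the helper name `one_way_anova`, and the critical value 5.14 (the tabulated F value for α = 0.05 with 2 and 6 degrees of freedom) are assumptions made for illustration; none of them come from the complaint data in the notebook.

```python
from statistics import mean


def one_way_anova(groups):
    """Hypothetical helper: one-way ANOVA quantities for a list of samples.

    `groups` is a list of lists of numbers, one inner list per factor level.
    """
    k = len(groups)                          # number of factor levels
    n = sum(len(g) for g in groups)          # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [mean(g) for g in groups]

    # Between-group sum of squares: group means vs. the grand mean
    ssa = sum(len(g) * (m - grand_mean) ** 2
              for g, m in zip(groups, group_means))
    # Within-group sum of squares: observations vs. their own group mean
    sse = sum((x - m) ** 2
              for g, m in zip(groups, group_means) for x in g)
    # Total sum of squares: observations vs. the grand mean
    sst = sum((x - grand_mean) ** 2 for g in groups for x in g)

    msa = ssa / (k - 1)                      # between-group mean square
    mse = sse / (n - k)                      # within-group mean square
    return {"SST": sst, "SSA": ssa, "SSE": sse,
            "df_between": k - 1, "df_within": n - k, "F": msa / mse}


# Made-up data: three levels with three observations each
samples = [[2, 4, 6], [5, 7, 9], [8, 10, 12]]
result = one_way_anova(samples)

F_CRITICAL = 5.14  # assumed table value for F(2, 6) at alpha = 0.05
reject_h0 = result["F"] > F_CRITICAL
print(result, reject_h0)
```

With this made-up data SSA = 54, SSE = 24 and SST = 78, so F = 27 / 4 = 6.75, which exceeds the assumed critical value and leads to rejecting the hypothesis of equal means.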
To be updated; the notebook has already been updated