Add Python ANOVA examples

pull/2/head
benjas 4 years ago
parent ed9edc0a28
commit f85e1956ff

@ -0,0 +1,6 @@
{
"cells": [],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 2
}

@ -250,11 +250,11 @@
"**实例:**\n",
"\n",
"在评价某药物耐受性及安全性的期临床试验中,对符合纳入标准的30名健康自愿者随机分为3组每组10名,各组注射剂量分别为0.5U、1U、2U,观察48小时部分凝血活酶时间(s)试问不同剂量的部分凝血活酶时间有无不同?\n",
"20201122181401.png\n",
"<img src=\"assets/20201122181401.png\" width=\"30%\">\n",
"\n",
"提出假设:H0μ1=μ2=μ3 H1μ1,p2,μ3不全相同显著水平a=0.05\n",
"\n",
"20201122181607.png\n",
"<img src=\"assets/20201122181607.png\" width=\"30%\">\n",
"\n",
"F0.05(2,26)=2.52, F>F0.05(2,26), P<0.05\n",
"拒绝H0。三种不同剂量48小时部分凝血活酶时间不全相同。\n",
@ -268,8 +268,137 @@
"\n",
"**LSD方法**\n",
"\n",
"对k组中的两组的平均数进行比较,当两组样本容量分别为ninj都为时,有\n"
"对k组中的两组的平均数进行比较,当两组样本容量分别为ninj都为时,有\n",
"<img src=\"assets/20201122182006.png\" width=\"20%\">\n",
"<img src=\"assets/20201122182022.png\" width=\"20%\">\n",
"\n",
"则认为μ1与μ2有显著差异\n",
"否则认为它们之间没有显著差异\n",
"\n",
"**实例:颜色对销售额的影响**\n",
"<img src=\"assets/20201122182123.png\" width=\"40%\">\n",
"<img src=\"assets/20201122182214.png\" width=\"50%\">\n",
"\n",
"<img src=\"assets/20201122182231.png\" width=\"30%\">\n",
"<img src=\"assets/20201122182249.png\" width=\"30%\">\n",
"\n",
"<img src=\"assets/20201122182324.png\" width=\"30%\">\n",
"\n",
"依据上面结果可得出影响效果\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 多因素方差分析\n",
"<ul>\n",
" <li>无交互效应的多因素方差分析\n",
" <li>有交互效应的多因素方差分析\n",
"</ul>\n",
"\n",
"**主效应与交互效应**\n",
"<ul>\n",
" <li>主效应( main effect):各个因素对观测变量的单独影响称为主效应\n",
" <li>交互效应( interaction effect):各个因素不同水平的搭配所产生的新的影响称为交互效应\n",
"</ul>\n",
"\n",
"**双因素方差分析的类型**\n",
"<ul>\n",
" <li>双因素方差分析中因素A和B对结果的影响相互独立时称为无交互效应的双因素方差分析\n",
" <li>如果除了A和B对结果的单独影响外还存在交互效应,这时的双因素方差分析称为有交互效应的双因素方差分析\n",
"</ul>\n",
"\n",
"**无交互效应的双因素方差分析模型**\n",
"\n",
"离差平方和的分解\n",
"<img src=\"assets/20201122185025.png\" width=\"40%\">\n",
"\n",
"<img src=\"assets/20201122184827.png\" width=\"20%\">\n",
"\n",
"**有交互效应的双因素方差分析模型**\n",
"\n",
"离差平方和的分解\n",
"<img src=\"assets/20201122184938.png\" width=\"40%\">\n",
"<img src=\"assets/20201122185044.png\" width=\"20%\">\n",
"\n",
"<img src=\"assets/20201122185058.png\" width=\"30%\">\n",
"\n",
"**双因素方差分析的步骤**\n",
"\n",
"**提出假设**\n",
"\n",
"要说明因素A有无显著影响,就是检验如下假设:\n",
"\n",
" Ho:因素A不同水平下观测变量的总体均值无显著差异。\n",
"\n",
" H1:因素A不同水平下观测变量的总体均值存在显著差异。\n",
"\n",
"要说明因素B有无显著影响,就是检验如下假设\n",
" Ho:因素B不同水平下观测变量的总体均值无显著差异\n",
" \n",
" H1:因素B不同水平下观测变量的总体均值存在显著差异。\n",
"\n",
"在有交互效应的双因素方差中,要说明两个因素的交互效应是否显著,还要检验第三组零假设和备择假设\n",
"\n",
" Ho:因素A和因素B的交互效应对观测变量的总体均值无显著差异。\n",
" \n",
" H1:因素A和因素B的交互效应对观测变量的总体均值存在显著差异。\n",
"\n",
"**构造统计量**\n",
"\n",
"在原假设成立的情况下,三个统计量分别服从自由度为(r-1,rs(m-1))、(s-1,rs(m-1))、(r-1)(s-1)rs(m-1)的F分布\n",
"<img src=\"assets/20201122185659.png\" width=\"20%\">\n",
"\n",
"利用原假设和样本数据分别计算3个F统计量的值和其对应的p值对比p值和α,结合原假设作出推断。若p<a,则拒绝关于这个因素的原假设,得出此因素不同水平下观测变量各总体均值存在显著差异的结论。\n",
"\n",
"**实例:**\n",
"\n",
"有四个品牌的彩电在五个地区销售,为分析彩电的品牌(品牌因素)和销售地区(地区因素)对销售量是否有影响,对每个品牌在各地区的销售量取得以下数据。试分品牌和销售地区对彩电的销售量是否有显著影响?(q=0.05)\n",
"<img src=\"assets/20201122185904.png\" width=\"40%\">\n",
"\n",
"提出假设对行因素提出的假设为:\n",
"\n",
" HO: μ1=μ2=...=μi=...=μk(μi为第个水平的均值)H1:μi(i=1,2,…,k)不全相等\n",
"\n",
"对列因素提出的假设为:\n",
"\n",
" HO: H1=μ1=μ2=...=μj=...=μr(mj为第j个水平的均值)H1:μj(j=1,2,...,r)不全相等\n",
" \n",
"**计算各平方和**\n",
"<img src=\"assets/20201122190203.png\" width=\"40%\">\n",
"\n",
"**计算均方**\n",
"\n",
"误差平方和除以相应的自由度\n",
"<ul>\n",
" <li>总离差平方和SST的自由度为kr-1\n",
" <li>行因素的离差平方和SSR的自由度为k-1\n",
" <li>列因素的离差平方和SSc的自由度为r-1\n",
" <li>随机误差平方和SSE的自由度为(k-1)x(-1)\n",
"</ul>\n",
"\n",
"**计算检验统计量(F)**\n",
"\n",
"计算检验统计量(F)\n",
"<img src=\"assets/20201122190305.png\" width=\"20%\">\n",
"\n",
"检验列因素的统计量\n",
"<img src=\"assets/20201122190448.png\" width=\"20%\">\n",
"<img src=\"assets/20201122190458.png\" width=\"50%\">\n",
"\n",
"FA=18.10777>Fα=34903,拒绝原假设H0,说明彩电的品牌对销售量有显著影响\n",
"\n",
"FB=2.100846<Fα=32592,接受原假设H0,说明销售地区对彩电的销售量没有显著影响"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {

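The brand/region example above can be checked numerically. The following is a minimal sketch, not the notebook author's code: the sales table exists only as an image in the notebook, so the `sales` array is an assumed reconstruction (rows are brands, columns are regions) chosen because it reproduces the F values quoted in the text, and `two_way_anova_no_interaction` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats

# Assumed data: 4 brands (rows) x 5 regions (columns), one observation per cell.
# The notebook only shows this table as an image, so these values are illustrative.
sales = np.array([
    [365, 350, 343, 340, 323],
    [345, 368, 363, 330, 333],
    [358, 323, 353, 343, 308],
    [288, 280, 298, 260, 298],
], dtype=float)


def two_way_anova_no_interaction(x, alpha=0.05):
    """Two-way ANOVA without interaction for a k x r table (hypothetical helper)."""
    k, r = x.shape
    grand = x.mean()
    # Sums of squares for rows, columns, total and error.
    ssr = r * ((x.mean(axis=1) - grand) ** 2).sum()
    ssc = k * ((x.mean(axis=0) - grand) ** 2).sum()
    sst = ((x - grand) ** 2).sum()
    sse = sst - ssr - ssc
    # Degrees of freedom: k-1, r-1 and (k-1)(r-1).
    df_r, df_c, df_e = k - 1, r - 1, (k - 1) * (r - 1)
    mse = sse / df_e
    f_row = (ssr / df_r) / mse
    f_col = (ssc / df_c) / mse
    return {
        "F_row": f_row,
        "F_col": f_col,
        "crit_row": stats.f.ppf(1 - alpha, df_r, df_e),
        "crit_col": stats.f.ppf(1 - alpha, df_c, df_e),
        "p_row": stats.f.sf(f_row, df_r, df_e),
        "p_col": stats.f.sf(f_col, df_c, df_e),
    }


result = two_way_anova_no_interaction(sales)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Comparing each F value with its critical value (or each p-value with α) gives the same decisions as the text: the row factor is significant and the column factor is not.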
@ -0,0 +1,224 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 单因素方差分析"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.10150375939849626\n",
"0.9038208903685354\n"
]
}
],
"source": [
"# 呷哺呷哺3个城市不同用户评分\n",
"from scipy.stats import f_oneway\n",
"a = [10,9,9,8,8,7,7,8,8,9] # 3个城市每个城市10个人评价\n",
"b = [10,8,9,8,7,7,7,8,9,9]\n",
"c = [9,9,8,8,8,7,6,9,8,9]\n",
"\n",
"f,p = f_oneway(a,b,c)\n",
"print(f) # 统计量\n",
"print(p) # 概率值"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"不能认为所检验的因素对观测值有显著影响"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 多因素方差分析"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>E</th>\n",
" <th>I</th>\n",
" <th>S</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>5</td>\n",
" <td>4</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>5</td>\n",
" <td>3</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>5</td>\n",
" <td>2</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" E I S\n",
"0 5 5 5\n",
"1 5 4 5\n",
"2 5 3 4\n",
"3 5 2 3\n",
"4 5 1 2"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 呷哺呷哺2因素环境等级食材等级\n",
"from scipy import stats\n",
"import pandas as pd\n",
"import numpy as np\n",
"from statsmodels.formula.api import ols\n",
"from statsmodels.stats.anova import anova_lm\n",
"\n",
"\n",
"environmental = [5,5,5,5,5,4,4,4,4,4,3,3,3,3,3,2,2,2,2,2,1,1,1,1,1]\n",
"ingredients = [5,4,3,2,1,5,4,3,2,1,5,4,3,2,1,5,4,3,2,1,5,4,3,2,1]\n",
"score = [5,5,4,3,2,5,4,4,3,2,4,4,3,3,2,4,3,2,2,2,3,3,3,2,1]\n",
"\n",
"data = {'E':environmental, 'I':ingredients, 'S':score}\n",
"df = pd.DataFrame(data)\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"符号意义:\n",
"\n",
"(~)隔离因变量和自变量(左边因变量,右边自变量)\n",
"<br>(+)分隔各个自变量\n",
"<br>(:)表示两个自变量交互影响"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" df sum_sq mean_sq F PR(>F)\n",
"E 1.0 7.22 7.220000 54.539568 2.896351e-07\n",
"I 1.0 18.00 18.000000 135.971223 1.233581e-10\n",
"E:I 1.0 0.64 0.640000 4.834532 3.924030e-02\n",
"Residual 21.0 2.78 0.132381 NaN NaN\n"
]
}
],
"source": [
"formula = 'S~E+I+E:I' #指定公式\n",
"\n",
"model = ols(formula, df).fit()\n",
"results = anova_lm(model)\n",
"print(results)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"P值很小拒绝原假设F值越大。\n",
"\n",
"表示该因素对结果影响越大分别是E和I\n",
"\n",
"E:I行的P值表示交互情况小于0.05,之间并无交互"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
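The one-way result in the notebook above can be reproduced by hand from the between-group and within-group sums of squares. This is a minimal sketch rather than the author's method: the three rating lists are copied from the notebook, while `one_way_anova` is a hypothetical helper written only for this comparison.

```python
import numpy as np
from scipy import stats

# Ratings copied from the notebook's one-way example (3 cities, 10 ratings each).
a = [10, 9, 9, 8, 8, 7, 7, 8, 8, 9]
b = [10, 8, 9, 8, 7, 7, 7, 8, 9, 9]
c = [9, 9, 8, 8, 8, 7, 6, 9, 8, 9]


def one_way_anova(*groups):
    """Return (F, p) for a one-way ANOVA computed from sums of squares (hypothetical helper)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand = all_values.mean()
    k, n = len(groups), all_values.size
    # Between-group and within-group sums of squares.
    ssb = sum(g.size * (g.mean() - grand) ** 2 for g in groups)
    ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)
    f_value = (ssb / (k - 1)) / (ssw / (n - k))
    # Upper-tail probability of the F distribution with (k-1, n-k) degrees of freedom.
    p_value = stats.f.sf(f_value, k - 1, n - k)
    return f_value, p_value


f_manual, p_manual = one_way_anova(a, b, c)
f_scipy, p_scipy = stats.f_oneway(a, b, c)
print(f_manual, p_manual)
print(f_scipy, p_scipy)
```

Both lines should agree with the output recorded in the notebook, which shows that `f_oneway` is the ratio of the between-group mean square to the within-group mean square.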

@ -382,13 +382,14 @@
"**计算检验统计量(F)**\n",
"\n",
"计算检验统计量(F)\n",
"<img src=\"assets/20201122190305.png\" width=\"03%\">\n",
"<img src=\"assets/20201122190305.png\" width=\"20%\">\n",
"\n",
"检验列因素的统计量\n",
"<img src=\"assets/20201122190448.png\" width=\"20%\">\n",
"<img src=\"assets/20201122190458.png\" width=\"20%\">\n",
"<img src=\"assets/20201122190458.png\" width=\"50%\">\n",
"\n",
"FA=18.10777>Fα=34903,拒绝原假设H0,说明彩电的品牌对销售量有显著影响\n",
"\n",
"FB=2.100846<Fα=32592,接受原假设H0,说明销售地区对彩电的销售量没有显著影响"
]
},
