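diff --git a/notebook_必备数学基础/方差分析/.ipynb_checkpoints/Python方差分析实例-checkpoint.ipynb b/notebook_必备数学基础/方差分析/.ipynb_checkpoints/Python方差分析实例-checkpoint.ipynb new file mode 100644 index 0000000..2fd6442 --- /dev/null +++ b/notebook_必备数学基础/方差分析/.ipynb_checkpoints/Python方差分析实例-checkpoint.ipynb @@ -0,0 +1,6 @@ +{ + "cells": [], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebook_必备数学基础/方差分析/.ipynb_checkpoints/方差分析-checkpoint.ipynb b/notebook_必备数学基础/方差分析/.ipynb_checkpoints/方差分析-checkpoint.ipynb index c5b0a17..28bddff 100644 --- a/notebook_必备数学基础/方差分析/.ipynb_checkpoints/方差分析-checkpoint.ipynb +++ b/notebook_必备数学基础/方差分析/.ipynb_checkpoints/方差分析-checkpoint.ipynb @@ -250,11 +250,11 @@ "**Example:**\n", "\n", "In a phase I clinical trial evaluating the tolerance and safety of a drug, 30 healthy volunteers who met the inclusion criteria were randomly divided into 3 groups of 10. The groups received injected doses of 0.5U, 1U, and 2U respectively, and the partial thromboplastin time (s) was observed at 48 hours. Does the partial thromboplastin time differ between doses?\n", - "20201122181401.png\n", + "\n", "\n", "State the hypotheses: H0: μ1=μ2=μ3; H1: μ1, μ2, μ3 are not all equal; significance level α=0.05\n", "\n", - "20201122181607.png\n", + "\n", "\n", "F0.05(2,26)=2.52, F>F0.05(2,26), P<0.05\n", "Reject H0. The 48-hour partial thromboplastin times for the three doses are not all equal.\n", @@ -268,8 +268,137 @@ "\n", "**LSD method**\n", "\n", - "To compare the means of two of the k groups, with sample sizes ni and nj respectively, we have\n" + "To compare the means of two of the k groups, with sample sizes ni and nj respectively, we have\n", + "\n", + "\n", + "\n", + "then μ1 and μ2 are considered significantly different;\n", + "otherwise they are considered not significantly different\n", + "\n", + "**Example: the effect of color on sales**\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "The effect can be determined from the results above\n", + "\n" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Multi-factor ANOVA\n", + "\n", + "\n", + "**Main effects and interaction effects**\n", + "\n", + "\n", + "**Types of two-way ANOVA**\n", + "\n", + "\n", + "**Two-way ANOVA model without interaction**\n", + "\n", + "Decomposition of the sum of squared deviations\n", + "\n", + "\n", + "\n", + "\n", + "**Two-way ANOVA model with interaction**\n", + "\n", + "Decomposition of the sum of squared deviations\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "**Steps of two-way ANOVA**\n", + "\n", + "**State the hypotheses**\n", + "\n", + "To determine whether factor A has a significant effect, test the following hypotheses:\n", + "\n", + " H0: the population means of the observed variable do not differ significantly across the levels of factor A.\n", + "\n", + " H1: the population means of the observed variable differ significantly across the levels of factor A.\n", + "\n", + "To determine whether factor B has a significant effect, test the following hypotheses:\n", + " H0: the population means of the observed variable do not differ significantly across the levels of factor B.\n", + " \n", + " 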
H1:因素B不同水平下观测变量的总体均值存在显著差异。\n", + "\n", + "在有交互效应的双因素方差中,要说明两个因素的交互效应是否显著,还要检验第三组零假设和备择假设\n", + "\n", + " Ho:因素A和因素B的交互效应对观测变量的总体均值无显著差异。\n", + " \n", + " H1:因素A和因素B的交互效应对观测变量的总体均值存在显著差异。\n", + "\n", + "**构造统计量**\n", + "\n", + "在原假设成立的情况下,三个统计量分别服从自由度为(r-1,rs(m-1))、(s-1,rs(m-1))、(r-1)(s-1)rs(m-1)的F分布\n", + "\n", + "\n", + "利用原假设和样本数据分别计算3个F统计量的值和其对应的p值对比p值和α,结合原假设作出推断。若p\n", + "\n", + "提出假设对行因素提出的假设为:\n", + "\n", + " HO: μ1=μ2=...=μi=...=μk(μi为第个水平的均值)H1:μi(i=1,2,…,k)不全相等\n", + "\n", + "对列因素提出的假设为:\n", + "\n", + " HO: H1=μ1=μ2=...=μj=...=μr(mj为第j个水平的均值)H1:μj(j=1,2,...,r)不全相等\n", + " \n", + "**计算各平方和**\n", + "\n", + "\n", + "**计算均方**\n", + "\n", + "误差平方和除以相应的自由度\n", + "\n", + "\n", + "**计算检验统计量(F)**\n", + "\n", + "计算检验统计量(F)\n", + "\n", + "\n", + "检验列因素的统计量\n", + "\n", + "\n", + "\n", + "FA=18.10777>Fα=34903,拒绝原假设H0,说明彩电的品牌对销售量有显著影响\n", + "\n", + "FB=2.100846\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
   E  I  S
0  5  5  5
1  5  4  5
2  5  3  4
3  5  2  3
4  5  1  2
\n", + "" + ], + "text/plain": [ + " E I S\n", + "0 5 5 5\n", + "1 5 4 5\n", + "2 5 3 4\n", + "3 5 2 3\n", + "4 5 1 2" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# 呷哺呷哺 (restaurant), 2 factors: environment rating, ingredient rating\n", + "from scipy import stats\n", + "import pandas as pd\n", + "import numpy as np\n", + "from statsmodels.formula.api import ols\n", + "from statsmodels.stats.anova import anova_lm\n", + "\n", + "\n", + "environmental = [5,5,5,5,5,4,4,4,4,4,3,3,3,3,3,2,2,2,2,2,1,1,1,1,1]\n", + "ingredients = [5,4,3,2,1,5,4,3,2,1,5,4,3,2,1,5,4,3,2,1,5,4,3,2,1]\n", + "score = [5,5,4,3,2,5,4,4,3,2,4,4,3,3,2,4,3,2,2,2,3,3,3,2,1]\n", + "\n", + "data = {'E':environmental, 'I':ingredients, 'S':score}\n", + "df = pd.DataFrame(data)\n", + "df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Meaning of the formula symbols:\n", + "\n", + "(~) separates the dependent variable from the independent variables (dependent variable on the left, independent variables on the right)\n", + "
(+) separates the individual independent variables\n", + "
(:) denotes the interaction between two independent variables" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " df sum_sq mean_sq F PR(>F)\n", + "E 1.0 7.22 7.220000 54.539568 2.896351e-07\n", + "I 1.0 18.00 18.000000 135.971223 1.233581e-10\n", + "E:I 1.0 0.64 0.640000 4.834532 3.924030e-02\n", + "Residual 21.0 2.78 0.132381 NaN NaN\n" + ] + } + ], + "source": [ + "formula = 'S~E+I+E:I' # specify the model formula\n", + "\n", + "model = ols(formula, df).fit()\n", + "results = anova_lm(model)\n", + "print(results)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The p-values for E and I are very small, so the null hypotheses are rejected.\n", + "\n", + "The larger the F value, the greater that factor's effect on the result; both E and I have significant effects, with I the larger\n", + "\n", + "The p-value on the E:I row tests the interaction; it is about 0.039, below 0.05, so the interaction between E and I is also significant at the 0.05 level" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.3" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebook_必备数学基础/方差分析/方差分析.ipynb b/notebook_必备数学基础/方差分析/方差分析.ipynb index b47f575..28bddff 100644 --- a/notebook_必备数学基础/方差分析/方差分析.ipynb +++ b/notebook_必备数学基础/方差分析/方差分析.ipynb @@ -382,13 +382,14 @@ "**Compute the test statistic (F)**\n", "\n", "Compute the test statistic (F)\n", - "\n", + "\n", "\n", "Statistic for testing the column factor\n", "\n", - "\n", + "\n", "\n", "FA=18.10777>Fα=3.4903, so the null hypothesis H0 is rejected, indicating that the TV brand has a significant effect on sales volume\n", + "\n", "FB=2.100846