{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Basic Idea of Hypothesis Testing"
]
},
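{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Hypothesis testing\n",
"<ul>\n",
" <li>What is a hypothesis: a statement about the specific value of a population parameter (a mean, a proportion, and so on). For example: I believe the new drug formulation is more effective than the old one.\n",
" <li>What is hypothesis testing: first state a hypothesis about a population parameter, then use the information in a sample to judge whether the hypothesis holds. For example: should I accept or reject the hypothesis above?\n",
"</ul>"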
]
},
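{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Applications of hypothesis testing\n",
"<ul>\n",
" <li>Has teaching effectiveness improved after a new education programme was rolled out?\n",
" <li>Did classifying drunk driving as a criminal offence reduce traffic accidents?\n",
" <li>Does gender influence whether students choose the arts track or the science track?\n",
"</ul>"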
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The basic idea of hypothesis testing\n",
"<img src=\"assets/20201114091803.png\" width=\"70%\">"
]
},
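{
"cell_type": "markdown",
"metadata": {},
"source": [
"As in the figure above: we hypothesize that the population mean is 50, but the sample gives a mean of 20. A result that far from 50 would be very unlikely if the hypothesis were true, so we reject the hypothesis μ = 50."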
]
},
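{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Significance level\n",
"<ul>\n",
"<li>A probability: the probability of rejecting the null hypothesis when the null hypothesis is actually true.\n",
"It is written α; common values are 0.01, 0.05, and 0.10.\n",
"<li>An analogy: a company is hiring, and 200 unqualified people plan to bluff their way in.\n",
"The company is willing to let at most 5% of them slip through,\n",
"so about 200 * 0.05 = 10 of them may get in.\n",
"The significance level α is the largest proportion of bluffers you allow to pass your test.\n",
"<li>Taking the total probability as 1, α = 0.05 leaves 1 - 0.05 = 0.95: when the null hypothesis is true, we correctly keep it 95% of the time. The smaller α is, the stronger the evidence needed to reject the null hypothesis.\n",
"</ul>"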
]
},
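{
"cell_type": "markdown",
"metadata": {},
"source": [
"The meaning of α can be checked with a small simulation. The cell below is a minimal sketch, not part of the original notes: it assumes Python 3.8+ (for `statistics.NormalDist`), and the values of `mu0`, `sigma`, `n`, and `trials` are made up for illustration. It draws many samples for which the null hypothesis really is true, runs a two-tailed z-test at α = 0.05 on each, and counts how often the null hypothesis is wrongly rejected. The observed rate should come out close to α."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"from math import sqrt\n",
"from statistics import NormalDist, mean  # NormalDist needs Python 3.8+\n",
"\n",
"random.seed(0)  # fixed seed so the run is repeatable\n",
"\n",
"# Illustrative values only (assumptions, not taken from the notes)\n",
"mu0, sigma, n, alpha = 50.0, 10.0, 30, 0.05\n",
"trials = 2000\n",
"\n",
"# Two-tailed critical value: alpha/2 in each tail\n",
"z_crit = NormalDist().inv_cdf(1 - alpha / 2)\n",
"\n",
"rejections = 0\n",
"for _ in range(trials):\n",
"    # The sample is drawn with true mean mu0, so H0 is true here\n",
"    sample = [random.gauss(mu0, sigma) for _ in range(n)]\n",
"    z = (mean(sample) - mu0) / (sigma / sqrt(n))\n",
"    if abs(z) > z_crit:\n",
"        rejections += 1  # H0 rejected although it is true\n",
"\n",
"print('observed rate of wrongly rejecting H0:', rejections / trials)"
]
},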
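{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Steps of a hypothesis test\n",
"<ul>\n",
" <li>State the hypotheses\n",
" <li>Choose an appropriate test statistic, for example the statistic of a variance test or a chi-square test\n",
" <li>Set the significance level\n",
" <li>Compute the value of the test statistic\n",
" <li>Make the statistical decision\n",
"</ul>"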
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Left-Tailed, Right-Tailed, and Two-Tailed Tests"
]
},
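{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Null and alternative hypotheses\n",
"<ul>\n",
" <li>The hypothesis being tested is called the null hypothesis, written H0. (The null hypothesis usually states that there is no difference or no change.)\n",
" <li>The hypothesis set against the null hypothesis is called the alternative hypothesis, written H1.\n",
" <li>The comparison is usually one of: equal to, greater than, or less than.\n",
"</ul>"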
]
},
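{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using the test statistic\n",
"<ul>\n",
" <li>Compute the test statistic\n",
" <li>For the given significance level, look up the corresponding critical value in a table\n",
" <li>Compare the value of the test statistic with the critical value\n",
" <li>Conclude whether to reject or not reject the null hypothesis\n",
"</ul>"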
]
},
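{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The small probability used in testing\n",
"<ul>\n",
"<li>The probability of an event that should almost never happen in a single trial\n",
" <li>If such a small-probability event does happen in a single trial, we have grounds to reject the null hypothesis\n",
" <li>We fix the small probability in advance; it is the threshold below which we reject the null hypothesis\n",
"</ul>"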
]
},
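{
"cell_type": "markdown",
"metadata": {},
"source": [
"### P-value\n",
"<ul>\n",
" <li>It is a probability\n",
" <li>Assuming the null hypothesis is true, the p-value is the probability, under the sampling distribution, of a value at least as extreme as the observed sample statistic\n",
" <li>In a left-tailed test, the p-value is the area under the curve at or below the test statistic\n",
" <li>In a right-tailed test, the p-value is the area under the curve at or above the test statistic\n",
"</ul>"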
]
},
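{
"cell_type": "markdown",
"metadata": {},
"source": [
"The tail areas described above can be computed directly instead of being read from a table. The cell below is a minimal sketch under stated assumptions: it uses `statistics.NormalDist` from the Python standard library (Python 3.8+), the helper name `p_value` is hypothetical, and the z value is an arbitrary illustration. The two-sided case, covered later in these notes, doubles the area of one tail."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from statistics import NormalDist  # standard library, Python 3.8+\n",
"\n",
"STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1\n",
"\n",
"def p_value(z, tail):\n",
"    # Hypothetical helper: p-value of a z statistic under the standard normal.\n",
"    if tail == 'left':\n",
"        return STD_NORMAL.cdf(z)                  # area at or below z\n",
"    if tail == 'right':\n",
"        return 1 - STD_NORMAL.cdf(z)              # area at or above z\n",
"    if tail == 'two-sided':\n",
"        return 2 * STD_NORMAL.cdf(-abs(z))        # both tails beyond |z|\n",
"    raise ValueError('tail must be left, right, or two-sided')\n",
"\n",
"z = -1.5  # arbitrary illustrative value\n",
"for tail in ('left', 'right', 'two-sided'):\n",
"    print(tail, round(p_value(z, tail), 4))"
]
},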
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Left-tailed and right-tailed tests\n",
"<img src=\"assets/20201114095426.png\" width=\"100%\">"
]
},
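{
"cell_type": "markdown",
"metadata": {},
"source": [
"### When to use a left-tailed test and when to use a right-tailed test\n",
"<ul>\n",
"<li>When the requirement says \"no less than\" or \"no lower than\", use a left-tailed test. Example: the lifetime of a light bulb must be no less than 700 hours.\n",
"<li>When the requirement says \"no more than\" or \"no higher than\", use a right-tailed test. Example: the defect rate must be no more than 5%.\n",
"</ul>"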
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"assets/20201114100216.png\" width=\"70%\">"
]
},
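{
"cell_type": "markdown",
"metadata": {},
"source": [
"<ul>\n",
" <li>A one-tailed test computes the significance-level probability on one side of the distribution. It is used for hypotheses with a definite direction: greater than, less than, higher than, lower than, better than, worse than. The direction should have some theoretical justification. The alternative hypothesis is written H1: μ1 < μ2 or H1: μ1 > μ2.\n",
" <li>A two-tailed test computes the significance-level probability on both ends of the distribution. It is used when theory cannot say that one population must be larger or smaller than the other. The alternative hypothesis is usually written H1: μ1 ≠ μ2.\n",
"</ul>"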
]
},
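{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, a part is required to have a mean length of 10 cm, and a mean either above or below 10 cm is nonconforming. We want to test whether either of these two deviations holds, so the null and alternative hypotheses are:\n",
"<br>\n",
"H0: μ = 10; H1: μ ≠ 10"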
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Test decision\n",
"One-tailed test\n",
"<ul>\n",
" <li>If p-value > α, do not reject H0\n",
" <li>If p-value < α, reject H0\n",
"</ul>"
]
},
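{
"cell_type": "markdown",
"metadata": {},
"source": [
"Two-tailed test (here the p-value means the area of one tail beyond the test statistic; comparing it with α/2 is equivalent to comparing the two-sided p-value with α)\n",
"<ul>\n",
" <li>If p-value > α/2, do not reject H0\n",
" <li>If p-value < α/2, reject H0\n",
"</ul>"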
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Principles of the Z-Test"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Testing a population mean\n",
"When to use the Z-test and when to use the t-test:\n",
"<img src=\"assets/20201114101438.png\" width=\"70%\">\n",
"In practice the t-test is the one generally used."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Formula for the Z statistic\n",
"To test whether a sample mean differs significantly from a known population mean, Z is computed as\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\n",
"z = \\frac{\\overline{X} - \\mu}{\\sigma_{\\overline{X}}} = \\frac{\\overline{X} - \\mu}{\\sigma/\\sqrt{n}}\n",
"$$"
]
},
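{
"cell_type": "markdown",
"metadata": {},
"source": [
"To test the difference between the means of two samples drawn from two populations, and so judge whether the populations they represent differ significantly, Z is computed as"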
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\n",
"z = \n",
"\\frac{\\overline{X}_1 - \\overline{X}_2}\n",
"{S_{\\overline{X}_1-\\overline{X}_2}}\n",
"= \n",
"\\frac{\\overline{X}_1 - \\overline{X}_2}\n",
"{\\sqrt{S^2_1 / n_1 + S^2_2 / n_2}}\n",
"$$"
]
},
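{
"cell_type": "markdown",
"metadata": {},
"source": [
"### How the Z-test works\n",
"<ul>\n",
" <li>When the population standard deviation is known and the sample is large, use the standard normal distribution to infer the probability of the observed difference, and from that decide whether the difference between the means is significant\n",
" <li>Critical values of Z after the standard normal transformation:\n",
"</ul>"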
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Two-tailed:\n",
"$$\n",
"z_{0.05/2} = 1.96, \\quad z_{0.01/2} = 2.58\n",
"$$\n",
"One-tailed:\n",
"$$\n",
"z_{0.05} = 1.645, \\quad z_{0.01} = 2.33\n",
"$$"
]
},
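{
"cell_type": "markdown",
"metadata": {},
"source": [
"The critical values above can be reproduced from the inverse CDF of the standard normal distribution. This is a minimal sketch, assuming Python 3.8+ so that `statistics.NormalDist` is available; the helper name `critical_value` is hypothetical. The results should agree with the table values up to rounding."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from statistics import NormalDist  # standard library, Python 3.8+\n",
"\n",
"def critical_value(alpha, two_sided=True):\n",
"    # Hypothetical helper: upper critical value of the standard normal.\n",
"    # A two-tailed test puts alpha/2 in each tail; a one-tailed test puts alpha in one tail.\n",
"    tail_area = alpha / 2 if two_sided else alpha\n",
"    return NormalDist().inv_cdf(1 - tail_area)\n",
"\n",
"for alpha in (0.05, 0.01):\n",
"    print('alpha =', alpha,\n",
"          'two-tailed:', round(critical_value(alpha), 3),\n",
"          'one-tailed:', round(critical_value(alpha, two_sided=False), 3))"
]
},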
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Z-Test Examples\n",
"### Z-test example 1"
]
},
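{
"cell_type": "markdown",
"metadata": {},
"source": [
"Data on serum cholesterol (mg%) in healthy people and in hypertensive patients are given below. Test whether serum cholesterol differs between the two groups.\n",
"<br>\n",
"Healthy group\n",
"$$\n",
"n_1 = 506, \\quad \\overline{X}_1 = 180.6, \\quad S_1 = 34.2\n",
"$$\n",
"Sample size 506, mean 180.6, standard deviation 34.2\n",
"<br>\n",
"<br>\n",
"Hypertensive group\n",
"$$\n",
"n_2 = 142, \\quad \\overline{X}_2 = 223.6, \\quad S_2 = 45.8\n",
"$$"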
]
},
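{
"cell_type": "markdown",
"metadata": {},
"source": [
"State the hypotheses and set the significance level\n",
"<ul>\n",
"<li>H0: μ1 = μ2 (no difference)\n",
"<li>H1: μ1 ≠ μ2 (there is a difference)\n",
"<li>α = 0.05: we accept a 5% chance of rejecting H0 when it is actually true\n",
"</li>\n",
"</ul>\n",
"Compute the Z statistic\n",
"<ul>\n",
" <li>Substitute the known data into the formula</li>\n",
"</ul>\n",
"$$\n",
"Z = \\frac{|180.6-223.6|}{\\sqrt{34.2^2/506+45.8^2/142}} = 10.40\n",
"$$"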
]
},
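{
"cell_type": "markdown",
"metadata": {},
"source": [
"The test is two-tailed, so each tail holds α/2 = 0.025. Look up 1 - α/2 = 0.975 in the standard normal table\n",
"<br>(search online for: critical values of statistical distributions)\n",
"<img src=\"assets/20201115111943.png\" width=\"70%\">\n",
"<br><br>\n",
"Row 1.9 and column 0.06 give the critical value 1.9 + 0.06 = 1.96. The statistic 10.40 is larger than 1.96, so the tail area beyond it must be smaller than the tail area beyond the critical value 1.96, which is α/2.\n",
"<img src=\"assets/20201115112520.png\" width=\"70%\">\n",
"By the two-tailed rule, a tail area below α/2 means we reject H0."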
]
},
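{
"cell_type": "markdown",
"metadata": {},
"source": [
"Determine the p-value and draw the conclusion: here Z = 10.40 > 1.96 (the table value for 0.975), so P < 0.05. At the α = 0.05 level we reject H0 and accept H1: serum cholesterol differs between healthy people and hypertensive patients, and it is higher in hypertensive patients.\n",
"<br>\n",
"**Note: a first reaction may be that a smaller value should mean a smaller difference. In fact, the further the tail area falls below α/2, the further apart the two values are. H0 places the reference value A at the centre of the distribution, so the compared value B would have to lie near the centre, where the tail area is large, to count as similar to A.**"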
]
},
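{
"cell_type": "markdown",
"metadata": {},
"source": [
"The arithmetic of example 1 can be checked in code. This is a minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; the variable names are illustrative, and only the summary statistics quoted in the example are used. It recomputes the two-sample Z statistic and compares it with the two-tailed critical value."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from math import sqrt\n",
"from statistics import NormalDist  # standard library, Python 3.8+\n",
"\n",
"# Summary statistics quoted in the example\n",
"n1, mean1, sd1 = 506, 180.6, 34.2   # healthy group\n",
"n2, mean2, sd2 = 142, 223.6, 45.8   # hypertensive group\n",
"alpha = 0.05\n",
"\n",
"# Standard error of the difference between the two sample means\n",
"se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)\n",
"z = abs(mean1 - mean2) / se\n",
"\n",
"# Two-tailed critical value: alpha/2 in each tail\n",
"z_crit = NormalDist().inv_cdf(1 - alpha / 2)\n",
"\n",
"print('z =', round(z, 2))\n",
"print('critical value =', round(z_crit, 2))\n",
"print('reject H0:', z > z_crit)"
]
},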
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Z-test example 2"
]
},
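{
"cell_type": "markdown",
"metadata": {},
"source": [
"A machine tool plant produces a part whose ellipticity, from experience, is approximately normally distributed with population mean μ = 0.081 mm and population standard deviation σ = 0.025. A new machine is now used for production; n = 200 parts are sampled and inspected, giving a sample mean ellipticity of 0.076 mm. Does the mean ellipticity of parts made on the new machine differ significantly from before? (α = 0.05)"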
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"H0: μ = 0.081; H1: μ ≠ 0.081; α = 0.05; n = 200\n",
"<br>**Test statistic**\n",
"$$\n",
"z = \\frac{\\overline{x}-\\mu_0}{\\sigma/\\sqrt{n}} \n",
"= \n",
"\\frac{0.076-0.081}{0.025/\\sqrt{200}} = -2.83\n",
"$$\n",
"<br>\n",
"Decision:\n",
"<br>-2.83 lies to the left of -1.96, so the tail area is less than α/2 and we reject H0 at the α = 0.05 level\n",
"<br>Conclusion:\n",
"<br>There is evidence that the ellipticity of parts made on the new machine differs significantly from before\n",
"<img src=\"assets/20201115151902.png\" width=\"50%\">"
]
},
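{
"cell_type": "markdown",
"metadata": {},
"source": [
"Example 2 can also be worked through with a p-value instead of a table lookup. This is a minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; the variable names are illustrative. It computes the one-sample Z statistic and the two-sided p-value, then compares the p-value with α."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from math import sqrt\n",
"from statistics import NormalDist  # standard library, Python 3.8+\n",
"\n",
"# Values quoted in the example\n",
"mu0, sigma = 0.081, 0.025      # population mean and standard deviation under H0\n",
"n, sample_mean = 200, 0.076    # sample size and sample mean\n",
"alpha = 0.05\n",
"\n",
"z = (sample_mean - mu0) / (sigma / sqrt(n))\n",
"\n",
"# Two-sided p-value: area in both tails beyond |z|\n",
"p_two_sided = 2 * NormalDist().cdf(-abs(z))\n",
"\n",
"print('z =', round(z, 2))\n",
"print('two-sided p-value =', round(p_two_sided, 4))\n",
"print('reject H0:', p_two_sided < alpha)"
]
},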
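{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Z-test example 3\n",
"From extensive past data, the lifetime of light bulbs produced by a factory follows the normal distribution N(1020, 100^2). A random sample of 16 bulbs from a recent batch has a mean lifetime of 1080 hours. At the 0.05 significance level, has the lifetime of this batch increased significantly? (α = 0.05)"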
]
},
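{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"H0: μ ≤ 1020; H1: μ > 1020; α = 0.05; n = 16\n",
"<br>**Test statistic (one-tailed)**\n",
"$$\n",
"z = \\frac{\\overline{x}-\\mu_0}{\\sigma/\\sqrt{n}} \n",
"= \n",
"\\frac{1080-1020}{100/\\sqrt{16}} = 2.4\n",
"$$\n",
"1 - 0.05 = 0.95. The table has no entry of exactly 0.95; the nearest are 0.9495 (z = 1.64) and 0.9505 (z = 1.65), so we average the two and take (1.64 + 1.65)/2 = 1.645 as the critical value\n",
"<br>\n",
"Decision:\n",
"<br>2.4 lies to the right of 1.645, so the p-value is less than α and we reject H0 at the α = 0.05 level\n",
"<br>Conclusion:\n",
"<br>There is evidence that the lifetime of the newly produced bulbs has increased significantly\n",
"<img src=\"assets/20201115152913.png\" width=\"50%\">"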
]
},
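{
"cell_type": "markdown",
"metadata": {},
"source": [
"The right-tailed test of example 3 can be sketched the same way. The cell assumes Python 3.8+ for `statistics.NormalDist`, and the variable names are illustrative. It shows both routes to the decision: comparing z with the one-tailed critical value, and comparing the right-tail p-value with α. The two routes always agree."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from math import sqrt\n",
"from statistics import NormalDist  # standard library, Python 3.8+\n",
"\n",
"# Values quoted in the example\n",
"mu0, sigma = 1020, 100         # population mean and standard deviation under H0\n",
"n, sample_mean = 16, 1080      # sample size and sample mean\n",
"alpha = 0.05\n",
"\n",
"z = (sample_mean - mu0) / (sigma / sqrt(n))\n",
"\n",
"# Route 1: one-tailed critical value (all of alpha in the right tail)\n",
"z_crit = NormalDist().inv_cdf(1 - alpha)\n",
"\n",
"# Route 2: right-tail p-value (area at or above z)\n",
"p_right = 1 - NormalDist().cdf(z)\n",
"\n",
"print('z =', round(z, 2), 'critical value =', round(z_crit, 3))\n",
"print('right-tail p-value =', round(p_right, 4))\n",
"print('reject H0:', z > z_crit, p_right < alpha)"
]
},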
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}