{ "cells": [ { "cell_type": "markdown", "source": [ "## Introduction to Probability and Statistics\n", "## Assignment\n", "\n", "In this assignment, we will use the dataset of diabetes patients available [here](https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html).\n" ], "metadata": {} }, { "cell_type": "code", "execution_count": 13, "source": [ "import pandas as pd\n", "import numpy as np\n", "\n", "df = pd.read_csv(\"../../data/diabetes.tsv\", sep='\\t')\n", "df.head()" ], "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " AGE SEX BMI BP S1 S2 S3 S4 S5 S6 Y\n", "0 59 2 32.1 101.0 157 93.2 38.0 4.0 4.8598 87 151\n", "1 48 1 21.6 87.0 183 103.2 70.0 3.0 3.8918 69 75\n", "2 72 2 30.5 93.0 156 93.6 41.0 4.0 4.6728 85 141\n", "3 24 1 25.3 84.0 198 131.4 40.0 5.0 4.8903 89 206\n", "4 50 1 23.0 101.0 192 125.4 52.0 4.0 4.2905 80 135" ], "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>AGE</th><th>SEX</th><th>BMI</th><th>BP</th><th>S1</th><th>S2</th><th>S3</th><th>S4</th><th>S5</th><th>S6</th><th>Y</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>59</td><td>2</td><td>32.1</td><td>101.0</td><td>157</td><td>93.2</td><td>38.0</td><td>4.0</td><td>4.8598</td><td>87</td><td>151</td></tr>\n",
"    <tr><th>1</th><td>48</td><td>1</td><td>21.6</td><td>87.0</td><td>183</td><td>103.2</td><td>70.0</td><td>3.0</td><td>3.8918</td><td>69</td><td>75</td></tr>\n",
"    <tr><th>2</th><td>72</td><td>2</td><td>30.5</td><td>93.0</td><td>156</td><td>93.6</td><td>41.0</td><td>4.0</td><td>4.6728</td><td>85</td><td>141</td></tr>\n",
"    <tr><th>3</th><td>24</td><td>1</td><td>25.3</td><td>84.0</td><td>198</td><td>131.4</td><td>40.0</td><td>5.0</td><td>4.8903</td><td>89</td><td>206</td></tr>\n",
"    <tr><th>4</th><td>50</td><td>1</td><td>23.0</td><td>101.0</td><td>192</td><td>125.4</td><td>52.0</td><td>4.0</td><td>4.2905</td><td>80</td><td>135</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
\n", "
" ] }, "metadata": {}, "execution_count": 13 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "The columns in this dataset have the following meanings:\n", "* AGE and SEX are self-explanatory\n", "* BMI is the body mass index\n", "* BP is the average blood pressure\n", "* S1 through S6 are different blood measurements\n", "* Y is a quantitative measure of disease progression over one year\n", "\n", "Let's study this dataset using the methods of probability and statistics.\n", "\n", "### Task 1: Compute the mean and variance of every column\n" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "### Task 2: Plot boxplots of BMI, BP, and Y grouped by sex\n" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "### Task 3: What are the distributions of the AGE, SEX, BMI, and Y variables?\n" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "### Task 4: Test the correlation between the different variables and disease progression (Y)\n", "\n", "> **Hint** A correlation matrix gives the most useful overview of which variables are related.\n" ], "metadata": {} }, { "cell_type": "markdown", "source": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "### Task 5: Test the hypothesis that the degree of diabetes progression differs between men and women\n" ], "metadata": {} }, { "cell_type": "markdown", "source": [], "metadata": {} }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n---\n\n**Disclaimer**:  \nThis document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). Although we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n" ] } ], "metadata": { "orig_nbformat": 4, "language_info": { "name": "python", "version": "3.8.8", "mimetype": "text/x-python", "codemirror_mode": { "name": "ipython", "version": 3 }, "pygments_lexer": "ipython3", "nbconvert_exporter": "python", "file_extension": ".py" }, "kernelspec": { "name": "python3", "display_name": "Python 3.8.8 64-bit (conda)" }, "interpreter": { "hash": "86193a1ab0ba47eac1c69c1756090baa3b420b3eea7d4aafab8b85f8b312f0c5" }, "coopTranslator": { "original_hash": "6d945fd15163f60cb473dbfe04b2d100", "translation_date": "2025-09-06T17:10:07+00:00", "source_file": 
"1-Introduction/04-stats-and-probability/assignment.ipynb", "language_code": "zh" } }, "nbformat": 4, "nbformat_minor": 2 }