{ "cells": [ { "cell_type": "markdown", "source": [ "## Introduction to Probability and Statistics\n", "## Assignment\n", "\n", "In this assignment, we will use the dataset of diabetes patients available [here](https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html).\n" ], "metadata": {} }, { "cell_type": "code", "execution_count": 13, "source": [ "import pandas as pd\n", "import numpy as np\n", "\n", "df = pd.read_csv(\"../../data/diabetes.tsv\", sep='\\t')\n", "df.head()" ], "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "   AGE  SEX   BMI     BP   S1     S2    S3   S4      S5  S6    Y\n", "0   59    2  32.1  101.0  157   93.2  38.0  4.0  4.8598  87  151\n", "1   48    1  21.6   87.0  183  103.2  70.0  3.0  3.8918  69   75\n", "2   72    2  30.5   93.0  156   93.6  41.0  4.0  4.6728  85  141\n", "3   24    1  25.3   84.0  198  131.4  40.0  5.0  4.8903  89  206\n", "4   50    1  23.0  101.0  192  125.4  52.0  4.0  4.2905  80  135" ], "text/html": [ "
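<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>AGE</th>\n", "      <th>SEX</th>\n", "      <th>BMI</th>\n", "      <th>BP</th>\n", "      <th>S1</th>\n", "      <th>S2</th>\n", "      <th>S3</th>\n", "      <th>S4</th>\n", "      <th>S5</th>\n", "      <th>S6</th>\n", "      <th>Y</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>59</td>\n", "      <td>2</td>\n", "      <td>32.1</td>\n", "      <td>101.0</td>\n", "      <td>157</td>\n", "      <td>93.2</td>\n", "      <td>38.0</td>\n", "      <td>4.0</td>\n", "      <td>4.8598</td>\n", "      <td>87</td>\n", "      <td>151</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>48</td>\n", "      <td>1</td>\n", "      <td>21.6</td>\n", "      <td>87.0</td>\n", "      <td>183</td>\n", "      <td>103.2</td>\n", "      <td>70.0</td>\n", "      <td>3.0</td>\n", "      <td>3.8918</td>\n", "      <td>69</td>\n", "      <td>75</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>72</td>\n", "      <td>2</td>\n", "      <td>30.5</td>\n", "      <td>93.0</td>\n", "      <td>156</td>\n", "      <td>93.6</td>\n", "      <td>41.0</td>\n", "      <td>4.0</td>\n", "      <td>4.6728</td>\n", "      <td>85</td>\n", "      <td>141</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>24</td>\n", "      <td>1</td>\n", "      <td>25.3</td>\n", "      <td>84.0</td>\n", "      <td>198</td>\n", "      <td>131.4</td>\n", "      <td>40.0</td>\n", "      <td>5.0</td>\n", "      <td>4.8903</td>\n", "      <td>89</td>\n", "      <td>206</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>50</td>\n", "      <td>1</td>\n", "      <td>23.0</td>\n", "      <td>101.0</td>\n", "      <td>192</td>\n", "      <td>125.4</td>\n", "      <td>52.0</td>\n", "      <td>4.0</td>\n", "      <td>4.2905</td>\n", "      <td>80</td>\n", "      <td>135</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>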
\n" ] }, "metadata": {}, "execution_count": 13 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "In this dataset, the columns are as follows:\n", "* AGE and SEX are self-explanatory\n", "* BMI is body mass index\n", "* BP is average blood pressure\n", "* S1 through S6 are different blood measurements\n", "* Y is a quantitative measure of disease progression over one year\n", "\n", "Let's study this dataset using methods of probability and statistics.\n", "\n", "### Task 1: Compute the mean and variance of every column\n" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "### Task 2: Plot boxplots of BMI, BP and Y grouped by sex\n" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "### Task 3: What is the distribution of the AGE, SEX, BMI and Y variables?\n" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "### Task 4: Test the correlation between the different variables and disease progression (Y)\n", "\n", "> **Hint** A correlation matrix gives the most useful overview of which variables are related to each other.\n" ], "metadata": {} }, { "cell_type": "markdown", "source": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "### Task 5: Test the hypothesis that the degree of diabetes progression differs between men and women\n" ], "metadata": {} }, { "cell_type": "markdown", "source": [], "metadata": {} }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n---\n\n**Disclaimer**: \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n" ] } ], "metadata": { "orig_nbformat": 4, "language_info": { "name": "python", "version": "3.8.8", "mimetype": "text/x-python", "codemirror_mode": { "name": "ipython", "version": 3 }, "pygments_lexer": "ipython3", "nbconvert_exporter": "python", "file_extension": ".py" }, "kernelspec": { "name": "python3", "display_name": "Python 3.8.8 64-bit (conda)" }, "interpreter": { "hash": "86193a1ab0ba47eac1c69c1756090baa3b420b3eea7d4aafab8b85f8b312f0c5" }, "coopTranslator": { "original_hash": "6d945fd15163f60cb473dbfe04b2d100", "translation_date": "2025-09-06T17:12:43+00:00", 
"source_file": "1-Introduction/04-stats-and-probability/assignment.ipynb", "language_code": "hk" } }, "nbformat": 4, "nbformat_minor": 2 }