{ "cells": [ { "cell_type": "markdown", "source": [ "## Introduction to Probability and Statistics\r\n", "## Assignment\r\n", "\r\n", "In this assignment, we will use the dataset of diabetes patients taken [from here](https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html)." ], "metadata": {} }, { "cell_type": "code", "execution_count": 13, "source": [ "import pandas as pd\r\n", "import numpy as np\r\n", "\r\n", "df = pd.read_csv(\"../../data/diabetes.tsv\",sep='\\t')\r\n", "df.head()" ], "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " AGE SEX BMI BP S1 S2 S3 S4 S5 S6 Y\n", "0 59 2 32.1 101.0 157 93.2 38.0 4.0 4.8598 87 151\n", "1 48 1 21.6 87.0 183 103.2 70.0 3.0 3.8918 69 75\n", "2 72 2 30.5 93.0 156 93.6 41.0 4.0 4.6728 85 141\n", "3 24 1 25.3 84.0 198 131.4 40.0 5.0 4.8903 89 206\n", "4 50 1 23.0 101.0 192 125.4 52.0 4.0 4.2905 80 135" ], "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AGESEXBMIBPS1S2S3S4S5S6Y
059232.1101.015793.238.04.04.859887151
148121.687.0183103.270.03.03.89186975
272230.593.015693.641.04.04.672885141
324125.384.0198131.440.05.04.890389206
450123.0101.0192125.452.04.04.290580135
\n", "
" ] }, "metadata": {}, "execution_count": 13 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "\r\n", "In this dataset, columns as the following:\r\n", "* Age and sex are self-explanatory\r\n", "* BMI is body mass index\r\n", "* BP is average blood pressure\r\n", "* S1 through S6 are different blood measurements\r\n", "* Y is the qualitative measure of disease progression over one year\r\n", "\r\n", "Let's study this dataset using methods of probability and statistics.\r\n", "\r\n", "### Task 1: Compute mean values and variance for all values" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "### Task 2: Plot boxplots for BMI, BP and Y depending on gender" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "### Task 3: What is the the distribution of Age, Sex, BMI and Y variables?" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "### Task 4: Test the correlation between different variables and disease progression (Y)\r\n", "\r\n", "> **Hint** Correlation matrix would give you the most useful information on which values are dependent." ], "metadata": {} }, { "cell_type": "markdown", "source": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "### Task 5: Test the hypothesis that the degree of diabetes progression is different between men and women" ], "metadata": {} }, { "cell_type": "markdown", "source": [], "metadata": {} } ], "metadata": { "orig_nbformat": 4, "language_info": { "name": "python", "version": "3.8.8", "mimetype": "text/x-python", "codemirror_mode": { "name": "ipython", "version": 3 }, "pygments_lexer": "ipython3", "nbconvert_exporter": "python", "file_extension": ".py" }, "kernelspec": { "name": "python3", "display_name": "Python 3.8.8 64-bit (conda)" }, "interpreter": { "hash": "86193a1ab0ba47eac1c69c1756090baa3b420b3eea7d4aafab8b85f8b312f0c5" } }, "nbformat": 4, "nbformat_minor": 2 }