{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "## Introduction to Probability and Statistics\r\n",
    "## Assignment\r\n",
    "\r\n",
    "In this assignment, we will use the dataset of diabetes patients taken [from here](https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html)."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "source": [
    "import pandas as pd\r\n",
    "import numpy as np\r\n",
    "\r\n",
    "df = pd.read_csv(\"../../data/diabetes.tsv\",sep='\\t')\r\n",
    "df.head()"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "   AGE  SEX   BMI     BP   S1     S2    S3   S4      S5  S6    Y\n",
       "0   59    2  32.1  101.0  157   93.2  38.0  4.0  4.8598  87  151\n",
       "1   48    1  21.6   87.0  183  103.2  70.0  3.0  3.8918  69   75\n",
       "2   72    2  30.5   93.0  156   93.6  41.0  4.0  4.6728  85  141\n",
       "3   24    1  25.3   84.0  198  131.4  40.0  5.0  4.8903  89  206\n",
       "4   50    1  23.0  101.0  192  125.4  52.0  4.0  4.2905  80  135"
      ],
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>AGE</th>\n",
       "      <th>SEX</th>\n",
       "      <th>BMI</th>\n",
       "      <th>BP</th>\n",
       "      <th>S1</th>\n",
       "      <th>S2</th>\n",
       "      <th>S3</th>\n",
       "      <th>S4</th>\n",
       "      <th>S5</th>\n",
       "      <th>S6</th>\n",
       "      <th>Y</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>59</td>\n",
       "      <td>2</td>\n",
       "      <td>32.1</td>\n",
       "      <td>101.0</td>\n",
       "      <td>157</td>\n",
       "      <td>93.2</td>\n",
       "      <td>38.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>4.8598</td>\n",
       "      <td>87</td>\n",
       "      <td>151</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>48</td>\n",
       "      <td>1</td>\n",
       "      <td>21.6</td>\n",
       "      <td>87.0</td>\n",
       "      <td>183</td>\n",
       "      <td>103.2</td>\n",
       "      <td>70.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>3.8918</td>\n",
       "      <td>69</td>\n",
       "      <td>75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>72</td>\n",
       "      <td>2</td>\n",
       "      <td>30.5</td>\n",
       "      <td>93.0</td>\n",
       "      <td>156</td>\n",
       "      <td>93.6</td>\n",
       "      <td>41.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>4.6728</td>\n",
       "      <td>85</td>\n",
       "      <td>141</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>24</td>\n",
       "      <td>1</td>\n",
       "      <td>25.3</td>\n",
       "      <td>84.0</td>\n",
       "      <td>198</td>\n",
       "      <td>131.4</td>\n",
       "      <td>40.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>4.8903</td>\n",
       "      <td>89</td>\n",
       "      <td>206</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>50</td>\n",
       "      <td>1</td>\n",
       "      <td>23.0</td>\n",
       "      <td>101.0</td>\n",
       "      <td>192</td>\n",
       "      <td>125.4</td>\n",
       "      <td>52.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>4.2905</td>\n",
       "      <td>80</td>\n",
       "      <td>135</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ]
     },
     "metadata": {},
     "execution_count": 13
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "\r\n",
    "In this dataset, columns as the following:\r\n",
    "* Age and sex are self-explanatory\r\n",
    "* BMI is body mass index\r\n",
    "* BP is average blood pressure\r\n",
    "* S1 through S6 are different blood measurements\r\n",
    "* Y is the qualitative measure of disease progression over one year\r\n",
    "\r\n",
    "Let's study this dataset using methods of probability and statistics.\r\n",
    "\r\n",
    "### Task 1: Compute mean values and variance for all values"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Task 2: Plot boxplots for BMI, BP and Y depending on gender"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Task 3: What is the the distribution of Age, Sex, BMI and Y variables?"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Task 4: Test the correlation between different variables and disease progression (Y)\r\n",
    "\r\n",
    "> **Hint** Correlation matrix would give you the most useful information on which values are dependent."
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Task 5: Test the hypothesis that the degree of diabetes progression is different between men and women"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [],
   "metadata": {}
  }
 ],
 "metadata": {
  "orig_nbformat": 4,
  "language_info": {
   "name": "python",
   "version": "3.8.8",
   "mimetype": "text/x-python",
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "pygments_lexer": "ipython3",
   "nbconvert_exporter": "python",
   "file_extension": ".py"
  },
  "kernelspec": {
   "name": "python3",
   "display_name": "Python 3.8.8 64-bit (conda)"
  },
  "interpreter": {
   "hash": "86193a1ab0ba47eac1c69c1756090baa3b420b3eea7d4aafab8b85f8b312f0c5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}