{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction to Probability and Statistics\n",
"In this notebook, we will play around with some of the concepts we have previously discussed. Many concepts from probability and statistics are well-represented in major libraries for data processing in Python, such as `numpy` and `pandas`."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import random\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Random Variables and Distributions\n",
"Let's start by drawing a sample of 30 integers from a discrete uniform distribution over 0 to 10 (`random.randint` includes both endpoints). We will also compute the mean and variance of the sample."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sample: [4, 8, 5, 10, 5, 1, 1, 1, 7, 9, 7, 0, 2, 7, 3, 5, 9, 8, 3, 10, 2, 9, 2, 9, 9, 8, 1, 8, 7, 3]\n",
"Mean = 5.433333333333334\n",
"Variance = 10.178888888888887\n"
]
}
],
"source": [
"sample = [ random.randint(0,10) for _ in range(30) ]\n",
"print(f\"Sample: {sample}\")\n",
"print(f\"Mean = {np.mean(sample)}\")\n",
"print(f\"Variance = {np.var(sample)}\")"
]
},
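{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch of what `np.mean` and `np.var` compute, the next cell repeats the calculation by hand on a small made-up list (`values` is an illustrative name, not part of the original notebook). Note that `np.var` divides by `n` by default (population variance); passing `ddof=1` divides by `n - 1` instead, which is the default in pandas."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, for illustration only\n",
"n = len(values)\n",
"\n",
"# Mean: sum of the values divided by their count\n",
"values_mean = sum(values) / n\n",
"\n",
"# Variance: average squared deviation from the mean\n",
"squared_dev = sum((x - values_mean) ** 2 for x in values)\n",
"population_var = squared_dev / n        # matches np.var(values)\n",
"sample_var = squared_dev / (n - 1)      # matches np.var(values, ddof=1)\n",
"\n",
"print(f'Mean = {values_mean} (np.mean: {np.mean(values)})')\n",
"print(f'Population variance = {population_var} (np.var: {np.var(values)})')\n",
"print(f'Sample variance = {sample_var} (np.var, ddof=1: {np.var(values, ddof=1)})')"
]
},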
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see how often each value occurs in the sample, we can plot a **histogram**:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAWoAAAD4CAYAAADFAawfAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAAL4UlEQVR4nO3db4xlBXnH8e/PXYiCGNpyayzLdDQ1tMZEIROqJSEt2AaKAV+0CSQaa0zmjbXQmJi1b5q+o0lj9IUx2SBKIsVYhNRASzUqMSbttrtAW2AhtXQrq+gOMRawSSn26Yu5C+ty1znL3nPvw8z3k0zm/jmc+xxm9svZc8/hpqqQJPX1qmUPIEn62Qy1JDVnqCWpOUMtSc0ZaklqbvcYKz3vvPNqdXV1jFVL0rZ08ODBp6pqMuu5UUK9urrKgQMHxli1JG1LSf7zZM956EOSmjPUktScoZak5gy1JDVnqCWpOUMtSc1tGeokFyZ58Livp5PcuIDZJEkMOI+6qh4D3g6QZBfwXeCucceSJB1zqoc+rgD+vapOemK2JGm+TvXKxOuA22c9kWQdWAdYWVk5zbEk6eVb3XvPUl738E1Xj7LewXvUSc4ErgH+atbzVbWvqtaqam0ymXm5uiTpZTiVQx9XAfdX1Q/GGkaS9FKnEurrOclhD0nSeAaFOslZwG8Dd447jiTpRIPeTKyq/wZ+YeRZJEkzeGWiJDVnqCWpOUMtSc0ZaklqzlBLUnOGWpKaM9SS1JyhlqTmDLUkNWeoJak5Qy1JzRlqSWrOUEtSc4Zakpoz1JLUnKGWpOYMtSQ1Z6glqTlDLUnNGWpJam7op5Cfm+SOJI8mOZTknWMPJknaNOhTyIFPAvdW1e8lORM4a8SZJEnH2TLUSV4HXAb8AUBVPQc8N+5YkqRjhhz6eBOwAXw2yQNJbk5y9okLJVlPciDJgY2NjbkPKkk71ZBQ7wYuBj5dVRcBPwb2nrhQVe2rqrWqWptMJnMeU5J2riGhPgIcqar90/t3sBluSdICbBnqqvo+8ESSC6cPXQE8MupUkqQXDD3r48PAbdMzPh4HPjDeSJKk4w0KdVU9CKyNO4okaRavTJSk5gy1JDVnqCWpOUMtSc0ZaklqzlBLUnOGWpKaM9SS1JyhlqTmDLUkNWeoJak5Qy1JzRlqSWrOUEtSc4Zakpoz1JLUnKGWpOYMtSQ1Z6glqTlDLUnNGWpJam7Qp5AnOQw8A/wEeL6q/ERySVqQQaGe+q2qemq0SSRJM3noQ5KaGxrqAr6S5GCS9VkLJFlPciDJgY2NjflNKEk73NBQX1pVFwNXAR9KctmJC1TVvqpaq6q1yWQy1yElaScbFOqq+t70+1HgLuCSMYeSJL1oy1AnOTvJOcduA78DPDT2YJKkTUPO+ng9cFeSY8v/ZVXdO+pUkqQXbBnqqnoceNsCZpEkzeDpeZLUnKGWpOYMtSQ1Z6glqTlDLUnNGWpJas5QS1JzhlqSmjPUktScoZak5gy1JDVnqCWpOUMtSc0ZaklqzlBLUnOGWpKaM9SS1JyhlqTmDLUkNWeoJam5waFOsivJA0nuHnMgSdJPO5U96huAQ2MNIkmabVCok+wBrgZuHnccSdKJdg9c7hPAR4FzTrZAknVgHWBlZeW0B1u01b33LO21D9909dJeW9vfMn+3NR9b7lEneTdwtKoO/qzlqmpfVa1V1dpkMpnbgJK00w059HEpcE2Sw8AXgMuTfH7UqSRJL9gy1FX1saraU1WrwHXA16vqvaNPJkkCPI9aktob+mYiAFV1H3DfKJNIkmZyj1qSmjPUktScoZak5gy1JDVnqCWpOUMtSc0ZaklqzlBLUnOGWpKaM9SS1JyhlqTmDLUkNWeoJak5Qy1JzRlqSWrOUEtSc4Zakpoz1JLUnKGWpOYMtSQ1Z6glqbktQ53k1Un+Mck/J3k4yZ8tYjBJ0qbdA5b5H+Dyqno2yRnAt5L8bVX9w8izSZIYEO
qqKuDZ6d0zpl815lCSpBcN2aMmyS7gIPArwKeqav+MZdaBdYCVlZV5zrjtre69Z9kjLNzhm65eyusu69/1srZX28OgNxOr6idV9XZgD3BJkrfOWGZfVa1V1dpkMpnzmJK0c53SWR9V9SPgPuDKMYaRJL3UkLM+JknOnd5+DfAu4NGR55IkTQ05Rv0G4NbpcepXAV+sqrvHHUuSdMyQsz7+BbhoAbNIkmbwykRJas5QS1JzhlqSmjPUktScoZak5gy1JDVnqCWpOUMtSc0ZaklqzlBLUnOGWpKaM9SS1JyhlqTmDLUkNWeoJak5Qy1JzRlqSWrOUEtSc4Zakpoz1JLU3JahTnJBkm8kOZTk4SQ3LGIwSdKmLT+FHHge+EhV3Z/kHOBgkq9W1SMjzyZJYsAedVU9WVX3T28/AxwCzh97MEnSplM6Rp1kFbgI2D/KNJKklxgc6iSvBb4E3FhVT894fj3JgSQHNjY25jmjJO1og0Kd5Aw2I31bVd05a5mq2ldVa1W1NplM5jmjJO1oQ876CPAZ4FBVfXz8kSRJxxuyR30p8D7g8iQPTr9+d+S5JElTW56eV1XfArKAWSRJM3hloiQ1Z6glqTlDLUnNGWpJas5QS1JzhlqSmjPUktScoZak5gy1JDVnqCWpOUMtSc0ZaklqzlBLUnOGWpKaM9SS1JyhlqTmDLUkNWeoJak5Qy1JzRlqSWrOUEtSc1uGOsktSY4meWgRA0mSftqQPerPAVeOPIck6SS2DHVVfRP44QJmkSTNsHteK0qyDqwDrKysvOz1rO69Z14jqTF/ztJwc3szsar2VdVaVa1NJpN5rVaSdjzP+pCk5gy1JDU35PS824G/By5MciTJB8cfS5J0zJZvJlbV9YsYRJI0m4c+JKk5Qy1JzRlqSWrOUEtSc4Zakpoz1JLUnKGWpOYMtSQ1Z6glqTlDLUnNGWpJas5QS1JzhlqSmjPUktScoZak5gy1JDVnqCWpOUMtSc0ZaklqzlBLUnOGWpKaGxTqJFcmeSzJt5PsHXsoSdKLtgx1kl3Ap4CrgLcA1yd5y9iDSZI2DdmjvgT4dlU9XlXPAV8Arh13LEnSMbsHLHM+8MRx948Av37iQknWgfXp3WeTPPYyZzoPeOpl/rOvVG7zNpc/31nbO7Xjtvk0f86/fLInhoQ6Mx6rlzxQtQ/YdwpDzX6x5EBVrZ3uel5J3Obtb6dtL7jN8zTk0McR4ILj7u8BvjfvQSRJsw0J9T8Bb07yxiRnAtcBXx53LEnSMVse+qiq55P8IfB3wC7glqp6eMSZTvvwySuQ27z97bTtBbd5blL1ksPNkqRGvDJRkpoz1JLUXJtQ77TL1JNckOQbSQ4leTjJDcueaVGS7EryQJK7lz3LIiQ5N8kdSR6d/rzfueyZxpbkj6e/1w8luT3Jq5c907wluSXJ0SQPHffYzyf5apJ/m37/uXm8VotQ79DL1J8HPlJVvwa8A/jQDtjmY24ADi17iAX6JHBvVf0q8Da2+bYnOR/4I2Ctqt7K5kkI1y13qlF8DrjyhMf2Al+rqjcDX5veP20tQs0OvEy9qp6sqvunt59h8w/v+cudanxJ9gBXAzcve5ZFSPI64DLgMwBV9VxV/WipQy3GbuA1SXYDZ7ENr72oqm8CPzzh4WuBW6e3bwXeM4/X6hLqWZepb/toHZNkFbgI2L/kURbhE8BHgf9b8hyL8iZgA/js9HDPzUnOXvZQY6qq7wJ/AXwHeBL4r6r6ynKnWpjXV9WTsLkzBvziPFbaJdSDLlPfjpK8FvgScGNVPb3secaU5N3A0ao6uOxZFmg3cDHw6aq6CPgxc/rrcFfT47LXAm8Efgk4O8l7lzvVK1uXUO/Iy9STnMFmpG+rqjuXPc8CXApck+Qwm4e3Lk/y+eWONLojwJGqOva3pTvYDPd29i7gP6pqo6r+F7gT+I0lz7QoP0jyBoDp96PzWGmXUO+4y9SThM3jloeq6uPLnmcRqupjVbWnql
bZ/Bl/vaq29Z5WVX0feCLJhdOHrgAeWeJIi/Ad4B1Jzpr+nl/BNn8D9ThfBt4/vf1+4K/nsdIh//e80S3hMvUOLgXeB/xrkgenj/1JVf3N8kbSSD4M3DbdCXkc+MCS5xlVVe1PcgdwP5tnNz3ANrycPMntwG8C5yU5AvwpcBPwxSQfZPM/WL8/l9fyEnJJ6q3LoQ9J0kkYaklqzlBLUnOGWpKaM9SS1JyhlqTmDLUkNff/C2KbzOLSKWIAAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.hist(sample)\n",
"plt.show()"
]
},
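{
"cell_type": "markdown",
"metadata": {},
"source": [
"For a small integer sample like this one, the exact count of each value can also be tabulated directly. The cell below is a minimal sketch using `collections.Counter` from the standard library; it assumes `sample` from the cell above, and `counts` is an illustrative name. This is a useful cross-check because `plt.hist` uses 10 equal-width bins by default, so with 11 possible integer values two neighbouring values can end up in the same bin."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from collections import Counter\n",
"\n",
"counts = Counter(sample)  # assumes `sample` from the cell above\n",
"\n",
"# Print each distinct value with the number of times it occurs\n",
"for value in sorted(counts):\n",
"    print(f'{value:2d}: {counts[value]}')"
]
},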
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Analyzing Real Data\n",
"\n",
"Mean and variance are very important when analyzing real-world data. Let's load data about baseball players from [SOCR MLB Height/Weight Data](http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_MLB_HeightsWeights)."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Name</th>\n",
" <th>Team</th>\n",
" <th>Role</th>\n",
" <th>Height</th>\n",
" <th>Weight</th>\n",
" <th>Age</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Adam_Donachie</td>\n",
" <td>BAL</td>\n",
" <td>Catcher</td>\n",
" <td>74</td>\n",
" <td>180.0</td>\n",
" <td>22.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Paul_Bako</td>\n",
" <td>BAL</td>\n",
" <td>Catcher</td>\n",
" <td>74</td>\n",
" <td>215.0</td>\n",
" <td>34.69</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Ramon_Hernandez</td>\n",
" <td>BAL</td>\n",
" <td>Catcher</td>\n",
" <td>72</td>\n",
" <td>210.0</td>\n",
" <td>30.78</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Kevin_Millar</td>\n",
" <td>BAL</td>\n",
" <td>First_Baseman</td>\n",
" <td>72</td>\n",
" <td>210.0</td>\n",
" <td>35.43</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Chris_Gomez</td>\n",
" <td>BAL</td>\n",
" <td>First_Baseman</td>\n",
" <td>73</td>\n",
" <td>188.0</td>\n",
" <td>35.71</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1029</th>\n",
" <td>Brad_Thompson</td>\n",
" <td>STL</td>\n",
" <td>Relief_Pitcher</td>\n",
" <td>73</td>\n",
" <td>190.0</td>\n",
" <td>25.08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1030</th>\n",
" <td>Tyler_Johnson</td>\n",
" <td>STL</td>\n",
" <td>Relief_Pitcher</td>\n",
" <td>74</td>\n",
" <td>180.0</td>\n",
" <td>25.73</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1031</th>\n",
" <td>Chris_Narveson</td>\n",
" <td>STL</td>\n",
" <td>Relief_Pitcher</td>\n",
" <td>75</td>\n",
" <td>205.0</td>\n",
" <td>25.19</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1032</th>\n",
" <td>Randy_Keisler</td>\n",
" <td>STL</td>\n",
" <td>Relief_Pitcher</td>\n",
" <td>75</td>\n",
" <td>190.0</td>\n",
" <td>31.01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1033</th>\n",
" <td>Josh_Kinney</td>\n",
" <td>STL</td>\n",
" <td>Relief_Pitcher</td>\n",
" <td>73</td>\n",
" <td>195.0</td>\n",
" <td>27.92</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1034 rows × 6 columns</p>\n",
"</div>"
],
"text/plain": [
" Name Team Role Height Weight Age\n",
"0 Adam_Donachie BAL Catcher 74 180.0 22.99\n",
"1 Paul_Bako BAL Catcher 74 215.0 34.69\n",
"2 Ramon_Hernandez BAL Catcher 72 210.0 30.78\n",
"3 Kevin_Millar BAL First_Baseman 72 210.0 35.43\n",
"4 Chris_Gomez BAL First_Baseman 73 188.0 35.71\n",
"... ... ... ... ... ... ...\n",
"1029 Brad_Thompson STL Relief_Pitcher 73 190.0 25.08\n",
"1030 Tyler_Johnson STL Relief_Pitcher 74 180.0 25.73\n",
"1031 Chris_Narveson STL Relief_Pitcher 75 205.0 25.19\n",
"1032 Randy_Keisler STL Relief_Pitcher 75 190.0 31.01\n",
"1033 Josh_Kinney STL Relief_Pitcher 73 195.0 27.92\n",
"\n",
"[1034 rows x 6 columns]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv(\"../../data/SOCR_MLB.tsv\",sep='\\t', header=None, names=['Name','Team','Role','Height','Weight','Age'])\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> We are using a package called [**Pandas**](https://pandas.pydata.org/) here for data analysis. We will talk more about Pandas and working with data in Python later in this course.\n",
"\n",
"Let's compute average values for age, height and weight:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Age 28.736712\n",
"Height 73.697292\n",
"Weight 201.689255\n",
"dtype: float64"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[['Age','Height','Weight']].mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's focus on height and compute its variance and standard deviation:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[74, 74, 72, 72, 73, 69, 69, 71, 76, 71, 73, 73, 74, 74, 69, 70, 72, 73, 75, 78]\n"
]
}
],
"source": [
"print(list(df['Height'])[:20])"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Mean = 73.6972920696325\n",
"Variance = 5.316798081118074\n",
"Standard Deviation = 2.3058183105175645\n"
]
}
],
"source": [
"mean = df['Height'].mean()\n",
"var = df['Height'].var()\n",
"std = df['Height'].std()\n",
"print(f\"Mean = {mean}\\nVariance = {var}\\nStandard Deviation = {std}\")"
]
},
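{
"cell_type": "markdown",
"metadata": {},
"source": [
"The standard deviation is the square root of the variance, so it is expressed in the same units as the data (inches here). As a rough sketch, assuming `df`, `mean`, `var` and `std` from the cells above, we can check that relationship and see what share of players fall within one standard deviation of the mean (`within_one_std` is an illustrative name):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Squaring the standard deviation should give back the variance\n",
"print(f'std**2 = {std**2}, var = {var}')\n",
"\n",
"# Fraction of heights in the interval [mean - std, mean + std]\n",
"within_one_std = df['Height'].between(mean - std, mean + std).mean()\n",
"print(f'Share of players within one standard deviation of the mean: {within_one_std:.1%}')"
]
},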
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In addition to the mean, it makes sense to look at the median and the quartiles. They can be visualized using a **box plot**:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAsgAAACICAYAAAD6bB0zAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAATqUlEQVR4nO3dbWxW533H8d8/CYaV5cEJzcJmmNehhhSiZCXZMmcP1bIX3Rale9Fpi7aqzTImtslSK3Whq6U+vCjq1iXVxIuhpe0aVZOlNDIMWauVRSaIBZXxUCfQASpsEKCMAGEucopN5WsvfENunNsP55f4XOfE3490y8kdsP7+5hyfy5fvh0gpCQAAAMCE63IPAAAAAFQJC2QAAACgCQtkAAAAoAkLZAAAAKAJC2QAAACgyQ1z8UmXLFmSOjs75+JTAwAAAO+IvXv3nkspvXfy/XOyQO7s7NSePXvm4lPX2vnz53XbbbflHqNWaOahm4duHrp56Oahm4durUXE8Vb38xCLEu3fvz/3CLVDMw/dPHTz0M1DNw/dPHQrJubijULuu+++xA7yW42NjamtrS33GLVCMw/dPHTz0M1DNw/dPHRrLSL2ppTum3w/O8glev7553OPUDs089DNQzcP3Tx089DNQ7di2EEGAADAvMQOcgX09fXlHqF2aOahm4duHrp56Oahm4duxbCDDAAAgHmJHeQK4Ke34mjmoZuHbh66eejmoZuHbsWwgwwAAIB5iR3kChgYGMg9Qu3QzEM3D908dPPQzUM3D92KYQe5RCMjI1q8eHHuMWqFZh66eejmoZuHbh66eejWGjvIFTA0NJR7hNqhmYduHrp56Oahm4duHroVwwK5RCtWrMg9Qu3QzEM3D908dPPQzUM3D92KYYFcotOnT+ceoXZo5qGbh24eunno5qGbh27FsEAu0Y033ph7hNqhmYduHrp56Oahm4duHroVwwIZAAAAaMICuUQXL17MPULt0MxDNw/dPHTz0M1DNw/dimGBXKKlS5fmHqF2aOahm4duHrp56Oahm4duxbBALtGRI0dyj1A7NPPQzUM3D908dPPQzUO3YnijkBLxIt3F0cxDNw/dPHTz0M1DNw/dWuONQipgx44duUeoHZp56Oahm4duHrp56OahWzHsIAMAAGBeYge5Avr6+nKPUDs089DNQzcP3Tx089DNQ7di2EEGAADAvMQOcgXw01txNPPQzUM3D908dPPQzUO3YthBBgAAwLzEDnIF9Pf35x6hdmjmoZuHbh66eejmoZuHbsWwg1yisbExtbW15R6jVmjmoZuHbh66eejmoZuHbq2xg1wBO3fuzD1C7dDMQzcP3Tx089DNQzcP3YphgVyiu+++O/cItUMzD908dPPQzUM3D908dCuGBXKJjh07lnuE2qGZh24eunno5qGbh24euhXDArlES5YsyT1C7dDMQzcP3Tx089DNQzcP3YphgVyiS5cu5R6hdmjmoZuHbh66eejmoZuHbsWwQC7R5cuXc49QOzTz0M1DNw/dPHTz0M1Dt2JYIJeovb099wi1QzMP3Tx089DNQzcP3Tx0K4YFcolOnjyZe4TaoZmHbh66eejmoZuHbh66FcMCuUQrV67MPULt0MxDNw/dPHTz0M1DNw/dimGBXKLdu3fnHqF2aOahm4duHrp56Oahm4duxfBW0yUaHx/XddfxM0kRNPPQzUM3D908dPPQzUO31nir6QrYunVr7hFqh2Yeunno5qGbh24eunnoVgw7yAAAAJiX2EGugM2bN+ceoXZo5qGbh24eunno5qGbh27FsIMMAACAeYkd5ArYsmVL7hFqh2Yeunno5qGbh24eunnoVgw7yCXiGaTF0cxz66236sKFC7nHqJ30+ZsUX/xR7jFaam9v1+uvv557jJY4Tz1089DNQ7fW2EGugMHBwdwj1A7NPBcuXFBKiVvBm6TsM0x1q/IPPJynHrp56OahWzEskEt0//335x
6hdmgGVB/nqYduHrp56FYMC+QSHTp0KPcItUMzoPo4Tz1089DNQ7diWCCX6IEHHsg9Qu10dHTkHgHADDhPPVXuFhG5R5hSlbtVGd2KmXGBHBHfiIjXIuJAGQO5uru7tWjRIkWEFi1apO7u7twj4R1Q5cddotrOvnFWnxj4hM79+FzuUd71OE89dCtm+fLligh1dHQoIrR8+fLcI11V5TXIldk6OjoqNVtvb69Wr16t66+/XqtXr1Zvb2/uka4xmx3kb0r68BzP8bZ0d3dr06ZN2rBhg0ZGRrRhwwZt2rSpMgcBfAsWLMg9Ampq0yubtO/MPm16eVPuUd71OE89dJu95cuX68SJE+rq6tL27dvV1dWlEydOVGKRXOU1SPNs+/btq8xsvb296unp0caNG3Xp0iVt3LhRPT091Vokz/KZ3Z2SDsz22dZr1qxJZVq4cGF68sknr7nvySefTAsXLix1jplM5EYRx44dyz1CLc33Y+21kdfSmm+tSau/uTqt+daadPaNs7P7i5+/aW4Hexuq/P+U89RT5W5VO94kpa6urpTSm926uroqMWeV1yDNs13pVoXZVq1alQYHB6+5b3BwMK1atar0WSTtSS3Wsu/YY5Aj4s8jYk9E7Dl16pSOHz+uw4cP68CBAzp16pR27dql4eFhvfDCCxofH7/6gtVX3vpwy5YtGh8f1wsvvKDh4WHt2rVLp06d0oEDB3T48GEdP35ce/fu1fnz5/Xiiy9qbGxM/f39kqTR0VGtW7dOfX19kqSBgQF97GMf0+joqM6cOaOhoSEdPXpUR48e1dDQkM6cOaOXXnpJIyMjGhgYkKSrf/fKx/7+fo2NjenFF1/U+fPntXfv3rf9NTU6cStw6+zszD5DHW+S7PNp8rkwMDCgkZERvfTSS5U6n6b7mj73nc9pPI1PdEjjemLzE7P6miRV9muq8vcPztN3XzdJlfoeIUmf/exnNTw8rB07dmh8fFyPPfbYO7aOeDtf0+Q1SF9fn9atW6fR0dFSv0e0+ppGR0d1xx13SJK2b9+ukZER3XPPPRodHc36vfzgwYMaHR295mu65ZZbdPDgwdKvT1NqtWqefBM7yO8IVeAn3bo5d+5c7hFqaT4fa827x1dus95FZgfZwnnqqXK3qh1vatpBvtKNHeSZNc92pVsVZptXO8g5rV27VuvXr9dTTz2lN954Q0899ZTWr1+vtWvX5h4Nb9P+/ftzj4Ca2fTKpqu7x1eMp3EeizyHOE89dJu9ZcuWaefOnXrwwQe1bds2Pfjgg9q5c6eWLVuWe7RKr0GaZ9u9e3dlZuvp6dHjjz+ubdu26fLly9q2bZsef/xx9fT0ZJ2r2azeajoiOiX1p5RWz+aT5nir6e7ubj399NMaHR3VwoULtXbtWm3cuLHUGWYSEZpNb7xpbGxMbW1tuceonfl8rH1060d1+MLht9x/Z/udeu6R56b/y1+4WfrC8BxN9vZU+f8p56mnyt2qeLxdeaLeFcuWLdOrr76acaI3VXkNUtXZent79aUvfUkHDx7UXXfdpZ6eHj366KOlzxFTvNX0jAvkiOiV9CFJSySdkfT5lNLXp/s7ORbIdVDFbzhV19/fr4cffjj3GLXDsWZigWzhPPVUuRvH27sP3VqzF8gOFshAXlW+uFUaC2QAmFemWiC/Kx6DXBczPmMSb0EzoPo4Tz1089DNQ7di2EEG3oXYbTSxgwwA8wo7yBXAT2/F0cyX+3VU63ircrf29vbMR9TUOE89dPPQzUO3YthBBgAAwLzEDnIFXHkXF8wezTx089DNQzcP3Tx089CtGHaQSzQyMqLFixfnHqNWaOahm4duHrp56Oahm4durbGDXAFDQ0O5R6gdmnno5qGbh24eunno5qFbMSyQS7RixYrcI9QOzTx089DNQzcP3Tx089CtGBbIJTp9+nTuEWqHZh66eejmoZuHbh66eehWDAvkEt144425R6gdmnno5qGbh24eunno5qFbMSyQAQ
AAgCYskEt08eLF3CPUDs08dPPQzUM3D908dPPQrRgWyCVaunRp7hFqh2Yeunno5qGbh24eunnoVgwL5BIdOXIk9wi1QzMP3Tx089DNQzcP3Tx0K4Y3CikRL9JdHM08dPPQzUM3D908dPPQrTXeKKQCduzYkXuE2qGZh24eunno5qGbh24euhXDDjIAAADmJXaQK6Cvry/3CLVDMw/dPHTz0M1DNw/dPHQrhh1kAAAAzEvsIFcAP70VRzMP3Tx089DNQzcP3Tx0K4YdZAAAAMxL7CBXQH9/f+4RaodmHrp56Oahm4duHrp56FYMO8glGhsbU1tbW+4xaoVmHrp56Oahm4duHrp56NYaO8gVsHPnztwj1A7NPHTz0M1DNw/dPHTz0K0YFsgluvvuu3OPUDs089DNQzcP3Tx089DNQ7diWCCX6NixY7lHqB2aeejmoZuHbh66eejmoVsxLJBLtGTJktwj1A7NPHTz0M1DNw/dPHTz0K0YFsglunTpUu4RaodmHrp56Oahm4duHrp56FYMC+QSXb58OfcItUMzD908dPPQzUM3D908dCuGBXKJ2tvbc49QOzTz0M1DNw/dPHTz0M1Dt2JYIJfo5MmTuUeoHZp56Oahm4duHrp56OahWzEskEu0cuXK3CPUDs08dPPQzUM3D908dPPQrRgWyCXavXt37hFqh2Yeunno5qGbh24eunnoVgxvNV2i8fFxXXcdP5MUQTMP3Tx089DNQzcP3Tx0a423mq6ArVu35h6hdmjmoZuHbh66eejmoZuHbsWwgwwAAIB5iR3kCti8eXPuEWqHZh66eejmoZuHbh66eehWDDvIAAAAmJfYQa6ALVu25B6hdmjmoZuHbh66eejmoZuHbsWwg1winkFaHM08dPPQzUM3D908dPPQrTV2kCtgcHAw9wi1QzMP3Tx089DNQzcP3Tx0K4Yd5BINDw/r5ptvzj1GrdDMQzcP3Tx089DNQzcP3VpjB7kCDh06lHuE2qGZh24eunno5qGbh24euhXDArlEHR0duUeoHZp56Oahm4duHrp56OahWzEskEt04cKF3CPUDs08dPPQzUM3D908dPPQrRgWyCVasGBB7hFqh2Yeunno5qGbh24eunnoVgwL5BItWrQo9wi1QzMP3Tx089DNQzcP3Tx0K2ZOXsUiIs5KOv6Of+L6WyLpXO4haoZmHrp56Oahm4duHrp56Nbaz6eU3jv5zjlZIKO1iNjT6qVEMDWaeejmoZuHbh66eejmoVsxPMQCAAAAaMICGQAAAGjCArlc/5R7gBqimYduHrp56Oahm4duHroVwGOQAQAAgCbsIAMAAABNWCADAAAATVggz5GIuCUinouIQxFxMCJ+NSLujYjvRsRQROyJiF/OPWeVRMSdjTZXbj+KiE9GxK0R8e8R8YPGx/bcs1bJNN2+0jj+XomIzRFxS+5Zq2Sqbk3//dMRkSJiScYxK2W6ZhHRHRGHI+L7EfF3mUetlGnOUa4JM4iITzWOqQMR0RsRi7gmzGyKblwTCuAxyHMkIp6RtCOl9LWIaJP0HknPSvpqSuk7EfG7kp5IKX0o55xVFRHXSzol6Vck/ZWk11NKX46Iz0hqTymtzzpgRU3qdqekwZTSTyLibyWJbq01d0spHY+IZZK+JmmlpDUpJV5cf5JJx9r7JPVI+r2U0mhE3J5Sei3rgBU1qdvT4powpYj4OUn/IekDKaUfR8Szkv5N0gfENWFK03T7obgmzBo7yHMgIm6S9BuSvi5JKaWxlNL/SUqSbmr8sZs1cbCitYckHU0pHZf0EUnPNO5/RtLv5xqqBq52Syk9n1L6SeP+70rqyDhX1TUfb5L0VUlPaOKcRWvNzf5C0pdTSqOSxOJ4Ws3duCbM7AZJPxURN2hio+mH4powG2/pxjWhGBbIc+N9ks5K+ueI+F5EfC0iFkv6pKSvRMQJSX8v6W8yzlh1fySpt/HPP5NSOi1JjY+3Z5uq+pq7NftTSd8peZ
Y6udotIh6RdCql9HLekSqv+Vh7v6Rfj4hdEbE9Iu7POFfVNXf7pLgmTCmldEoTXV6VdFrScErpeXFNmNY03ZpxTZgBC+S5cYOkD0r6x5TSL0kakfQZTeyyfCqltEzSp9TYYca1Gg9JeUTSt3PPUidTdYuIHkk/kfQvOeaquuZuEfEeTTxU4HN5p6q2FsfaDZLaJT0g6a8lPRsRkWm8ymrRjWvCNBqPLf6IpF+Q9LOSFkfEn+Sdqvpm6sY1YXZYIM+Nk5JOppR2Nf79OU0smD8uqa9x37cl8YSM1n5H0r6U0pnGv5+JiKWS1PjIr29bm9xNEfFxSQ9L+uPEEw6m0tztFzVxUXk5Io5p4leQ+yLijozzVdHkY+2kpL404T8ljUviyY1vNbkb14Tp/bak/0kpnU0pXdZEqy5xTZjJVN24JhTAAnkOpJT+V9KJiLizcddDkv5LE4+d+s3Gfb8l6QcZxquDR3XtwwS2auJCosbHfy19onq4pltEfFjSekmPpJTeyDZV9V3tllLan1K6PaXUmVLq1MTC74ONcxpvmnyObtHE9zRFxPsltUniiY1vNbkb14TpvSrpgYh4T+M3Eg9JOiiuCTNp2Y1rQjG8isUciYh7NfEs+DZJ/y3pMUmrJP2DJn4deUnSX6aU9uaasYoav+I+Iel9KaXhxn23aeIVQJZr4sT/g5TS6/mmrJ4puh2RtFDS+cYf+25KaV2mESupVbdJ//2YpPt4FYs3TXGstUn6hqR7JY1J+nRKaTDbkBU0RbdfE9eEaUXEFyX9oSYeEvA9SX8m6afFNWFaU3T7vrgmzBoLZAAAAKAJD7EAAAAAmrBABgAAAJqwQAYAAACasEAGAAAAmrBABgAAAJqwQAYAAACasEAGAAAAmvw/tSpycIADqyoAAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 720x144 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(10,2))\n",
"plt.boxplot(df['Height'], vert=False, showmeans=True)\n",
"plt.grid(color='gray', linestyle='dotted')\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
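{
"cell_type": "markdown",
"metadata": {},
"source": [
"The numbers that a box plot summarizes can also be computed directly. This is a minimal sketch, assuming the `df` DataFrame loaded above: the median is the 0.5 quantile, and the box spans the 0.25 and 0.75 quantiles, whose difference is the interquartile range (IQR). With matplotlib's default settings, the whiskers extend to the furthest data points within 1.5 × IQR of the box."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Quartiles of the height column: 25%, 50% (median) and 75%\n",
"q1, median, q3 = df['Height'].quantile([0.25, 0.5, 0.75])\n",
"iqr = q3 - q1\n",
"print(f'Q1 = {q1}, median = {median}, Q3 = {q3}, IQR = {iqr}')"
]
},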
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also make box plots of subsets of our dataset, for example, grouped by player role."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 720x576 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"df.boxplot(column='Height', by='Role', figsize=(10,8))\n",
"plt.xticks(rotation='vertical')\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
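{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same comparison can be sketched numerically. The cell below assumes the `df` DataFrame with its `Role` and `Height` columns from the cells above, and summarizes height per role, sorted by mean height:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Per-role summary statistics for height\n",
"df.groupby('Role')['Height'].agg(['mean', 'median', 'std', 'count']).sort_values('mean')"
]
},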
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> **Note**: This diagram suggests that, on average, first basemen are taller than second basemen. Later we will learn how to test this hypothesis more formally, and how to check whether the difference is statistically significant.\n",
"\n",
"Age, height and weight are all continuous random variables. What do you think their distributions are? A good way to find out is to plot a histogram of the values:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAsgAAAGqCAYAAAAWf7K6AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAAn10lEQVR4nO3de5hlZXnn/e9PUDS2AgatIJK0GkwE+g0TShIPMdWaUSNMMPOqwWEURmNHYw7GTt40mqjRkCEmaCZjoukEXjEqLSMeiJAoMTaoI2rDoA2iItIoBxsFBFoJSeM9f6xV8lDUqYu9a9fh+7muumrvZ6291r3vrq761VPPXjtVhSRJkqTO/UZdgCRJkrSUGJAlSZKkhgFZkiRJahiQJUmSpIYBWZIkSWoYkCVJkqSGAVnSgiU5PslH57nviUk+OeDz70jyC/3tVyf5uwEee1eSx/S335Hkjwd47Lcn+cNBHW8PzvvyJDv75/bDi33+PZWkkvz4qOuQtPoYkKVVJslJSc6bMnblDGPHzXasqnp3VT1jQHVtTfKrC318Vf1JVc35+Pmep6rWVNXXFlpPc757/WJQVS+rqjfe12PvYR33B94MPKN/bjdN2b62D6SXTBk/IMm/JdnRjP3gF5Mp+04k+X4fwHcluS7JH81S0+Q5J/ffkWTTfX6yknQfGZCl1edC4MlJ9gJI8iPA/YGfnjL24/2+q0qSvUddw5CMAQ8ELp9jvwcnOby5/1+Aq/fgPNf3AXwN8BTgJUmeM8dj9uv3fwHw2iTP2oPzDdTk/wFJq5sBWVp9PkcXiI/o7z8V+Djw5SljV1XV9Un2TXJakhv6GcE/boL0PWZHkzwjyZeT3Jrkr5NcMHW2NsmfJ7klydVJfrEfOxn4OeCt/UziW6crPMkLk1yT5KYkr5my7fVJ3tXffmCSd/X7fSfJ55KMzXSefhbzFUmuBK5sxto/7x+Q5Pwkt/fP68f6/SZnQfduatma5FeTPB54O/DE/nzf6bffY8lGkpcm+WqSm5Ock+SRzbZK8rJ+Rv+WJH+VJDP0Z58kf5Hk+v7jL/qxx/X/vgDfSfIv0z2+9/fACc39FwHvnGX/GVXV1cD/Bg6d5/6fpgvwh0/dluToJP8nyW1JvpHk9c22c5P85pT9vzAZzJP8ZP9vd3P/9fn8Zr93JHlbkvOSfBdYn+TZSb7Y/1tfl+R3F/D0JS1jBmRplamqfwM+QxeC6T9/AvjklLHJ2eMzgN10M8r/AXgGcK8lCkkOAN4HnAT8MF0ge9KU3X6mHz8AeBNwWpJU1Wv6Gn6jn338jWmOfyjwNuCFwCP7czxqhqd5ArAvcHC/38uAO+Y4z3P6+mYKc8cDb+xrvxR49wz7/UBVXdGf+9P9+fab5nk9DfjvwPOBA4FrgC1TdjsGeALwU/1+z5zhlK8BfpbuF52fAo4C/qCqvgIc1u+zX1U9bZay3wUcl2SvPuA/hO7rZY8lOQR4MnDRPPZNkif3df6faXb5Ll1Y3w84Gnh5MzN9BvBfm2P9FHAQcF6SBwPnA+8BHkE3S/3XSQ7jbv8FOJnuuX4SOA34tap6CF1Yn+0XCkkrkAFZWp0u4O4w/HN0ofETU8YuSDIG/CLwyqr6blXdCLwFmG5t8rOBy6vq/VW1G/hL4JtT9rmmqv62qu6iCzUH0v3pfz6eC3y4qi6sqjuBPwS+P8O+/04XjH+8qu6qqour6rY5jv/fq+rmqrpjhu3nNud+Dd2s8MHzrH02xwOnV9Ul/bFP6o+9ttnnlKr6TlV9nW62/4hZjvWGqrqxqr4F/BHdLxR74lq6X2J+ge4XjT2dPX5kP2t/G/AVunA914szvw3cDPwdsKmqPjZ1h6raWlXbq+r7VfUF4Ezg5/vNHwIO6QM5dM/5vf0vg8cAO6rq/6+q3VV1CXA23dfTpA9V1af6Y/8r3dfPoUkeWlW39I+RtIoYkKXV6ULgKUn2Bx5eVVfS/Sn8Sf3Y4f0+P0
a3HOOGPvR8B/gbupm4qR4JfGPyTlUVXdhqfbPZ/r3+5pp51jz1+N8Fbpph378HPgJs6ZcavCndi9Rm8435bq+qXXSB7pEz7z5vj6SbNW6PfRPdDOik9heN7zFzz+5xrP72Qmp8J3Ai3Wzru/bwsddX1X5V9VC62d476H4Zms0BVbV/VT2+qv5yuh2S/EySjyf5VpJb6WbmDwDof7E4C/ivSe7X1/33/UN/DPiZya/f/mv4eOBHmsNP/bf/f+l+4bumX07zxPk9dUkrhQFZWp0+TbcEYQPwKYB+hvX6fuz6fv3oN4A76QLMfpPBp6oOm+aYN9AseejXyc60BGI6Ncf2G+iWTEwe/4foZonvfaCqf6+qP6qqQ+mWeRxD9+f52c4z1/nbc68BHkbXr+/2wz/U7NuGr7mOez1diJs89oPpntd1czxuzmMBP9qP7amz6ZYxfK2qrplr55lU1a10Sxv+00KP0XgPcA5wcFXtS7e2u12LfQZd8H068L1+PTN0X8MXNF+/+/XLXV7eljql7s9V1bF0vwh+kC58S1pFDMjSKtQvI9gGvIpuacWkT/ZjF/b73QB8FDg1yUOT3C/JY5P8/NRjAucC65I8p3/B2iu4Z1Ccy07gMbNsfx9wTJKnJHkA8AZm+B6WZH2SdeleTHgb3Z/M75rneWby7ObcbwQ+U1Xf6JcyXEc3e7lXkhcDj53yvB7VP2467wH+W5IjkuwD/El/7B0LqPFM4A+SPLxfE/5a9nwGeHJ2/mlMs9a8cf90L4ac/LjX1T/6XySOY+4rZ8zHQ4Cbq+pfkxxFt264rfnTdEtuTuXu2WOADwOPS/cCz/v3H0/o11ffS5IHpLu+975V9e90Xz93TbevpJXLgCytXhfQzZC160M/0Y+1l3d7EfAA4IvALXRB9cCpB6uqbwPPo3vx3U10L3bbRjcDPR//A3huf6WGe/2Zvaoupwvd76GbTb6Fey/hmPQjfZ23AVfQPdfJoDjreWbxHuB1dEsrjqSbrZz0UuD36J73YXTLVSb9C11A/GaSb0/zvD5Gt5767P55PZbp13jPxx/T9fwLwHbgkn5sj1XVtqq6apZdzqNbPjH58fp+/JHpr2tMt8TjYdyzVwv168AbktxOF/ynm9V9J7CO5peCqrqd7oWlx9HNpn8T+FNgn1nO9UJgR7+O+mU0LwCUtDqkWyYoSYPVrwW9Fji+qj4+6nq08iV5EbChqp4y6lokLW/OIEsamCTPTLJfv1Tg1XRrROe8xJd0X/Vr0n8d2DzqWiQtfwZkSYP0ROAqust2/SfgObNcNk0aiCTPBL5Ft977PSMuR9IK4BILSZIkqeEMsiRJktQwIEuSJEkNA7IkSZLUMCBLkiRJDQOyJEmS1DAgS5IkSQ0DsiRJktQwIEuSJEkNA7IkSZLUMCBLkiRJDQOyJEmS1Nh71AXcFwcccECtXbt21GUsad/97nd58IMfPOoyVhR7Ohz2dfDs6XDY18Gzp8NhX+d28cUXf7uqHj51fFkH5LVr17Jt27ZRl7Gkbd26lYmJiVGXsaLY0+Gwr4NnT4fDvg6ePR0O+zq3JNdMN+4SC0mSJKlhQJYkSZIaBmRJkiSpMbSAnOTgJB9PckWSy5P8dj/+sCTnJ7my/7x/85iTknw1yZeTPHNYtUmSJEkzGeYM8m5gY1U9HvhZ4BVJDgU2AR+rqkOAj/X36bcdBxwGPAv46yR7DbE+SZIk6V6GFpCr6oaquqS/fTtwBXAQcCxwRr/bGcBz+tvHAluq6s6quhr4KnDUsOqTJEmSppOqGv5JkrXAhcDhwNerar9m2y1VtX+StwIXVdW7+vHTgH+sqvdNOdYGYAPA2NjYkVu2bBl6/cvZrl27WLNmzajLWFHs6XDY18Gzp8NhXwfPng6HfZ3b+vXrL66q8anjQ78OcpI1wNnAK6vqtiQz7jrN2L3Se1VtBjYDjI+Pl9f3m53XQBw8ezoc9nXw7Olw2NfBs6fDYV8XbqhXsUhyf7
pw/O6qen8/vDPJgf32A4Eb+/FrgYObhz8KuH6Y9UmSJElTDfMqFgFOA66oqjc3m84BTuhvnwB8qBk/Lsk+SR4NHAJ8dlj1SZIkSdMZ5hKLJwMvBLYnubQfezVwCnBWkpcAXweeB1BVlyc5C/gi3RUwXlFVdw2xPkmSJOlehhaQq+qTTL+uGODpMzzmZODkYdUkSZIkzcV30pMkSZIaBmRJkiSpYUCWJEmSGkO/DrKk1WXtpnMX9LiN63Zz4gIfuyd2nHL00M8hSVrenEGWJEmSGgZkSZIkqWFAliRJkhoGZEmSJKlhQJYkSZIaBmRJkiSpYUCWJEmSGgZkSZIkqWFAliRJkhoGZEmSJKlhQJYkSZIaBmRJkiSpYUCWJEmSGgZkSZIkqWFAliRJkhoGZEmSJKlhQJYkSZIaBmRJkiSpYUCWJEmSGgZkSZIkqWFAliRJkhoGZEmSJKlhQJYkSZIaBmRJkiSpYUCWJEmSGgZkSZIkqWFAliRJkhoGZEmSJKlhQJYkSZIaBmRJkiSpsfeoC5C0Z9ZuOnfUJUiStKINbQY5yelJbkxyWTP23iSX9h87klzaj69Nckez7e3DqkuSJEmazTBnkN8BvBV45+RAVf3K5O0kpwK3NvtfVVVHDLEeSZIkaU5DC8hVdWGStdNtSxLg+cDThnV+SZIkaSFSVcM7eBeQP1xVh08Zfyrw5qoab/a7HPgKcBvwB1X1iRmOuQHYADA2Nnbkli1bhlb/SrBr1y7WrFkz6jJWlFH3dPt1t8690zI09iDYecfwz7PuoH2Hf5IlYtRfqyuVfR08ezoc9nVu69evv3gyj7ZG9SK9FwBnNvdvAH60qm5KciTwwSSHVdVtUx9YVZuBzQDj4+M1MTGxGPUuW1u3bsUeDdaoe3riCn2R3sZ1uzl1+/C/Je04fmLo51gqRv21ulLZ18Gzp8NhXxdu0S/zlmRv4D8D750cq6o7q+qm/vbFwFXA4xa7NkmSJGkU10H+BeBLVXXt5ECShyfZq7/9GOAQ4GsjqE2SJEmr3DAv83Ym8GngJ5Jcm+Ql/abjuOfyCoCnAl9I8nngfcDLqurmYdUmSZIkzWSYV7F4wQzjJ04zdjZw9rBqkSRJkubLt5qWJEmSGgZkSZIkqWFAliRJkhoGZEmSJKlhQJYkSZIaBmRJkiSpYUCWJEmSGgZkSZIkqWFAliRJkhoGZEmSJKlhQJYkSZIaBmRJkiSpYUCWJEmSGgZkSZIkqWFAliRJkhoGZEmSJKlhQJYkSZIaBmRJkiSpYUCWJEmSGgZkSZIkqWFAliRJkhoGZEmSJKlhQJYkSZIaBmRJkiSpYUCWJEmSGgZkSZIkqWFAliRJkhoGZEmSJKlhQJYkSZIaBmRJkiSpYUCWJEmSGgZkSZIkqWFAliRJkhoGZEmSJKmx97AOnOR04Bjgxqo6vB97PfBS4Fv9bq+uqvP6bScBLwHuAn6rqj4yrNokrV5rN5076hJmteOUo0ddgiStesOcQX4H8Kxpxt9SVUf0H5Ph+FDgOOCw/jF/nWSvIdYmSZIkTWtoAbmqLgRunufuxwJbqurOqroa+Cpw1LBqkyRJkmaSqhrewZO1wIenLLE4EbgN2AZsrKpbkrwVuKiq3tXvdxrwj1X1vmmOuQHYADA2Nnbkli1bhlb/SrBr1y7WrFkz6jJWlFH3dPt1t47s3MM09iDYeceoqxi9dQftO7BjjfprdaWyr4NnT4fDvs5t/fr1F1fV+NTxoa1BnsHbgDcC1X8+FXgxkGn2nTa5V9VmYDPA+Ph4TUxMDKXQlWLr1q3Yo8EadU9PXOJraBdq47rdnLp9sb8lLT07jp8Y2LFG/bW6UtnXwbOnw2FfF25RfxpV1c7J20n+Fvhwf/da4OBm10cB1y9iadIPzPUiro3rdq/YkCpJkhb5Mm9JDmzu/jJwWX/7HOC4JPskeTRwCPDZxaxNkiRJguFe5u1MYAI4IMm1wOuAiSRH0C2f2AH8GkBVXZ7kLO
CLwG7gFVV117BqkyRJkmYytIBcVS+YZvi0WfY/GTh5WPVIkiRJ8+E76UmSJEkNA7IkSZLUMCBLkiRJDQOyJEmS1DAgS5IkSQ0DsiRJktQwIEuSJEkNA7IkSZLUMCBLkiRJDQOyJEmS1DAgS5IkSQ0DsiRJktQwIEuSJEkNA7IkSZLUMCBLkiRJDQOyJEmS1DAgS5IkSQ0DsiRJktQwIEuSJEkNA7IkSZLUMCBLkiRJDQOyJEmS1DAgS5IkSQ0DsiRJktQwIEuSJEkNA7IkSZLUMCBLkiRJDQOyJEmS1DAgS5IkSQ0DsiRJktQwIEuSJEkNA7IkSZLUMCBLkiRJDQOyJEmS1DAgS5IkSY2hBeQkpye5McllzdifJflSki8k+UCS/frxtUnuSHJp//H2YdUlSZIkzWaYM8jvAJ41Zex84PCq+n+ArwAnNduuqqoj+o+XDbEuSZIkaUZDC8hVdSFw85Sxj1bV7v7uRcCjhnV+SZIkaSFSVcM7eLIW+HBVHT7Ntn8A3ltV7+r3u5xuVvk24A+q6hMzHHMDsAFgbGzsyC1btgyp+pVh165drFmzZtRlLCvbr7t11u1jD4KddyxSMauIfe2sO2jfgR3L///DYV8Hz54Oh32d2/r16y+uqvGp43uPopgkrwF2A+/uh24AfrSqbkpyJPDBJIdV1W1TH1tVm4HNAOPj4zUxMbFIVS9PW7duxR7tmRM3nTvr9o3rdnPq9pH811nR7Gtnx/ETAzuW//+Hw74Onj0dDvu6cIt+FYskJwDHAMdXP31dVXdW1U397YuBq4DHLXZtkiRJ0qIG5CTPAn4f+KWq+l4z/vAke/W3HwMcAnxtMWuTJEmSYIhLLJKcCUwAByS5Fngd3VUr9gHOTwJwUX/FiqcCb0iyG7gLeFlV3TztgSVJkqQhGlpArqoXTDN82gz7ng2cPaxaJEmSpPnynfQkSZKkhgFZkiRJahiQJUmSpIYBWZIkSWoYkCVJkqSGAVmSJElqGJAlSZKkhgFZkiRJahiQJUmSpIYBWZIkSWoYkCVJkqSGAVmSJElqGJAlSZKkhgFZkiRJahiQJUmSpIYBWZIkSWoYkCVJkqSGAVmSJElqGJAlSZKkhgFZkiRJahiQJUmSpMa8AnKSJ89nTJIkSVru5juD/D/nOSZJkiQta3vPtjHJE4EnAQ9P8qpm00OBvYZZmCRJkjQKswZk4AHAmn6/hzTjtwHPHVZRkiRJ0qjMGpCr6gLggiTvqKprFqkmSZIkaWTmmkGetE+SzcDa9jFV9bRhFCVJkiSNynwD8v8C3g78HXDX8MqRJEmSRmu+AXl3Vb1tqJVIkiRJS8B8L/P2D0l+PcmBSR42+THUyiRJkqQRmO8M8gn9599rxgp4zGDLkSRJkkZrXgG5qh497EIkSZKkpWBeATnJi6Ybr6p3DrYcSZIkabTmu8TiCc3tBwJPBy4BDMiSJElaUea7xOI32/tJ9gX+frbHJDkdOAa4saoO78ceBryX7nrKO4DnV9Ut/baTgJfQXUbut6rqI3vyRCRJkqRBmO8M8lTfAw6ZY593AG/lnrPMm4CPVdUpSTb1938/yaHAccBhwCOBf07yuKrymsuSVpW1m84d2LE2rtvNiQM83o5Tjh7YsSRpKZvvGuR/oLtqBcBewOOBs2Z7TFVdmGTtlOFjgYn+9hnAVuD3+/EtVXUncHWSrwJHAZ+eT32SJEnSoKSq5t4p+fnm7m7gmqq6dh6PWwt8uFli8Z2q2q/ZfktV7Z/krcBFVfWufvw04B+r6n3THHMDsAFgbGzsyC1btsxZ/2q2a9cu1qxZM+oylpXt19066/axB8HOOxapmFXEvg7eoHu67qB9B3ewZczvq4NnT4fDvs5t/fr1F1fV+NTx+a5BviDJGHe/WO/KQRYHZLrTzlDLZmAzwPj4eE1MTAy4lJVl69at2KM9M9efpDeu282p2xe6Okkzsa+DN+ie7jh+YmDHWs78vjp49nQ47OvCzeud9J
I8H/gs8Dzg+cBnkjx3AefbmeTA/pgHAjf249cCBzf7PQq4fgHHlyRJku6T+b7V9GuAJ1TVCVX1Irr1wX+4gPOdw93vyncC8KFm/Lgk+yR5NN0LAD+7gONLkiRJ98l8//Z2v6q6sbl/E3OE6yRn0r0g74Ak1wKvA04BzkryEuDrdDPSVNXlSc4Cvki3xvkVXsFCkiRJozDfgPxPST4CnNnf/xXgvNkeUFUvmGHT02fY/2Tg5HnWI0mSJA3FrAE5yY8DY1X1e0n+M/AUuhfUfRp49yLUJ0mSJC2qudYg/wVwO0BVvb+qXlVVv0M3e/wXwy1NkiRJWnxzBeS1VfWFqYNVtY3u7aIlSZKkFWWugPzAWbY9aJCFSJIkSUvBXAH5c0leOnWwvwrFxcMpSZIkSRqdua5i8UrgA0mO5+5APA48APjlIdYlSZIkjcSsAbmqdgJPSrIeOLwfPreq/mXolUmSJEkjMK/rIFfVx4GPD7kWSZIkaeTm+1bTkiRJ0qpgQJYkSZIaBmRJkiSpYUCWJEmSGgZkSZIkqWFAliRJkhoGZEmSJKlhQJYkSZIaBmRJkiSpYUCWJEmSGgZkSZIkqWFAliRJkhoGZEmSJKlhQJYkSZIaBmRJkiSpYUCWJEmSGgZkSZIkqWFAliRJkhoGZEmSJKlhQJYkSZIaBmRJkiSpYUCWJEmSGgZkSZIkqWFAliRJkhoGZEmSJKlhQJYkSZIaBmRJkiSpsfdinzDJTwDvbYYeA7wW2A94KfCtfvzVVXXe4lYnSZKk1W7RA3JVfRk4AiDJXsB1wAeA/wa8par+fLFrkiRJkiaNeonF04GrquqaEdchSZIkAZCqGt3Jk9OBS6rqrUleD5wI3AZsAzZW1S3TPGYDsAFgbGzsyC1btixewcvQrl27WLNmzajLWFa2X3frrNvHHgQ771ikYlYR+zp4g+7puoP2HdzBljG/rw6ePR0O+zq39evXX1xV41PHRxaQkzwAuB44rKp2JhkDvg0U8EbgwKp68WzHGB8fr23btg2/2GVs69atTExMjLqMZWXtpnNn3b5x3W5O3b7oq5NWPPs6eKutpztOOXpRzuP31cGzp8NhX+eWZNqAPMolFr9IN3u8E6CqdlbVXVX1feBvgaNGWJskSZJWqVFOLbwAOHPyTpIDq+qG/u4vA5eNpCoN3VwztJIkSaM0koCc5IeA/wj8WjP8piRH0C2x2DFlmyRJkrQoRhKQq+p7wA9PGXvhKGqRJEmSWqO+zJskSZK0pBiQJUmSpIYBWZIkSWoYkCVJkqSGAVmSJElqGJAlSZKkhgFZkiRJahiQJUmSpIYBWZIkSWoYkCVJkqSGAVmSJElqGJAlSZKkhgFZkiRJahiQJUmSpIYBWZIkSWoYkCVJkqSGAVmSJElqGJAlSZKkhgFZkiRJahiQJUmSpIYBWZIkSWoYkCVJkqSGAVmSJElqGJAlSZKkhgFZkiRJahiQJUmSpIYBWZIkSWoYkCVJkqSGAVmSJElqGJAlSZKkhgFZkiRJahiQJUmSpIYBWZIkSWoYkCVJkqTG3qM4aZIdwO3AXcDuqhpP8jDgvcBaYAfw/Kq6ZRT1SZIkafUa5Qzy+qo6oqrG+/ubgI9V1SHAx/r7kiRJ0qJaSkssjgXO6G+fATxndKVIkiRptUpVLf5Jk6uBW4AC/qaqNif5TlXt1+xzS1XtP81jNwAbAMbGxo7csmXLIlW9PO3atYs1a9aMuox72H7draMu4T4ZexDsvGPUVaw89nXwVltP1x2076KcZyl+X13u7Olw2Ne5rV+//uJmNcMPjGQNMvDkqro+ySOA85N8ab4PrKrNwGaA8fHxmpiYGFKJK8PWrVtZaj06cdO5oy7hPtm4bjenbh/Vf52Vy74O3mrr6Y7jJxblPEvx++pyZ0+Hw74u3EiWWFTV9f3nG4EPAEcBO5McCNB/vnEUtUmSJGl1W/SAnOTBSR4yeRt4BnAZcA5wQr/bCcCHFrs2SZIkaRR/ex
sDPpBk8vzvqap/SvI54KwkLwG+DjxvBLVJkiRplVv0gFxVXwN+aprxm4CnL3Y9kiRJUmspXeZNkiRJGjkDsiRJktQwIEuSJEkNA7IkSZLUMCBLkiRJDQOyJEmS1DAgS5IkSQ0DsiRJktQwIEuSJEkNA7IkSZLUMCBLkiRJjb1HXYAkSYOwdtO5i3Kejet2c+ICzrXjlKOHUI2kYXAGWZIkSWoYkCVJkqSGAVmSJElqGJAlSZKkhgFZkiRJahiQJUmSpIYBWZIkSWoYkCVJkqSGAVmSJElqGJAlSZKkhgFZkiRJahiQJUmSpIYBWZIkSWoYkCVJkqSGAVmSJElqGJAlSZKkhgFZkiRJauw96gI0eGs3nfuD2xvX7ebE5r4kSZJm5wyyJEmS1DAgS5IkSQ0DsiRJktQwIEuSJEmNRQ/ISQ5O8vEkVyS5PMlv9+OvT3Jdkkv7j2cvdm2SJEnSKK5isRvYWFWXJHkIcHGS8/ttb6mqPx9BTZIkSRIwgoBcVTcAN/S3b09yBXDQYtchSZIkTSdVNbqTJ2uBC4HDgVcBJwK3AdvoZplvmeYxG4ANAGNjY0du2bJlscpdNrZfd+sPbo89CHbeMcJiViB7Ohz2dfDs6XAstK/rDtp38MWsELt27WLNmjWjLmPFsa9zW79+/cVVNT51fGQBOcka4ALg5Kp6f5Ix4NtAAW8EDqyqF892jPHx8dq2bdvwi11mpr5RyKnbfT+YQbKnw2FfB8+eDsdC+7rjlKOHUM3KsHXrViYmJkZdxopjX+eWZNqAPJKrWCS5P3A28O6qej9AVe2sqruq6vvA3wJHjaI2SZIkrW6juIpFgNOAK6rqzc34gc1uvwxctti1SZIkSaP429uTgRcC25Nc2o+9GnhBkiPolljsAH5tBLVJkjQU7fK3pcglINLdRnEVi08CmWbTeYtdiyRJkjSV76QnSZIkNQzIkiRJUsOALEmSJDUMyJIkSVLDgCxJkiQ1DMiSJElSw4AsSZIkNQzIkiRJUsOALEmSJDUMyJIkSVLDgCxJkiQ1DMiSJElSw4AsSZIkNQzIkiRJUsOALEmSJDUMyJIkSVLDgCxJkiQ1DMiSJElSw4AsSZIkNQzIkiRJUsOALEmSJDUMyJIkSVLDgCxJkiQ1DMiSJElSY+9RF7Acrd107qhLkCRJ0pA4gyxJkiQ1nEGWJEkj/evoxnW7OXGO8+845ehFqkZyBlmSJEm6BwOyJEmS1DAgS5IkSQ0DsiRJktQwIEuSJEkNA7IkSZLUMCBLkiRJDa+DLEmSdB8txXfZba8v7XWk98ySm0FO8qwkX07y1SSbRl2PJEmSVpclNYOcZC/gr4D/CFwLfC7JOVX1xdFWJkmSRmkpztAuJ0u9f0tthnupzSAfBXy1qr5WVf8GbAGOHXFNkiRJWkVSVaOu4QeSPBd4VlX9an//hcDPVNVvNPtsADb0d38C+PKiF7q8HAB8e9RFrDD2dDjs6+DZ0+Gwr4NnT4fDvs7tx6rq4VMHl9QSCyDTjN0jwVfVZmDz4pSz/CXZVlXjo65jJbGnw2FfB8+eDod9HTx7Ohz2deGW2hKLa4GDm/uPAq4fUS2SJElahZZaQP4ccEiSRyd5AHAccM6Ia5IkSdIqsqSWWFTV7iS/AXwE2As4vaouH3FZy53LUQbPng6HfR08ezoc9nXw7Olw2NcFWlIv0pMkSZJGbaktsZAkSZJGyoAsSZIkNQzIy1yS05PcmOSyKeO/2b9l9+VJ3tSMn9S/jfeXkzxz8Ste+qbraZIjklyU5NIk25Ic1Wyzp3NIcnCSjye5ov+a/O1+/GFJzk9yZf95/+Yx9nUOs/T1z5J8KckXknwgyX7NY+zrLGbqabP9d5NUkgOaMXs6h9n66s+rhZnl/78/rwahqvxYxh/AU4GfBi5rxtYD/wzs099/RP/5UODzwD7Ao4GrgL1G/RyW2scMPf0o8Iv97WcDW+3pHvX0QOCn+9sPAb7S9+5NwKZ+fBPwp/
Z1IH19BrB3P/6n9vW+97S/fzDdi8ivAQ6wp/e9r/68GkpP/Xk1gA9nkJe5qroQuHnK8MuBU6rqzn6fG/vxY4EtVXVnVV0NfJXu7b3VmKGnBTy0v70vd1+f257OQ1XdUFWX9LdvB64ADqLr3xn9bmcAz+lv29d5mKmvVfXRqtrd73YR3TXlwb7OaZavVYC3AP8f93wDK3s6D7P01Z9XCzRLT/15NQAG5JXpccDPJflMkguSPKEfPwj4RrPftdz9jV+zeyXwZ0m+Afw5cFI/bk/3UJK1wH8APgOMVdUN0H2zBx7R72Zf99CUvrZeDPxjf9u+7oG2p0l+Cbiuqj4/ZTd7uoemfK3682oApvT0lfjz6j4zIK9MewP7Az8L/B5wVpIwj7fy1oxeDvxOVR0M/A5wWj9uT/dAkjXA2cArq+q22XadZsy+zmCmviZ5DbAbePfk0DQPt6/TaHtK18PXAK+dbtdpxuzpDKb5WvXn1X00TU/9eTUABuSV6Vrg/dX5LPB94AB8K+/74gTg/f3t/8Xdf5ayp/OU5P5038TfXVWTvdyZ5MB++4HA5J9X7es8zdBXkpwAHAMcX/0CROzrvEzT08fSrdn8fJIddH27JMmPYE/nbYavVX9e3Qcz9NSfVwNgQF6ZPgg8DSDJ44AHAN+me9vu45Lsk+TRwCHAZ0dV5DJzPfDz/e2nAVf2t+3pPPQzQqcBV1TVm5tN59B9M6f//KFm3L7OYaa+JnkW8PvAL1XV95qH2Nc5TNfTqtpeVY+oqrVVtZYuaPx0VX0Tezovs3wP+CD+vFqQWXrqz6sBWFJvNa09l+RMYAI4IMm1wOuA04HT012m7N+AE/oZpMuTnAV8ke5Phq+oqrtGU/nSNUNPXwr8jyR7A/8KbACoKns6P08GXghsT3JpP/Zq4BS6P6m+BPg68Dywr3tgpr7+Jd0r1c/vfoZyUVW9zL7Oy7Q9rarzptvZns7bTF+r/rxauJl66s+rAfCtpiVJkqSGSywkSZKkhgFZkiRJahiQJUmSpIYBWZIkSWoYkCVJkqSGAVmSlqAkb0nyyub+R5L8XXP/1CSvmuGxb0jyC3Mc//VJfnea8f2S/Pp9KF2Slj0DsiQtTf8beBJAkvvRvbvYYc32JwGfmu6BVfXaqvrnBZ53P8CALGlVMyBL0tL0KfqATBeMLwNuT7J/kn2AxwMkuSDJxf0M8+Tbdr8jyXP7289O8qUkn0zyl0k+3Jzj0CRbk3wtyW/1Y6cAj01yaZI/W4wnKklLje+kJ0lLUFVdn2R3kh+lC8qfBg4CngjcClwBvAU4tqq+leRXgJOBF08eI8kDgb8BnlpVV/fvEtn6SWA98BDgy0neBmwCDq+qI4b6BCVpCTMgS9LSNTmL/CTgzXQB+Ul0Afk64Bnc/XbSewE3THn8TwJfq6qr+/tn0r/tbO/cqroTuDPJjcDYkJ6HJC0rBmRJWrom1yGvo1ti8Q1gI3Ab8C/AQVX1xFkenzmOf2dz+y78mSBJgGuQJWkp+xRwDHBzVd1VVTfTvYjuicB7gYcneSJAkvsnOWzK478EPCbJ2v7+r8zjnLfTLbmQpFXLgCxJS9d2uqtXXDRl7NaquhF4LvCnST4PXMrdL+oDoKruoLsixT8l+SSwk255xoyq6ibgU0ku80V6klarVNWoa5AkDUmSNVW1K91C5b8Crqyqt4y6LklaypxBlqSV7aVJLgUuB/alu6qFJGkWziBLkiRJDWeQJUmSpIYBWZIkSWoYkCVJkqSGAVmSJElqGJAlSZKkxv8FiHh2DxCDPowAAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 720x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"df['Weight'].hist(bins=15, figsize=(10,6))\n",
"plt.suptitle('Weight distribution of MLB Players')\n",
"plt.xlabel('Weight')\n",
"plt.ylabel('Count')\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Normal Distribution\n",
"\n",
"Let's create an artificial sample of weights that follows a normal distribution with the same mean and variance as our real data:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([73.46072234, 70.40678311, 70.23689776, 73.81190675, 72.41091792,\n",
" 76.00127651, 71.91641414, 77.18162239, 76.7173353 , 73.93996587,\n",
" 74.2862748 , 76.88034696, 72.15184905, 74.43537605, 76.37723417,\n",
" 65.66976051, 74.3200533 , 77.3235274 , 72.8840488 , 77.50300255])"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"generated = np.random.normal(mean, std, 1000)\n",
"generated[:20]"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 720x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(10,6))\n",
"plt.hist(generated, bins=15)\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 720x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(10,6))\n",
"plt.hist(np.random.normal(0,1,50000), bins=300)\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since many real-life quantities are approximately normally distributed, a uniform random number generator is usually a poor choice for generating realistic sample data. Here is what happens if we try to generate weights with a uniform distribution (using `np.random.rand`):"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 720x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"wrong_sample = np.random.rand(1000)*2*std+mean-std\n",
"plt.figure(figsize=(10,6))\n",
"plt.hist(wrong_sample)\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
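{
"cell_type": "markdown",
"metadata": {},
"source": [
"A side note on the uniform sample above, as a minimal sketch. The values of `mu` and `sigma` below are assumptions chosen for illustration, not the statistics of the baseball data. A uniform distribution on `[mu - sigma, mu + sigma]` has a standard deviation of only `sigma / sqrt(3)`, so it differs from the normal sample in spread as well as in shape. Widening the interval to `mu ± sqrt(3) * sigma` matches the standard deviation, but the histogram would still be flat rather than bell-shaped."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(3)\n",
"mu, sigma = 200.0, 20.0  # hypothetical mean and standard deviation\n",
"\n",
"# Same construction as wrong_sample: uniform on [mu - sigma, mu + sigma]\n",
"narrow = rng.uniform(mu - sigma, mu + sigma, size=100_000)\n",
"# Uniform interval widened so that its standard deviation equals sigma\n",
"half_width = np.sqrt(3) * sigma\n",
"matched = rng.uniform(mu - half_width, mu + half_width, size=100_000)\n",
"\n",
"print(f'target std = {sigma:.2f}')\n",
"print(f'narrow uniform std = {narrow.std():.2f}')\n",
"print(f'matched uniform std = {matched.std():.2f}')"
]
},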
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Confidence Intervals\n",
"\n",
"Let's now calculate confidence intervals for the mean weight of baseball players. We will use the code [from this Stack Overflow discussion](https://stackoverflow.com/questions/15033511/compute-a-confidence-interval-from-sample-data):"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"p=0.85, mean = 201.73 ± 0.94\n",
"p=0.90, mean = 201.73 ± 1.08\n",
"p=0.95, mean = 201.73 ± 1.28\n"
]
}
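],
"source": [
"import scipy.stats\n",
"\n",
"def mean_confidence_interval(data, confidence=0.95):\n",
"    # Returns the sample mean m and the half-width h of the interval m ± h\n",
"    a = 1.0 * np.array(data)\n",
"    n = len(a)\n",
"    m, se = np.mean(a), scipy.stats.sem(a)\n",
"    # Two-sided Student t quantile with n-1 degrees of freedom\n",
"    h = se * scipy.stats.t.ppf((1 + confidence) / 2., n-1)\n",
"    return m, h\n",
"\n",
"for p in [0.85, 0.9, 0.95]:\n",
"    # Forward-fill missing weights; ffill() replaces the deprecated fillna(method='pad')\n",
"    m, h = mean_confidence_interval(df['Weight'].ffill(), p)\n",
"    print(f\"p={p:.2f}, mean = {m:.2f} ± {h:.2f}\")"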
]
},
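{
"cell_type": "markdown",
"metadata": {},
"source": [
"The half-width computed above is the standard error multiplied by a Student t quantile. As a cross-check, SciPy's `scipy.stats.t.interval` should give the same bounds. The sketch below uses synthetic data instead of the players table; the parameters passed to `rng.normal` and the names `center` and `half_width` are assumptions made for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import scipy.stats\n",
"\n",
"rng = np.random.default_rng(2)\n",
"data = rng.normal(200, 20, size=100)  # synthetic stand-in for weights\n",
"\n",
"# Manual interval: mean ± standard error * t quantile\n",
"center = np.mean(data)\n",
"se = scipy.stats.sem(data)\n",
"half_width = se * scipy.stats.t.ppf((1 + 0.95) / 2, len(data) - 1)\n",
"\n",
"# The same interval from SciPy (confidence level, degrees of freedom)\n",
"low, high = scipy.stats.t.interval(0.95, len(data) - 1, loc=center, scale=se)\n",
"\n",
"print(f'manual: {center - half_width:.2f}..{center + half_width:.2f}')\n",
"print(f'scipy : {low:.2f}..{high:.2f}')"
]
},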
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Hypothesis Testing\n",
"\n",
"Let's explore different roles in our baseball players dataset:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Height</th>\n",
" <th>Weight</th>\n",
" <th>Count</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Role</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Catcher</th>\n",
" <td>72.723684</td>\n",
" <td>204.328947</td>\n",
" <td>76</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Designated_Hitter</th>\n",
" <td>74.222222</td>\n",
" <td>220.888889</td>\n",
" <td>18</td>\n",
" </tr>\n",
" <tr>\n",
" <th>First_Baseman</th>\n",
" <td>74.000000</td>\n",
" <td>213.109091</td>\n",
" <td>55</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Outfielder</th>\n",
" <td>73.010309</td>\n",
" <td>199.113402</td>\n",
" <td>194</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Relief_Pitcher</th>\n",
" <td>74.374603</td>\n",
" <td>203.517460</td>\n",
" <td>315</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Second_Baseman</th>\n",
" <td>71.362069</td>\n",
" <td>184.344828</td>\n",
" <td>58</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Shortstop</th>\n",
" <td>71.903846</td>\n",
" <td>182.923077</td>\n",
" <td>52</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Starting_Pitcher</th>\n",
" <td>74.719457</td>\n",
" <td>205.163636</td>\n",
" <td>221</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Third_Baseman</th>\n",
" <td>73.044444</td>\n",
" <td>200.955556</td>\n",
" <td>45</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Height Weight Count\n",
"Role \n",
"Catcher 72.723684 204.328947 76\n",
"Designated_Hitter 74.222222 220.888889 18\n",
"First_Baseman 74.000000 213.109091 55\n",
"Outfielder 73.010309 199.113402 194\n",
"Relief_Pitcher 74.374603 203.517460 315\n",
"Second_Baseman 71.362069 184.344828 58\n",
"Shortstop 71.903846 182.923077 52\n",
"Starting_Pitcher 74.719457 205.163636 221\n",
"Third_Baseman 73.044444 200.955556 45"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby('Role').agg({ 'Height' : 'mean', 'Weight' : 'mean', 'Age' : 'count'}).rename(columns={ 'Age' : 'Count'})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's test the hypothesis that first basemen are taller than second basemen. The simplest way to do this is to compare the confidence intervals for the mean height of the two groups:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Conf=0.85, 1st basemen height: 73.62..74.38, 2nd basemen height: 71.04..71.69\n",
"Conf=0.90, 1st basemen height: 73.56..74.44, 2nd basemen height: 70.99..71.73\n",
"Conf=0.95, 1st basemen height: 73.47..74.53, 2nd basemen height: 70.92..71.81\n"
]
}
],
"source": [
"for p in [0.85,0.9,0.95]:\n",
" m1, h1 = mean_confidence_interval(df.loc[df['Role']=='First_Baseman',['Height']],p)\n",
" m2, h2 = mean_confidence_interval(df.loc[df['Role']=='Second_Baseman',['Height']],p)\n",
" print(f'Conf={p:.2f}, 1st basemen height: {m1-h1[0]:.2f}..{m1+h1[0]:.2f}, 2nd basemen height: {m2-h2[0]:.2f}..{m2+h2[0]:.2f}')"
]
},
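{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that the intervals do not overlap, which suggests that the difference in mean height is real.\n",
"\n",
"A statistically more rigorous way to test the hypothesis is to use a **Student's t-test** (here in Welch's variant, selected by `equal_var=False`, which does not assume equal variances):"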
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"T-value = 7.65\n",
"P-value: 9.137321189738925e-12\n"
]
}
],
"source": [
"from scipy.stats import ttest_ind\n",
"\n",
"tval, pval = ttest_ind(df.loc[df['Role']=='First_Baseman',['Height']], df.loc[df['Role']=='Second_Baseman',['Height']],equal_var=False)\n",
"print(f\"T-value = {tval[0]:.2f}\\nP-value: {pval[0]}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two values returned by the `ttest_ind` function are:\n",
"* The t-value is the normalized difference between the two sample means. It is the intermediate statistic of the t-test, and it is compared against a threshold value for a given confidence level.\n",
"* The p-value is the probability of observing a difference at least this large if the two distributions actually had the same mean. In our case, it is very low, meaning that there is strong evidence that first basemen are taller."
]
},
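{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the t-value less of a black box, here is a minimal sketch that computes Welch's t statistic by hand and compares it with `ttest_ind`. The two groups are synthetic samples; their sizes, means and spreads are assumptions chosen only to resemble the table above, and they are not the real player data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from scipy.stats import ttest_ind\n",
"\n",
"rng = np.random.default_rng(0)\n",
"group_a = rng.normal(74.0, 2.0, size=55)  # synthetic heights, group A\n",
"group_b = rng.normal(71.4, 2.0, size=58)  # synthetic heights, group B\n",
"\n",
"# Welch's t: difference of means divided by its estimated standard error\n",
"se_diff = np.sqrt(group_a.var(ddof=1) / len(group_a) + group_b.var(ddof=1) / len(group_b))\n",
"t_manual = (group_a.mean() - group_b.mean()) / se_diff\n",
"\n",
"t_scipy, p_scipy = ttest_ind(group_a, group_b, equal_var=False)\n",
"print(f't (manual) = {t_manual:.4f}')\n",
"print(f't (scipy)  = {t_scipy:.4f}, p = {p_scipy:.3g}')"
]
},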
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Simulating a Normal Distribution with the Central Limit Theorem\n",
"\n",
"The pseudo-random generator in Python is designed to give us a uniform distribution. If we want to create a generator for a normal distribution, we can use the central limit theorem: the mean of a sufficiently large uniformly generated sample is approximately normally distributed. So, to get an approximately normally distributed value, we just compute the mean of a uniformly generated sample."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 720x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"def normal_random(sample_size=100):\n",
" sample = [random.uniform(0,1) for _ in range(sample_size) ]\n",
" return sum(sample)/sample_size\n",
"\n",
"sample = [normal_random() for _ in range(100)]\n",
"plt.figure(figsize=(10,6))\n",
"plt.hist(sample)\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
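{
"cell_type": "markdown",
"metadata": {},
"source": [
"The values produced by `normal_random` cluster around 0.5 with a small spread, because the mean of `n` uniform draws on [0, 1] has mean 0.5 and variance 1/(12 n). A minimal sketch of how one might rescale it to an approximately *standard* normal value (mean 0, standard deviation 1) follows. The helper name `standard_normal_clt` is hypothetical and not part of the original notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"import random\n",
"\n",
"def standard_normal_clt(sample_size=100):\n",
"    # The mean of sample_size uniform(0, 1) draws has mean 0.5\n",
"    # and variance 1 / (12 * sample_size), so we center and rescale it.\n",
"    m = sum(random.uniform(0, 1) for _ in range(sample_size)) / sample_size\n",
"    return (m - 0.5) / math.sqrt(1 / (12 * sample_size))\n",
"\n",
"values = [standard_normal_clt() for _ in range(10000)]\n",
"avg = sum(values) / len(values)\n",
"spread = math.sqrt(sum((v - avg) ** 2 for v in values) / len(values))\n",
"print(f'mean = {avg:.3f}, standard deviation = {spread:.3f}')"
]
},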
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Correlation and Evil Baseball Corp\n",
"\n",
"Correlation allows us to find relations between data sequences. In our toy example, let's pretend there is an evil baseball corporation that pays its players according to their height: the taller the player, the more money they get. Suppose there is a base salary of $1000 plus a bonus that grows linearly with height. We will take the real players from MLB and compute their imaginary salaries:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[(74, 1075.2469071629068), (74, 1075.2469071629068), (72, 1053.7477908306478), (72, 1053.7477908306478), (73, 1064.4973489967772), (69, 1021.4991163322591), (69, 1021.4991163322591), (71, 1042.9982326645181), (76, 1096.746023495166), (71, 1042.9982326645181)]\n"
]
}
],
"source": [
"heights = df['Height']\n",
"salaries = 1000+(heights-heights.min())/(heights.max()-heights.mean())*100\n",
"print(list(zip(heights, salaries))[:10])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's now compute the covariance and correlation of those sequences. `np.cov` will give us a so-called **covariance matrix**, which is an extension of covariance to multiple variables. The element $M_{ij}$ of the covariance matrix $M$ is the covariance between input variables $X_i$ and $X_j$, and each diagonal value $M_{ii}$ is the variance of $X_i$. Similarly, `np.corrcoef` will give us the **correlation matrix**."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Covariance matrix:\n",
"[[ 5.31679808 57.15323023]\n",
" [ 57.15323023 614.37197275]]\n",
"Covariance = 57.153230230544736\n",
"Correlation = 1.0\n"
]
}
],
"source": [
"print(f\"Covariance matrix:\\n{np.cov(heights, salaries)}\")\n",
"print(f\"Covariance = {np.cov(heights, salaries)[0,1]}\")\n",
"print(f\"Correlation = {np.corrcoef(heights, salaries)[0,1]}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A correlation equal to 1 means that there is an exact **linear relation** between the two variables. We can see the linear relation visually by plotting one value against the other:"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 720x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(10,6))\n",
"plt.scatter(heights,salaries)\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
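{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an aside on how the covariance and correlation matrices relate: the correlation coefficient is the covariance divided by the product of the two standard deviations. The sketch below checks this on synthetic data. The arrays `x` and `y` and their parameters are assumptions for illustration, not the players table. Note that `np.cov` uses the sample estimate (`ddof=1`) by default, so the standard deviations must use the same setting."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(1)\n",
"x = rng.normal(73.7, 2.3, size=500)             # hypothetical heights\n",
"y = 1000 + 10 * x + rng.normal(0, 5, size=500)  # hypothetical noisy salaries\n",
"\n",
"cov_xy = np.cov(x, y)[0, 1]\n",
"# Normalize the covariance by both sample standard deviations\n",
"r_manual = cov_xy / (x.std(ddof=1) * y.std(ddof=1))\n",
"r_numpy = np.corrcoef(x, y)[0, 1]\n",
"\n",
"print(f'manual r = {r_manual:.6f}, np.corrcoef r = {r_numpy:.6f}')"
]
},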
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's see what happens if the relation is not linear. Suppose that our corporation decided to hide the obvious linear dependency between heights and salaries, and introduced some non-linearity into the formula, such as `sin`:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Correlation = 0.9835304456670837\n"
]
}
],
"source": [
"salaries = 1000+np.sin((heights-heights.min())/(heights.max()-heights.mean()))*100\n",
"print(f\"Correlation = {np.corrcoef(heights, salaries)[0,1]}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this case, the correlation is slightly smaller, but it is still quite high. Now, to make the relation even less obvious, we might want to add some extra randomness by adding some random variable to the salary. Let's see what happens:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Correlation = 0.9363097848296155\n"
]
}
],
"source": [
"salaries = 1000+np.sin((heights-heights.min())/(heights.max()-heights.mean()))*100+np.random.random(size=len(heights))*20-10\n",
"print(f\"Correlation = {np.corrcoef(heights, salaries)[0,1]}\")"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 720x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(10,6))\n",
"plt.scatter(heights, salaries)\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> Can you guess why the dots line up into vertical lines like this?\n",
"\n",
"We have observed the correlation between an artificially engineered quantity, salary, and the observed variable *height*. Let's also see whether two observed variables, height and weight, are correlated:"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 1., nan],\n",
" [nan, nan]])"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.corrcoef(df['Height'],df['Weight'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Unfortunately, we did not get a usable result, only some `nan` values. This happens because some of the values in our series are missing (represented as `nan`), which makes the result of the operation undefined as well. Looking at the matrix, we can see that `Weight` is the problematic column, because the self-correlation of the `Height` values was computed correctly.\n",
"\n",
"> This example shows the importance of **data preparation** and **cleaning**. Without proper data we cannot compute anything.\n",
"\n",
"Let's fill each missing value with the previous valid observation (a forward fill), and compute the correlation:"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[1. , 0.52959196],\n",
" [0.52959196, 1. ]])"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.corrcoef(df['Height'],df['Weight'].ffill())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There is indeed a correlation, but it is not as strong as in our artificial example. If we look at the scatter plot of one value against the other, the relation is much less obvious:"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAsgAAAGoCAYAAABbtxOxAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAABCr0lEQVR4nO3df3Td5XXn+8+2kEEQiKAxpBZ27XgcpTBOcavEppreUjpeYqA3aPk2Db7QlZnmktUObeqQqLWLV7NyF1x76lzSzGp714Um03TsOiGJR82MIa47Dp2Jr20qYhI1EA9QiI2cAq1jYIhijLzvH+cc+fz6SufYPt9nH533ay0tpK0f3jzne77aes7z7MfcXQAAAAAK5qVOAAAAAIiEAhkAAAAoQ4EMAAAAlKFABgAAAMpQIAMAAABlLkidwLl429ve5kuWLEmdBgAAANrQ448//o/uvqA63tYF8pIlSzQ2NpY6DQAAALQhM/tevThLLAAAAIAyFMgAAABAGQpkAAAAoAwFMgAAAFCGAhkAAAAoQ4EMAAAAlKFABgAAAMpQIAMAAABlKJABAACAMhTIAAAAQBkKZAAAAKAMBTIAAABQhgIZAAAAKHNB6gQAAEB8o4cmtHX3YR07MamFvT0aGerX8Mq+1GkBLUGBDAAAZjR6aEIbd45r8tSUJGnixKQ27hyXJIpkzEkssQAAADPauvvwdHFcMnlqSlt3H06UEdBaFMgAAGBGx05MNhUH2h0FMgAAmNHC3p6m4kC7o0AGAAAzGhnqV093V0Wsp7tLI0P9iTICWotNegAAYEaljXh0sUCnoEAGAACzGl7ZR0GMjsESCwAAAKAMBTIAAABQhgIZAAAAKEOBDAAAAJShQAYAAADKUCADAAAAZSiQAQAAgDIUyAAAAEAZCmQAAACgDAUyAAAAUIYCGQAAAChDgQwAAACUoUAGAAAAylAgAwAAAGUokAEAAIAyFMgAAABAmZYVyGa2yMy+bmZPmdl3zOy3i/HrzOyAmT1hZmNm9t6y79loZs+Y2WEzG2pVbgAAAECWC1r4s9+U9DF3/6aZXSrpcTPbI+kPJH3S3R8xs5uLH99gZtdIuk3StZIWSvprM3unu0+1MEcAAACgQstmkN39++7+zeL7r0l6SlKfJJd0WfHL3irpWPH9WyV9wd1Puvtzkp6R9F4BAAAAOWrlDPI0M1siaaWkg5LWS9ptZp9SoUD/2eKX9Uk6UPZtLxRj1T/rw5I+LEmLFy9uWc4AAADoTC3fpGdmb5H0FUnr3f1VSb8h6aPuvkjSRyV9tvSldb7dawLuD7j7gLsPLFiwoFVpAwAAoEO1dAbZzLpVKI63u/vOYviDkn67+P6XJP1p8f0XJC0q+/ardWb5BQCgQaOHJrR192EdOzGphb09Ghnq1/DKmhfkAAAZWtnFwlSYHX7K3e8v+9QxST9ffP9GSU8X3/+qpNvM7EIzWyppuaTHWpUfAMxFo4cmtHHnuCZOTMolTZyY1Mad4xo9NJE6NQBoG62cQR6U9KuSxs3siWLs9yTdKekzZnaBpB+puJ7Y3b9jZg9JelKFDhh30cECAJqzdfdhTZ6qvHVOnprS1t2HmUUGgAa1rEB292+o/rpiSfqZjO+5T9J9rcoJAOa6Yycmm4oDAGpxkh4AzCELe3uaigMAalEgA8AcMjLUr57uropYT3eXRob6E2UEAO0nlz7IAIB8lNYZ08UCAM4eBTIAzDHDK/soiAHgHFAgAwA6Fj2jAdRDgQwA6EilntGltnilntGSKJKBDscmPQBAR5qpZzSAzkaBDADoSPSMBpCFAhkA0JHoGQ0gCwUyAKAj0TMaQBY26QEAOhI9owFkoUAGAHQsekYDqIclFgAAAEAZCmQAAACgDAUyAAAAUIYCGQAAACjDJj0AmGNGD03QmQEAzgEFMgDMIaOHJrRx5/j0EcoTJya1cee4JFEkA0CDWGIBAHPI1t2Hp4vjkslTU9q6+3
CijACg/VAgA8AccuzEZFNxAEAtCmQAmEMW9vY0FQcA1KJABoA5ZGSoXz3dXRWxnu4ujQz1J8oIANoPm/QAYA4pbcSjiwUAnD0KZACYY4ZX9lEQA8A5YIkFAAAAUIYCGQAAAChDgQwAAACUoUAGAAAAylAgAwAAAGUokAEAAIAyFMgAAABAGQpkAAAAoAwFMgAAAFCGAhkAAAAoQ4EMAAAAlKFABgAAAMpQIAMAAABlKJABAACAMhTIAAAAQBkKZAAAAKBMywpkM1tkZl83s6fM7Dtm9ttln/stMztcjP9BWXyjmT1T/NxQq3IDAAAAslzQwp/9pqSPufs3zexSSY+b2R5JV0m6VdK73f2kmV0pSWZ2jaTbJF0raaGkvzazd7r7VAtzBFDH6KEJbd19WMdOTGphb49Ghvo1vLIvdVpoc1xX6BRc642LOlYtK5Dd/fuSvl98/zUze0pSn6Q7JW1x95PFz71U/JZbJX2hGH/OzJ6R9F5J+1uVI4Bao4cmtHHnuCZPFf42nTgxqY07xyUpxE0L7YnrCp2Ca71xkccqlzXIZrZE0kpJByW9U9LPmdlBM/sbM3tP8cv6JB0t+7YXijEAOdq6+/D0zapk8tSUtu4+nCgjzAVcV+gUXOuNizxWrVxiIUkys7dI+oqk9e7+qpldIOlySaslvUfSQ2b2DklW59u9zs/7sKQPS9LixYtbljfQqY6dmGwqDjSC6wqdgmu9cZHHqqUzyGbWrUJxvN3ddxbDL0ja6QWPSTot6W3F+KKyb79a0rHqn+nuD7j7gLsPLFiwoJXpAx1pYW9PU3GgEVxX6BRc642LPFat7GJhkj4r6Sl3v7/sU6OSbix+zTslzZf0j5K+Kuk2M7vQzJZKWi7psVblB6C+kaF+9XR3VcR6urs0MtSfKCPMBVxX6BRc642LPFatXGIxKOlXJY2b2RPF2O9J+pykz5nZ30l6Q9IH3d0lfcfMHpL0pAodMO6igwWQv9LGiIi7itG+uK7QKbjWGxd5rKxQm7angYEBHxsbS50GAKABUds5AehcZva4uw9Ux1u+SQ8AgMjtnACgGkdNAwBaLnI7JwCoRoEMAGi5yO2cAKAaBTIAoOUit3MCgGoUyACAlovazmn00IQGt+zV0g27NLhlr0YPTSTNB0AMbNIDALRcxHZObBwEkIUZZABAR2LjIIAszCADAFou4mwtGwcBZGEGGQDQchFna9k4CCALBTIAoOUiztZG3TgIID0KZABAy0WcrR1e2afNa1eor7dHJqmvt0eb165ggx4A1iADAFpvZKi/Yg2yFGO2dnhlHwUxgBoUyACAlovY5g0AslAgAwBywWwtgHbBGmQAAACgDAUyAAAAUIYlFkBio4cmWJfZIMaqvd3+4H7te/b49MeDy67Q9juvT5hRzJzQOO4JjWOsmsMMMpBQ6XSxiROTcp05XWz00ETq1MJhrNpbdSEqSfuePa7bH9yfKKOYOaFx3BMax1g1jwIZSCji6WJRMVbtrboQnS2eh4g5oXHcExrHWDWPJRZAQhFPF4uKsUIn4eXw2XFPaBxj1TxmkIGEIp4uFhVjhU7By+GN4Z7QOMaqeRTIQEIjQ/3q6e6qiEU4XSwixqq9DS67oql4HiLmJPFyeKO4JzSOsWoeBTKQ0PDKPm1eu0J9vT0ySX29Pdq8dgUvpdbBWLW37XdeX1N4pu4YETEniZfDG8U9oXGMVfPM3VPncNYGBgZ8bGwsdRoAAJw3g1v2aqJOMdzX26N9G25MkBEwd5nZ4+4+UB1nBhkAgEB4ORxIjy4WAAAEUnrZmy4WQDoUyAAwx9AirP0Nr+zjMQMSokAGgDmk1CKs1AWh1CJMEgUXADSINcgAMIfQIgwAzh0zyABwDjaNjmvHwaOacleXmdatWqR7h1cky4cWYc2J9vhFxtIddBIKZAA4S5tGx7XtwJHpj6fcpz9OVWTNv2CeTr55um4clSI+flGxdAedhjsmgBqjhyY0uGWvlm7Ypc
EtezniNsOOg0ebiuehXnE8UzxP0a6riI9fVCzdQadhBhlABWaKGjeVcdBSVryTRbyuePwax9IddBpmkAFUYKYIrRDxuuoyayreyRb29jQVB9odBTKACswUtbfBZVc0Fc9LxOtq3apFTcU7Gaf7odNQIAOowExR4/oyxiQrnoftd15fUwwPLrtC2++8PlFGBRGvq3uHV+iO1YunZ4y7zHTH6sVs0KtjeGWfNq9dob7eHpkK1/jmtStYdoU5y7yN11oNDAz42NhY6jSAOaV6rahUmCnil2EtxqpxjBWAiMzscXcfqI6zSQ9AhVKxQr/T2TFWjWOsALQTZpABAB2Lwy+AzsYMMgAAZSK2ngMQA5v0AAAdKWLrOQAxtGwG2cwWSfpzSW+XdFrSA+7+mbLPf1zSVkkL3P0fi7GNkj4kaUrSR9x9d6vyA9B+No2Oa8fBo5pyV5eZ1q1alLzjQMSX6G9/cL/2PXt8+uMIXSwk6d2f+JpePXmmIL3swi59+5M3JcsnYuu5kojXOjmhk7RyBvlNSR9z95+UtFrSXWZ2jTRdPK+RdKT0xcXP3SbpWkk3SfoTM+uq+akAOtKm0XFtO3Bk+pSzKXdtO3BEm0bHk+VUeol+4sSkXGdeok95hHJ1cSxJ+549rtsf3J8oo4Lq4liSXj05pXd/4muJMpIu6q7/KzArnpeI1zo5odO07C7g7t93928W339N0lOSStMqn5b0O5LKdwjeKukL7n7S3Z+T9Iyk97YqPwDtZcfBo03F8xDxJfrq4ni2eF6qi+PZ4nk4+ebppuJ5iXitkxM6TS5/JpvZEkkrJR00s/dJmnD3b1V9WZ+k8qv6BZ0pqMt/1ofNbMzMxl5++eVWpQwgmKmMjjtZ8TxEfokeszudcelkxfMS8VonJ3SalhfIZvYWSV+RtF6FZRf3SPr9el9aJ1Zzlbv7A+4+4O4DCxYsOJ+pAgisdNpZo/E8RDwdDo2LeE3N9O+nzIuc0GlaWiCbWbcKxfF2d98paZmkpZK+ZWbPS7pa0jfN7O0qzBgvKvv2qyUda2V+ANrHulWLmornYWSoXz3dlVslerq7NDLUnygj1RwzPVu8k0W8pmb691PmRU7oNC0rkM3MJH1W0lPufr8kufu4u1/p7kvcfYkKRfFPu/s/SPqqpNvM7EIzWyppuaTHWpUfgPZy7/AK3bF68fTsUJeZ7li9OOmO9eGVfdq8doX6entkkvp6e5Ifnfz+gcWaVzWBNs8K8ZT6MmbVs+J5iHhNRc2LnNBpWnaSnpn9C0n/XdK4Cm3eJOn33P3hsq95XtJAWZu3eyT9mgpLMda7+yMz/RucpAcAlQa37NVEnTXQfb092rfhxgQZFYwemtDIl7+lU1Nnfud0d5m2/vJPJW+Lh/YVsc0i2kvuJ+m5+zdUf11x+dcsqfr4Pkn3tSonAJjrQm8crJ6PYS8VzgEnIaKVOEkPAOaQqBsHt+4+rFNV7SFOnXZOrcNZi9hmEXMHBTIAzCERNw5KwWe20Za4ptBKFMgAMIdE3DgoxZ3ZRvvimkIrtWwNMgB0gk2j49px8Kim3NVlpnWrFiXfRT/2veP6h1d+JJf0D6/8SGPfO568QB4Z6tf6Lz5RN55S9dHcg8uu0PY7r0+YUUHE6yqakaH+ijXIUoxXSzA3MIMMAGdp0+i4th04Mn1y15S7th04ok2j4+RU5Y+//nRT8TxUF8dS4Uju2x/cnyijgqiPYTRRXy3B3ECBDABnacfBo03F8xAxJ0l6+qXXm4rnobo4ni2el6iPYUTDK/u0b8ONem7LLdq34UaKY5w3FMgAcJamMvrIZ8XzEDEnNIfHEEiPAhkAzpJldHrPiuehK+Mfz4ojHh5DID0KZAA4Sz0X1L+FZsXzsG7VoqbieVl+5SVNxfNwYcbjlBXPS9THEOgkFMgAcJZ+eOp0U/E83Du8QnesXjw929hlpjtWL07eAWHP3TfUFMPLr7xEe+6+IU1Ckt54s/7jlBXPS9THEO
gktHkDgLPUZVZ3XWjql8LvHV4RsphKWQzXs7C3RxN1DpWI0Ec36mMIdAoKZABtY/TQhLbuPqxjJya1sLdHI0P9SXetR91MFW2coqKPLoAsFMgA2sLooYmKYmbixKQ27iz0hU1V/PVlzED2JZyBjDhOUZXGgz8mAFSjQAbQFrbuPlwx0ydJk6emtHX34WQFTcQZyIjjVBLxdLjhlX3JxwXIA68sNYcCGUBbqDdTO1M8DxFnII9ljEdWPC+l0+FKSqfDSUpeJANzHa8sNY8CGUCNiDMNUTfE/f7ouF49eeaXzu+Pjicdq6gbz2Y6HS5lgbzm/kcrTvNL3VmjZNV9e/Tia29Mf3zVpfN18J41CTOKeV+IKtpYRX5lKSravAGoUJppmDgxKdeZmYbRQxNJ84q4Ie7dn/jadHFc8urJKb37E19LlJH04iv1Z4qz4nmJ+PhVF8dS4ejrNfc/miahouriWJJefO0NrbpvT6KM4t4XIoo4VlFfWYqMAhlAhZlmGlLK2viWckNcdXE8WzwPb2bUm1nxTlZdHM8Wz0t1cTxbPA9R7wsRRRyrrFeQUr+yFBkFMoAKUWcaRob61dPdVRFLvSEO6BRR7wsRRRwr7p/No0AGUKH34u6m4nkZXtmnzWtXqK+3R6bCzPHmtStYPwfkgBnIxkUcK+6fzWOTHoAKWUtCE599ISleS67LLuyqu5zisgu76nx1Pi7qMv1oqvbBuqgr7WbGiJZfeUnd5RTVR2Ln7apL59ddTnHVpfMTZFMQsaVhVFHHKtr9MzpmkAFUeGXyVFPxTva+jF82WfE8bPnln2oq3sn23H1DTTEcoYvFxpuvaSqeB2YgG8dYzQ3MIAOoELVNWEQRW5dlbQRK3c4papu+1MVwPVEfQ2YgG8dYtT9mkAFUYDNH4yK2Lou4QUiKOVZRRX0MgU5CgQygAi8PNm5exuRnVjwPETcISTHb9EUV9TEEOglLLNAxop1sFBkvDzbmwgvmafLU6brxVEaG+jXy5W/pVNlGve4uS/4KQNSNSxExVkB6FMjoCJxDj1b4UZ3ieKZ4bqpXLQRYxVB6nvFH6uwYKyA9CmR0BM6hRytE3NC4dfdhnTpdWRGfOu0hrnVemWgcYwWk1VCBbGb/zt1/d7YYEFXkTS8Rl35EzEmS1tz/aEXf2tQtuUaG+rX+i0/UjafCtd6c2x/cr33PHp/+eHDZFdp+5/UJMyqIOFZAJ2l0odyaOrF/dT4TAVop6qaX0tKPiROTcp1Z+jF6aIKcqlQXx5L09Euva839j6ZJSNIff/3ppuJ5yFpNkXqVRcTrqro4lqR9zx7X7Q/uT5RRQcSxAjrNjAWymf2GmY1L6jezb5e9PSfp2/mkCJy7qK3LZlr6kUrEnCTVPfFspngeIuYUVcTrqro4ni2el4hjBXSa2ZZY/IWkRyRtlrShLP6au6e9gwBNiLrppd761ZnieYj8Ej3aV8RrPSqeg0B6MxbI7v6KpFckrTOzLklXFb/nLWb2Fnc/kkOOwHkRcdNLxNPFIm48Q/uLeK1HxXMQSK+hNchm9puSXpS0R9Ku4tt/aWFeQEeIeLrYyFC/uqtOuuiel76P7vIrL2kqnoeIOUUV8VofXHZFU/G8RH0OAp2k0U166yX1u/u17r6i+PbuFuYFdISwp4tVT+oFmOTbc/cNNYVn6i4WEXOKqqe7/q+brHgett95fU0xHKWLRcTnINBJGr0zHVVhqQWA8yji5sGtuw9XnMImSaemPMQGoT1336Dnt9wy/RahEL3rF5ZXHMt91y8sT5pP1FnRk2/WPzwlK56X9w8srnj83j+wOGk+UuznINApZlyDbGZ3F9/9e0mPmtkuSSdLn3f3+1uYGzDnRdw8yAahxkU8oXHpgrfU7cKwdMFbEmRzxumMlRRZ8TxEfPwknoNABLN1sbi0+N8jxbf5xTcA50m0zYNsEGpcxBMadxw8mhm/d3hFztnEFvHxk3gOAhHM1sXik3klAnSqaCdmjQ
z1V8yqSemXfUQVcaYv4ma4qCI+fhLPQSCCRo+a/s+qPYjpFUljkv5fd//R+U4M6AQRX+KNuOwjqogzfVHbqV1+cbd+8MNTdeOpRHz8JJ6DQASNbtL7e0n/U9KDxbdXVWj79s7ixzXMbJGZfd3MnjKz75jZbxfjW83su8UT+f6TmfWWfc9GM3vGzA6b2dA5/H8BNUYPTWhwy14t3bBLg1v2hji2NeqJWZsffrLimNvNDz+ZNJ+Sd93zsJZs2DX99q57Hk6aT9aMXsqZvnWrFjUVz0vWBHbKie2Ij1/Jl8aOVDwHvzTGsQP1RLyvY25otEBe6e7/u7v/5+LbHZLe6+53SfrpjO95U9LH3P0nJa2WdJeZXaNCL+V/XmwT9z8kbZSk4uduk3StpJsk/UnxcBLgnJVmast/4WzcOZ78ZhrxdLFV9+3Ri6+9URF78bU3tOq+PYkyKnjXPQ/rR1U7+3805UmL5PVffKKpeB62HahfSGXF83Jisnb2eKZ4Hj6a8ThlxfNy+4P7azZa7nv2uG5/cH+ijGKKel/H3NBogbzAzKZ73xTff1vxwzfqfYO7f9/dv1l8/zVJT0nqc/e/cvc3i192QNLVxfdvlfQFdz/p7s9JekbSe5v6vwEyRJ2pjai6OJ4tnpfq4ni2ODCbrCsn9RVVrwvJTPFOxX0drdTQGmRJH5P0DTN7VoV25Usl/Vszu0TS52f7ZjNbImmlpINVn/o1SV8svt+nQsFc8kIxVv2zPizpw5K0eHH6fpVoD1E34wAAzg73dbRSQwWyuz9sZsslvUuFAvm7ZRvz/nCm7zWzt0j6iqT17v5qWfweFZZhbC+F6v3TdXJ5QNIDkjQwMJD6D320iaibcQAAZ4f7OlppxiUWZnZj8b9rJd0iaZmkd0i6uRibkZl1q1Acb3f3nWXxD0r6JUm3u09v0XhBUvkukqslHWv8fwWRRNs4EfHEOinmqWdXXVq/1XlWHGhXWX09Up/qnHX6dsJTuUOKel/H3DDb0+3ni//9X+u8/dJM32hmJumzkp4qP3HPzG6S9LuS3ufuPyz7lq9Kus3MLjSzpZKWS3qsif8XBBFx48Twyj5tXrui4kjZzWtXJG+btP3O62uK4cFlV2j7ndcnykg6eM+ammL4qkvn6+A9axJlhLkgYjH63JZbav59K8ZTyjp9O/Gp3OFEva9jbpjtoJBPFP/7b87iZw9K+lVJ42b2RDH2e5L+vaQLJe0p1NA64O6/7u7fMbOHJD2pwtKLu9x9qvbHIrqop1NFO7GuJGUxnCViMRy1v280Uccp6svhqYvheqKOVURR7+tofw29YGNmV5nZZ83skeLH15jZh2b6Hnf/hrubu7/b3a8rvj3s7v/M3ReVxX697Hvuc/dl7t7v7o+c2/8aUmHjBFoh4glxEZfIrH7H5U3F88LL4Y1jrID0Gl3R9GeSdktaWPz4f0ha34J8MAdkzXIw+4Fz0Zdx/WTF8xBxiczz/1T/D9GseF54ObxxjBWQXqNt3t7m7g+Z2UZJcvc3zYzlD6hrZKi/4vhkidkPnLuo11W0JTKRX8Hh5fDGMVZAWo0WyK+b2Y+p2HbNzFZLeqVlWaGtlW7qW3cf1rETk1rY26ORoX5u9jgnXFeNuah7niZP1e7muogWCADQsBkLZDNbL2mfpN+R9JeS3mFm+yQtkPT+lmeHtsXsR3urPtb5oi7Td++7OWFGBeVHOE+cmNT6Lz6R/DqrPpo7dcePkxmtDrLieVpz/6N6+qXXpz9efuUl2nP3DekSkrRpdFw7Dh7VlLu6zLRu1SLdO7wiaU5RjR6a4A9UdIzZphSulvQZSV8rfu0eSX8h6Wfd/Vstzg1AAtXFsVQ4zvld9zycKKOCJRt2NRXPQ3VxLBWO5F51355EGUmnM/YsZsXzUl0cS9LTL72uNfc/miYhFYrjbQeOTG/0nHLXtgNHtGl0PFlOUUVs3wm00owFsrt/3N1/VtLbJX1chaOib5T0bTN7Mof8AO
SsujieLd7Jqovj2eKdrLo4ni2ehx0HjzYV72Qzte8E5qJG1yD3SLpM0luLb8ck8Sc2AKBtRWwdGFXkzZ9AK8y2BvkBSddKek2F2eP/T9L97v6DHHIDAKBloh6qEhGHl6DTzLYGebEKp979g6QJSS9IOtHinAAkdFFX/eIgK97Jqo/kni2eh4hHOkvZv2xS9tZYt2pRU/FOxuEl6DSzrUG+SdJ7JH2qGPqYpL81s78ys0+2OjkA+dvyyz/VVDwvz2ccCZwVz8PBe9bUFMOpu1g8t+WWmmLYlP5I5aweGil7a9w7vEJ3rF48PWPcZaY7Vi+mi0UdHF6CTmPe4ForM7ta0qCkn5X0S5J+zN17W5fa7AYGBnxsbCxlCsCcM7hlb92XUvt6e7Rvw40JMsJcsGzjw5nLGZ7dnL6FIIDOZGaPu/tAdXy2NcgfUaEgHpR0SoWeyPslfU5s0gPmpMibcSL2rI3YGzZiTmyIA9BOZutisUTSlyV91N2/3/p0AKQWdTNOqWdtSalnraRkRXKpN2yp/VWpN6ykZAVpxJwk6ZL5XXr9jam6cQCIZrY1yHe7+5cpjoHOMTLUr+55latYu+dZ8s04EXvWRuwNGzEnSfphneJ4pjgApJRyAzGAqOrt8kos4kv09WbaZ4rnIeoSmaxHiQUWACKiQAZQYevuwzpVdWreqSlPPgOJxmQthUm9RAYA2kmjJ+kBbS/ixiVJWnP/oxXH7S6/8hLtufuGZPlEnYFEY0aG+rX+i0/UjaNWtOdfyar79lQcWZ66fSDQaZhBRkcobVyaODEp15mNS6OHJpLmVf3LWZKeful1rbn/0TQJiZfC21294nimeCeL+PyTaotjSXrxtTe06r49iTICOg8zyHNAxJnRaDnNtHEpZV7Vv5xniwM4f6I+/6qL49niAM4/CuQ2F7GlU8ScWDYAAAAaxRKLNhexpVPEnNi4BAAAGkWB3OYizoxGzGlkqF893ZUHEvR0d7FxqY4LMlq6ZcXzEjUvNOairvoPVFY8D8uvvKSpeF6uunR+U/G8jB6a0OCWvVq6YZcGt+xNvocDaCUK5DYXcWY0Yk7DK/u0ee0K9fX2yCT19fZo89oVyddqX35xd1PxPHzqV65rKp6XVe+4oql4Hp7fcktT8U723fturimGL+oyffe+mxNlJK16x481Fc/LxpuvaSqeh6gbnYFWYQ1ymxsZ6q9Y7yulnxmNmJNUKJJTF8TVss64SHj2ReZSmNQbGvc9e7ypeF7uWL1YOw4e1ZS7usy0btWipPmY6ncciTDRnrIYrmem0xlTHV8uxXwORt3oDLQKBXKbK92YInWMiJhTVK9MnmoqnoeIS2Si2jQ6rm0Hjkx/POU+/XGqAos2fY2LeDqjFPM5GDEnoJUokOeAiDOjEXOKaGFvT91jiVMvkYmWU1RRZyDRmC6zusVwl6Wdb4/4HIyYE9BKrEEGEoq4eXBkqF9d8yoLhK55lnyJzOCy+muNs+J5iDoDicZkLYdJvUwm6n0hWk5AK1EgAwlF3Dw49r3jmjpdWeBNnXaNfS/tWt+IsuYZU84/9mXM6GXFO9m9wyt0x+rF0zPGXWa6Y/Xi5LP/Ee8LEXMCWoklFugYm0bHazZTpf5FKMVbjhJ12UDETXoR1/uODPXXPVY6wkzf0g27KsbGJD1Hx4+6ot0XJOlLY0eml1lMnJjUl8aOhMsROF+YQUZHKG2mKr30XdpMtWl0PHFm8bBsoL3VK45niuelujiWCn9ILN2wK0U6krgvNOP2B/fX/DG679njuv3B/YkyAlqLAhkdYaZZUQCtF3G2nftC4yK+ggO0EgUyOgKzogCqcV8AkIUCGR0hq21T6nZOANLhvgAgCwUyOkLUdk5Ap4jY8YP7QuMitlkEWokCGR0hajsntLeIRd/zGV0hsuJ5+fQHrmsqnoeBn7ii5pfgvGIclbbfeX1NMTy47Aptv/P6RBkBrUWbN3SMe4dXhCyIRw9NhDqWO+rpYhHzin
q6WOpiuJ6tuw9nxlNd71t3H9bpqthppc0pMophdBJmkIGERg9NaOPOcU2cmJSr0Ft0485xjR6aSJZT1I1L71hwcVPxPFw8v/4tNCveyY7V+UNipngeIuYEIAbu4kBCW3cf1uSpqYrY5KmpzNm2PETduPT3L/+wqXgenn7p9abinSxrVj3lbHvEnADEQIEMJBRxBivqDHLUvNCYkaF+9XR3VcR6uruSnvAXMScAMbAGGUio9+Ju/eCHp+rGU7k8I6fLE+YkxVyDjMaV1vRGWm8fMScAMbSsQDazRZL+XNLbVdj38IC7f8bMrpD0RUlLJD0v6Vfc/QfF79ko6UOSpiR9xN13tyo/tFa0jWdRZU1+ppwUjZiTVGi9te3AkbrxVOZ3md6Yqh2Y+V1pi/YldY5vjrBx7+MPPaE3i8M1cWJSH3/oieT3hY9+8Ynp0/wmTkzqo19Mn5NUe7RzhI4R3NfRSVq5xOJNSR9z95+UtFrSXWZ2jaQNkv6ruy+X9F+LH6v4udskXSvpJkl/YmZddX8yQou48SyqE5O1M7UzxfMQMSdJdYvjmeJ5qFcczxTPQ73ieKZ4Xv7Zxl3TxXHJm16Ip7J0w66ao669GE+pujiWCkc63/7g/kQZcV9H52lZgezu33f3bxbff03SU5L6JN0q6fPFL/u8pOHi+7dK+oK7n3T35yQ9I+m9rcoPrRNx4xmAtKqL49niecj6p1Ovaq8ujmeL54H7OjpNLpv0zGyJpJWSDkq6yt2/LxWKaElXFr+sT9LRsm97oRir/lkfNrMxMxt7+eWXW5o3zk7EjWcAgLPHfR2dpuUFspm9RdJXJK1391dn+tI6sZo/5N39AXcfcPeBBQsWnK80cR7ROgkA5hbu6+g0LS2QzaxbheJ4u7vvLIZfNLMfL37+xyW9VIy/IKl8t83Vko61Mj+0RtTWSaOHJjS4Za+WbtilwS17Q6ydi3hUMdAKF2Rc1FnxPER9/lUf6TxbPA9R7+tAq7SsQDYzk/RZSU+5+/1ln/qqpA8W3/+gpL8si99mZhea2VJJyyU91qr80DrDK/u0ee0K9fX2yCT19fZo89oVSXc7R91g8ukPXNdUPA9/mPFvZ8XzEjGviAVWxJwk6VO/cl1T8Tw8t+WWmnGxYjyl7XdeX1MMp+5iEfG+DrRSK/sgD0r6VUnjZvZEMfZ7krZIesjMPiTpiKT3S5K7f8fMHpL0pAodMO5y96man4q2MLyyL9SNc6YNJinzzNrgkjKviDmV/v2seKq8Fvb2aKLOGszUp8NFy0mK+fhJ6YvhLKlbutUT7b4OtFIru1h8w93N3d/t7tcV3x52939y91909+XF/x4v+5773H2Zu/e7+yOtyg2dJ+oGk4h5Rcxppn8/ZV4jQ/3qnlc5B9k9zzgdro6Ijx8AZOGoaXSEqBtMLuqu/xTMiueha179F+Oz4nl5a0/9k/yy4rmp9xp9QlFfCo/6HASAejhqeg7gdKPZjQz1a+PO8YplFhFm1U6+ebqpeB7ePF2/C2xWPC9ZJ0qnPGl66+7DOlV1KMipKU++bCDiS+FRn4MAUA8FcpsrbT4r/dIpbT6TFO4XZEqlsYj2h0RWzZm4Fg3pBz+sf5JfVjwP9db6zhTvZFGfgwBQDwVym4u6+SyiiLNqXWaa8tpquCvltGhQEccqYk6RRXwOAkA9FMhtjo0vjVt13x69+Nob0x9fdel8HbxnTcKMpHWrFmnbgSN146ksv/ISPf3S63XjKdUrRGeK5yFiTpK0ZMOumtjzAbo1RHwOAkA9bNJrc2x8aUz1L2ZJevG1N7Tqvj2JMir48t8ebSqeh6P/9MOm4oilXnE8UzwvUZ+DAFAPBXKbi9rSKZrqX8yzxfPyo6n6M41Z8TxEzAntL+pzMOIJmwDSY4lFm2PjCwCcHTY5A8hCgTwHsPEFAJrHJmcAWVhigY5w1aXzm4rn5aKu+t0OsuKI5YKMhykr3skiPgfZ5AwgCwUyOsLBe9bU/CKOsI
P+l99Tv1tFVjwPfRkbPLPieYmY1zObb6kphi+wQjyVrG4VqbtYbLz5mqbieWCTM4AsFMjoGGuufft0f9ouM6259u2JM5J2HKzfrSIrnoeRof6aG8O8YjylkaF+dVcdd909z5Ln9czmW/T8ljNvKYvjkj/8wHUVR03/4QeuS52Stu4+3FQ8D2xyBpCFAhkdYdPouLYdODLdn3bKXdsOHNGm0fGkeUXsozv2veOqPuj6dDGeXPXSBZYy1ChtPJs4MSnXmY1nqbszRDx1cHhlnzavXVHxx8TmtStYfwyAAhmdIeJMbVRRx2rr7sM6VdVq7tSUJ52BjGimjWcpZZ0umPrUweGVfdq34UY9t+UW7dtwI8UxAEkUyOgQEWdqo4o6VmyoakzUcYp6XQFAPRTI6AhRZ68i5hUxJ0l6a093U/FO1Xtx/fHIiucl4iZLAMhCH2S0xOihiVCHl6xbtUjbDhypG08pYl4Rc5KkrPo8cd1e9wjnlB0jsiZkU0/Ujgz1a/0Xn6gbT+n2B/dr37Nn1tcPLrtC2++8PmFGBdHuoVLcsQJagRlknHcRNwl9+W/rr5/NiuelXiE6UzwPEXOSpB/88FRT8TzUK45niufhxGT98ciK5+X3MzbEZsXzUF3wSdK+Z4/r9gf3J8qoIOI9NOpYAa1CgYzzLuImoR9N1Z8+y4oDOL9ePTnVVDwP1QXfbPG8RLyHRh0roFUokHHeRd0kBADtgHsokB4FMs47TqcCgLPHPRRIjwIZ5x2nUwGodtmFXU3F8zC47Iqm4nmJeA+NOlZAq1Ag47yLeDpV1BZTWd0OUnZBuGP14qbinSziWGUdK536uOlvf/KmmmL4sgu79O1P3pQoI2n7ndfXFHgROjNEvIdGHSugVcxT9/45BwMDAz42NpY6DbSB0UMTGvnytypOYuvuMm395Z9K3jopmmUbH657eEOXmZ7dfHOCjApm6gyR6g+KiGM1uGVv3eOb+3p7tG/DjQkyOiNi6zIAnc3MHnf3geo4fZDROarrmPb927ClOPGscRHHKuoGr1LrslJ3hlLrMkkUyQDCYYkFOsLW3Yd16nRl0XLqtCdtm4T2F/HUwagbvCK2LgOALBTI6AhRZ9Wkwsza4Ja9Wrphlwa37E16GEBky6+8pKl4HrJOF0x56mDEDV5S7OcgAFSjQEZH6L24u6l4XiKemNXbkzFWGfG87Ln7hppiePmVl2jP3TekSUjSwE9coa55lbPFXfNMAz+Rbmd/xA1eUtyZbQCohwIZLRFtVjRrSWjqZbURX3bOWh2QcNXAtGdeen3Gj/O2dfdhTVUt3ZkKsHRn88NPVvzRtfnhJ5PmIxVmtrur/pjonmfJZ7aj3asAxECBjPMu4qzoiclTTcXzUq/bwEzxPPzgh/XHJCuel6UbdtXdZ7l0hu4WrRbx8Vt13x69+NobFbEXX3tDq+7bkyijMtV/ZCX+oyvivQpADBTIOO8izoqi/WVN9tNbo1J1cTxbPC9bdx+uaLMoSaem0s62c68CkIUCGecdm3EAVIt4X4iYE4AYKJBx3rEZB0C1iPeFiDkBiIECGeddxDZTV106v6k40IgLMtbQZsXzEPVaj3hfiJgTgBgokHHeRWwzdfCeNTUFwlWXztfBe9Ykyqggq44K0DACDXhm8y01xfAFVoinEvVaj3hfiJgTgBg4ahotMbyyL9wvmdQFQj0Le3vqdjxI+RJvl1ndo5JTng5X+vcj5pWyGM4S8VqXYt4XIuYEID1mkIGEIr7EG/F0OEl1i+OZ4gAAnC1mkIGESjNXW3cf1rETk1rY26ORof6kM1r3Dq+QJO04eFRT7uoy07pVi6bjqVx+cXfdXsyXJz4NEQAw91AgA4lFfIn33uEVyQvialFPQwQAzD0ssQDQFqKehggAmHtaViCb2efM7CUz+7uy2HVmdsDMnjCzMTN7b9nnNprZM2Z22MyGWpUXgPaUtRkv9SY9AMDc08olFn8m6Y8k/XlZ7A8kfdLdHz
Gzm4sf32Bm10i6TdK1khZK+msze6e7TymQ0UMTodaKRs5rzf2P6umXXp/+ePmVl2jP3TekS0jS0g27Ko4lNknPbUnfgWDJhl01secT5xUxp6ib9CKOVcTnHwC0k5bNILv7f5N0vDos6bLi+2+VdKz4/q2SvuDuJ939OUnPSHqvAhk9NKGNO8c1cWJSLmnixKQ27hzX6KEJ8qpS/ctZkp5+6XWtuf/RNAmptjiWChfj0jrFTZ7qFVczxfMQMaeoIo5VxOcfALSbvNcgr5e01cyOSvqUpI3FeJ+ko2Vf90IxFsbW3Yc1eapyQnvy1JS27j6cKKOCiHlV/3KeLZ6HrDlG9ndhron4/AOAdpN3gfwbkj7q7oskfVTSZ4vxeosI69YuZvbh4vrlsZdffrlFadY6Vucwh5nieYmaFwAAQLvKu0D+oKSdxfe/pDPLKF6QVH4KwdU6s/yigrs/4O4D7j6wYMGCliVaLetks5Qnns3076fOCwAAoF3lXSAfk/TzxfdvlPR08f2vSrrNzC40s6WSlkt6LOfcZhTxxDMpZl7Lr7ykqXgesvoc0P8Ac03E5x8AtJtWtnnbIWm/pH4ze8HMPiTpTkn/t5l9S9L/JenDkuTu35H0kKQnJX1N0l3ROlgMr+zT5rUr1NfbI5PU19ujzWtXJO8WETGvPXffUPPLOPUu+ue23FJTDEfoYvGHH7iuqXgesjowpO7MwFg1JuLzDwDajXkbH0M1MDDgY2NjqdMAztrglr2aqLNevK+3R/s23Jggo4KIrQMZKwDA+WZmj7v7QHWco6aBhCJusiy1Dix1Rym1DpSUtPCrVxzPFM9D1LECAJwbjpoGEoq4yTJi60Ap5kl6UccKAHBumEFGS2waHdeOg0c15a4uM61btUj3Dq9ImlPEl8JHhvorZiCl9JssI85qSzFP0os6VgCAc8MMMs67TaPj2nbgyHThMuWubQeOaNPoeLKcIp44KMXcZNl7cXdT8bz0ZcyqZ8XzEPEVAADAuaNAxnm34+DRpuJ5iPxS+B9//emKwv2Pv/70rN/TSlkTsqn382bNqqecbR8Z6ld3V+USj+4uS97+8fYH92vJhl3Tb7c/uD9pPiWjhyY0uGWvlm7YpcEte5P/gQoAWSiQcd5FfCk84gYvSVpz/6M1RwA//dLrWnP/o2kSknRi8lRT8bx8aexIU/HcVF/Wif+QuP3B/dr37PGK2L5njycvkqO+igMA9VAgAwlVF8ezxTtZddE3WzwPW3cf1qnTlRXxqdOe9JWJiOMkxX4VBwCqUSADwFlik17jGCsA7YQCGeddxHZcEXNC+2OTXuMYKwDthAIZ5926VYuaiuchYk6Sao4Eni2eh6w/GVL/KTG47Iqm4nkYGepXT3dXRSx1m76I4yTFHCsAyEKBjPPu3uEVumP14unZ2S4z3bF6cdI+yBFzkqQ9d99QUwwvv/IS7bn7hjQJSXpuyy01xbAV4yltv/P6miJvcNkV2n7n9YkyitmmL+I4STHHCgCymKfu3XQOBgYGfGxsLHUaqCPioRwAAADlzOxxdx+ojnOSHs67Ujun0o71UjsnSRTJAAAgPJZY4LyjnRMAAGhnFMg472jnBAAA2hkFMs472jkBAIB2xhpknHcjQ/0a+dK3Kk4Y655nyds5rbpvj1587Y3pj6+6dL4O3rMmYUYFEfNasmFXTez5xF0sJOndn/iaXj15ZvnOZRd26dufvClhRjFz2jQ6rh0Hj2rKXV1mWrdqUfKOLVLt0eqpO7ZIbCgGUB8zyGiNen3CEqouQiXpxdfe0Kr79iTKqCBiXvWK45nieakuRCXp1ZNTevcnvpYoo5g5bRod17YDRzRV7FA05a5tB45o0+h4spyk2uJYKhypvub+R9MkpDMbiidOTMp1ZkPx6KGJZDkBiIECGefd1t2HdWqqsn3gqSlPukmvugidLZ6XqHlFVF2IzhbPQ8Scdhw82lQ8L9XF8WzxPLChGEAWCmScd2zSA9KZyu
htnxXvZNyrAGShQMZ5xyY9IJ3SaZGNxjsZ9yoAWSiQcd6NDPWre17lL+PUm/SuunR+U/G8RM0rossu7GoqnoeIOa1btaipeF6qj1SfLZ6HkaF+9XRXPlY93V3JNxQDSI8CGa0RbJPewXvW1BSdEbpFRMwrq1tF6i4W3/7kTTWFZ+qOERFzund4he5YvXh6xrjLTHesXpy8i8Weu2+oKYZTd7EYXtmnzWtXqK+3Ryapr7dHm9euoIsFAJm38bq0gYEBHxsbS50Gqgxu2auJOmv4+np7tG/DjQkyAgAAqGVmj7v7QHWcPshzQLQ+nmx8aU60xy9qTgAA5IUCuc2V+niWWhWV+nhKSlbQLOztqTuDzMaXWhEfv4g5AQCQJ9Ygt7mIfTzZ+NK4iI9fxJwAAMgTM8htLuJyhtIsIy/Rzy7i4xcxJwAA8kSB3OaiLmcYXtlHQdyAiI9fxJwAAMgTBXITIm5cGhnqr1gvKsVYzhBxrG5/cL/2PXt8+uPBZVdo+53XJ8yo8Pjd/dATOl3WTGaeKenjF/WaimrT6Lh2HDyqKXd1mWndqkXJW6oBAM4Na5AbVNq4NHFiUq4zG5dGD00kzStiH8+IY1VdHEvSvmeP6/YH9yfKqGDse8crimNJOu2FeCoRr6moNo2Oa9uBI9PHOE+5a9uBI9o0Op44MwDAuaAPcoPo7du4iGO1ZMOuzM+lPABj2caHp4urcl1menbzzQkyii3aKxM8fgDQ3uiDfI7YuNQ4xqpx9YqrmeKdLGL7OR4/AJibWGLRoKwNSmxcqsVYNa50HHCj8U4Wsf0cjx8AzE0UyA2it2/jIo7V4LIrmornZd2qRU3FO1nEVyZ4/ABgbqJAbhAblxoXcazeP7BY86om9eZZIZ7SvcMrdMfqxdMzjl1mumP1Yrog1BHxlQkePwCYm9ikh44QceMgmlO9BlkqvDKR+o8vAED7YpMeOlrEl+fRHE5oBADkhQIZHYHT4eYGTmgEAOSBNcjoCBE3DgIAgJiYQUZH4OV5AADQqJYVyGb2OUm/JOkld//nZfHfkvSbkt6UtMvdf6cY3yjpQ5KmJH3E3Xe3Kre5JtrpYlLhCN4dB49qyl1dZlq3ahE7+zNUH4M9uOwKbb/z+oQZxcxJipkX1zoAzD2tXGLxZ5JuKg+Y2S9IulXSu939WkmfKsavkXSbpGuL3/MnZlb5ejjqKu3snzgxKdeZ08VGD00ky2nT6Li2HTgyfZrYlLu2HTiiTaPjyXKKOE5SbcEnSfuePa7bH9yfKKOYOUkx84p4rQMAzl3LCmR3/2+SjleFf0PSFnc/Wfyal4rxWyV9wd1Puvtzkp6R9N5W5TaXRDxdbMfBo03F8xBxnCTVFHyzxfMQMaeZ/v2UeUW81gEA5y7vTXrvlPRzZnbQzP7GzN5TjPdJKv+N8kIxVsPMPmxmY2Y29vLLL7c43fgiti+byuitnRXPQ8RxQvuLeK0DAM5d3gXyBZIul7Ra0oikh8zMJFmdr637G8bdH3D3AXcfWLBgQesybRMRTxcrnSrWaDwPEccJ7S/itQ4AOHd5F8gvSNrpBY9JOi3pbcX4orKvu1rSsZxza0sR25etW7WoqXgeIo6TVNhk1kw8DxFzmunfT5lXxGsdAHDu8i6QRyXdKElm9k5J8yX9o6SvSrrNzC40s6WSlkt6LOfc2tLwyj5tXrtCfb09MhWOTk599O69wyt0x+rF07NoXWa6Y/XipDv7I46TJG2/8/qaAi91Z4aIOUkx84p4rQMAzp15i9bKmdkOSTeoMEP8oqRPSPqPkj4n6TpJb0j6uLvvLX79PZJ+TYX2b+vd/ZHZ/o2BgQEfGxtrRfoAAACY48zscXcfqIm3qkDOAwUyAAAAzlZWgcxR0wAAAEAZjpoGgDkm4umaANBOKJABYA4pnRpZOhindGqkJIpkAGgQBfIcwGxRYzaNjmvHwaOacleXmdatWkS3Acw5M50ayX0BABpDgdzmmC1qzKbRcW07cG
T64yn36Y8pkjGXcGokAJw7Num1uZlmi3DGjoNHm4oD7YpTIwHg3FEgtzlmixozldHOMCsOtKuop0YCQDuhQG5zzBY1pnTSWaNxoF1FPTUSANoJa5Db3MhQf8UaZInZonrWrVpUsQa5PA7MNcMr+yiIAeAcUCC3udIvQbpYzKy0EY8uFgAAYDYcNQ0AAICOxFHTAAAAQAMokAEAAIAyFMgAAABAGQpkAAAAoAwFMgAAAFCGNm9zwOihCdq8tbGIj9+m0XFa4gEAOhYFcpsbPTRRcVDIxIlJbdw5LknJiyzMLuLjt2l0vOJQlSn36Y8pkgEAnYAlFm1u6+7DFafoSdLkqSlt3X04UUZoRsTHb8fBo03FAQCYayiQ29yxE5NNxRFLxMdvKuPwoKw4AABzDQVym1vY29NUHLFEfPy6zJqKAwAw11Agt7mRoX71dHdVxHq6uzQy1J8oIzQj4uO3btWipuIAAMw1bNJrc6WNXNG6IKAxER+/0kY8ulgAADqVeRuvKxwYGPCxsbHUaQAAAKANmdnj7j5QHWeJBQAAAFCGAhkAAAAoQ4EMAAAAlKFABgAAAMpQIAMAAABlKJABAACAMhTIAAAAQBkKZAAAAKAMBTIAAABQhgIZAAAAKEOBDAAAAJShQAYAAADKmLunzuGsmdnLkr6XOo9A3ibpH1Mn0QYYp8YxVo1jrBrHWDWOsWoM49Q4xqrST7j7gupgWxfIqGRmY+4+kDqP6BinxjFWjWOsGsdYNY6xagzj1DjGqjEssQAAAADKUCADAAAAZSiQ55YHUifQJhinxjFWjWOsGsdYNY6xagzj1DjGqgGsQQYAAADKMIMMAAAAlKFABgAAAMpQILcpM+s1sy+b2XfN7Ckzu97MrjOzA2b2hJmNmdl7U+eZmpn1F8ej9Paqma03syvMbI+ZPV387+Wpc01thrHaWrzOvm1m/8nMelPnmlLWOJV9/uNm5mb2toRphjDTWJnZb5nZYTP7jpn9QeJUk5vh+cd9vQ4z+2jx2vk7M9thZhdxX68vY6y4r8+CNchtysw+L+m/u/ufmtl8SRdLekjSp939ETO7WdLvuPsNKfOMxMy6JE1IWiXpLknH3X2LmW2QdLm7/27SBAOpGqt+SXvd/U0z+3eSxFgVlI+Tu3/PzBZJ+lNJ75L0M+5OM/6iqmvqHZLukXSLu580syvd/aWkCQZSNVYPivt6BTPrk/QNSde4+6SZPSTpYUnXiPt6hRnG6pi4r8+IGeQ2ZGaXSfpfJH1Wktz9DXc/IcklXVb8sreq8ATAGb8o6Vl3/56kWyV9vhj/vKThVEkFNT1W7v5X7v5mMX5A0tUJ84qm/JqSpE9L+h0VnouoVD5WvyFpi7uflCSK4xrlY8V9vb4LJPWY2QUqTBAdE/f1LDVjxX19dhTI7ekdkl6W9B/M7JCZ/amZXSJpvaStZnZU0qckbUyYY0S3SdpRfP8qd/++JBX/e2WyrGIqH6tyvybpkZxziWx6nMzsfZIm3P1baVMKq/yaeqeknzOzg2b2N2b2noR5RVQ+VuvFfb2Cu0+oMBZHJH1f0ivu/lfivl5jhrEqx329Dgrk9nSBpJ+W9P+4+0pJr0vaoMKszEfdfZGkj6o4wwypuAzlfZK+lDqX6LLGyszukfSmpO0p8oqmfJzM7GIVlgz8ftqsYqpzTV0g6XJJqyWNSHrIzCxReqHUGSvu61WKa4tvlbRU0kJJl5jZHWmzimm2seK+no0CuT29IOkFdz9Y/PjLKhTMH5S0sxj7kiQ2c5zxryR9091fLH78opn9uCQV/8tLvGdUj5XM7IOSfknS7c7GhZLycVqmwi+gb5nZ8yq8XPlNM3t7wvwiqb6mXpC00wsek3RaUsdvaiyqHivu67X+paTn3P1ldz+lwvj8rLiv15M1VtzXZ0GB3Ibc/R8kHTWz/mLoFyU9qcIarJ8vxm6U9HSC9KJap8olA19V4RePiv/9y9wziqtirMzsJkm/K+l97v7DZF
nFMz1O7j7u7le6+xJ3X6JCAfjTxecqap9/oyrco2Rm75Q0XxIbGguqx4r7eq0jklab2cXFVx5+UdJT4r5eT92x4r4+O7pYtCkzu06F3fLzJf29pH8j6VpJn1Hh5csfSfq37v54qhyjKL78fVTSO9z9lWLsx1To+rFYhRvI+939eLosY8gYq2ckXSjpn4pfdsDdfz1RiiHUG6eqzz8vaYAuFpnX1HxJn5N0naQ3JH3c3fcmSzKIjLH6F+K+XsPMPinpAyosDzgk6f+Q9BZxX6+RMVbfEff1GVEgAwAAAGVYYgEAAACUoUAGAAAAylAgAwAAAGUokAEAAIAyFMgAAABAGQpkAAjMzP5n1cf/2sz+aJbveZ+ZbZjla24ws/+S8bn1xZZjANCRKJABYI5x96+6+5Zz+BHrJVEgA+hYFMgA0KbMbIGZfcXM/rb4NliMT88ym9kyMztQ/Pz/WTUj/RYz+7KZfdfMtlvBRyQtlPR1M/t6gv8tAEjugtQJAABm1GNmT5R9fIUKR+pKhRPWPu3u3zCzxZJ2S/rJqu//jKTPuPsOM6s+KWulCidwHpO0T9Kgu/97M7tb0i9wEiCATkWBDACxTbr7daUPzOxfSxoofvgvJV1jZqVPX2Zml1Z9//WShovv/4WkT5V97jF3f6H4c5+QtETSN85b5gDQpiiQAaB9zZN0vbtPlgfLCubZnCx7f0r8TgAASaxBBoB29leSfrP0gZldV+drDkj634rv39bgz31NUvVMNAB0DApkAGhfH5E0YGbfNrMnJVWvMZYKHSnuNrPHJP24pFca+LkPSHqETXoAOpW5e+ocAAAtUuxnPOnubma3SVrn7remzgsAImO9GQDMbT8j6Y+ssDD5hKRfS5sOAMTHDDIAAABQhjXIAAAAQBkKZAAAAKAMBTIAAABQhgIZAAAAKEOBDAAAAJT5/wEF2g87zs/PPwAAAABJRU5ErkJggg==\n",
"text/plain": [
"<Figure size 720x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(10,6))\n",
"plt.scatter(df['Height'],df['Weight'])\n",
"plt.xlabel('Height')\n",
"plt.ylabel('Weight')\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"In this notebook, we have learnt how to perform basic operations on data to compute statistical functions. We now know how to use the tools of mathematics and statistics to test hypotheses, and how to compute confidence intervals for a variable given a data sample."
]
}
],
"metadata": {
"interpreter": {
"hash": "86193a1ab0ba47eac1c69c1756090baa3b420b3eea7d4aafab8b85f8b312f0c5"
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
}
},
"nbformat": 4,
"nbformat_minor": 4
}