{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction to Probability and Statistics\n",
"|\n",
"In this notebook, we will play around with some of the concepts we have previously discussed. Many concepts from probability and statistics are well-represented in major libraries for data processing in Python, such as `numpy` and `pandas`."
]
},
{
"cell_type": "code",
"execution_count": 212,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import random\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## Random Variables and Distributions\n",
"\n",
"Let's start with drawing a sample of 30 variables from a uniform distribution from 0 to 9. We will also compute mean and variance."
]
},
{
"cell_type": "code",
"execution_count": 213,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sample: [1, 1, 0, 5, 6, 3, 7, 5, 1, 6, 5, 6, 7, 0, 3, 6, 2, 4, 2, 8, 1, 5, 7, 10, 8, 5, 7, 10, 6, 8]\n",
"Mean = 4.833333333333333\n",
"Variance = 7.938888888888889\n"
]
}
],
"source": [
"sample = [ random.randint(0,10) for _ in range(30) ]\n",
"print(f\"Sample: {sample}\")\n",
"print(f\"Mean = {np.mean(sample)}\")\n",
"print(f\"Variance = {np.var(sample)}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To visually estimate how many different values are there in the sample, we can plot the **histogram**:"
]
},
{
"cell_type": "code",
"execution_count": 214,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAhYAAAGdCAYAAABO2DpVAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8rg+JYAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAU30lEQVR4nO3df6yVBf3A8c8V84B27y0ohDsuikWhIGZghZpSKhsxpmv90NRY1h82NIhVgLYpLrlky9WisOuarRXBWqE0k0U/5OoaCXeSDJ0/JuktfzDL3YM0jxOe7x/Nu+4XUM/lc+7hHF+v7fzxPPc59/nsmfK895zn3KelKIoiAAASHFPvAQCA5iEsAIA0wgIASCMsAIA0wgIASCMsAIA0wgIASCMsAIA0xw73Dg8cOBDPPPNMtLa2RktLy3DvHgAYgqIoYu/evdHR0RHHHHP46xLDHhbPPPNMdHZ2DvduAYAEfX19MWHChMP+fNjDorW1NSL+O1hbW9tw7x4AGIJyuRydnZ0D5/HDGfaweO3jj7a2NmEBAA3mjW5jcPMmAJBGWAAAaYQFAJBGWAAAaYQFAJBGWAAAaYQFAJBGWAAAaYQFAJBGWAAAaaoKixtvvDFaWloGvcaNG1er2QCABlP1s0KmTp0af/jDHwaWR4wYkToQANC4qg6LY4891lUKAOCQqr7H4vHHH4+Ojo6YNGlSXHrppfHkk0++7vaVSiXK5fKgFwDQnKq6YvHhD384fvazn8X73ve+eP755+Nb3/pWnH322bFr164YM2bMId/T1dUVK1asSBkWOPqdvOzueo/wlvD3VfPqPQIcUktRFMVQ37xv3754z3veE9/4xjdiyZIlh9ymUqlEpVIZWC6Xy9HZ2Rn9/f3R1tY21F0DRylhMTyEBcOtXC5He3v7G56/q77H4n+dcMIJcfrpp8fjjz9+2G1KpVKUSqUj2Q0A0CCO6O9YVCqVeOSRR2L8+PFZ8wAADayqsPja174WW7Zsid27d8df//rX+NSnPhXlcjkWLFhQq/kAgAZS1Uch//jHP+Kyyy6LF154Id797nfHRz7ykdi6dWucdNJJtZoPAGggVYXFunXrajUHANAEPCsEAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANMICAEgjLACANEcUFl1dXdHS0hKLFy9OGgcAaGRDDott27ZFd3d3TJ8+PXMeAKCBDSksXnrppbj88svj9ttvj3e+853ZMwEADWpIYbFw4cKYN29eXHjhhW+4baVSiXK5POgFADSnY6t9w7p166K3tze2b9/+prbv6uqKFStWVD3YW8XJy+6u9whV+/uqefUeAd7y/NvB0aqqKxZ9fX2xaNGi+MUvfhEjR458U+9Zvnx59Pf3D7z6+vqGNCgAcPSr6opFb29v7NmzJ2bMmDGwbv/+/dHT0xOrV6+OSqUSI0aMGPSeUqkUpVIpZ1oA4KhWVVhccMEFsXPnzkHrvvCFL8SUKVNi6dKlB0UFAPDWUlVYtLa2xrRp0watO+GEE2LMmDEHrQcA3nr85U0AIE3V3wr5/+69996EMQCAZuCKBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQ
CQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQpqqwWLNmTUyfPj3a2tqira0tZs2aFffcc0+tZgMAGkxVYTFhwoRYtWpVbN++PbZv3x4f//jH4+KLL45du3bVaj4AoIEcW83G8+fPH7R88803x5o1a2Lr1q0xderU1MEAgMZTVVj8r/3798evfvWr2LdvX8yaNeuw21UqlahUKgPL5XJ5qLsEAI5yVYfFzp07Y9asWfHyyy/H29/+9tiwYUOcdtpph92+q6srVqxYcURDAtD4Tl52d71HqNrfV82r9wgNp+pvhbz//e+PHTt2xNatW+PLX/5yLFiwIB5++OHDbr98+fLo7+8fePX19R3RwADA0avqKxbHHXdcvPe9742IiJkzZ8a2bdvi+9//fvz4xz8+5PalUilKpdKRTQkANIQj/jsWRVEMuocCAHjrquqKxXXXXRdz586Nzs7O2Lt3b6xbty7uvffe2LRpU63mAwAaSFVh8fzzz8eVV14Zzz77bLS3t8f06dNj06ZNcdFFF9VqPgCggVQVFj/5yU9qNQcA0AQ8KwQASCMsAIA0wgIASCMsAIA0wgIASCMsAIA0wgIASCMsAIA0wgIASCMsAIA0wgIASCMsAIA0wgIASCMsAIA0wgIASCMsAIA0wgIASCMsAIA0wgIASCMsAIA0wgIASCMsAIA0wgIASCMsAIA0wgIASCMsAIA0wgIASCMsAIA0wgIASCMsAIA0wgIASCMsAIA0wgIASCMsAIA0wgIASCMsAIA0wgIASCMsAIA0wgIASCMsAIA0wgIASCMsAIA0wgIASCMsAIA0wgIASCMsAIA0wgIASCMsAIA0wgIASCMsAIA0wgIASCMsAIA0wgIASCMsAIA0wgIASCMsAIA0wgIASCMsAIA0wgIASFNVWHR1dcVZZ50Vra2tMXbs2Ljkkkvi0UcfrdVsAECDqSostmzZEgsXLoytW7fG5s2b49VXX405c+bEvn37ajUfANBAjq1m402bNg1avuOOO2Ls2LHR29sb5513XupgAEDjqSos/r/+/v6IiBg9evRht6lUKlGpVAaWy+XykewSADiKtRRFUQzljUVRxMUXXxwvvvhi3HfffYfd7sYbb4wVK1YctL6/vz/a2tqGsuvDOnnZ3am/D+rt76vm1XuEqvn/EOqrVv9ulMvlaG9vf8Pz95C/FXLNNdfEQw89FL/85S9fd7vly5dHf3//wKuvr2+ouwQAjnJD+ijk2muvjY0bN0ZPT09MmDDhdbctlUpRKpWGNBwA0FiqCouiKOLaa6+NDRs2xL333huTJk2q1VwAQAOqKiwWLlwYa9eujbvuuitaW1vjueeei4iI9vb2GDVqVE0GBAAaR1X3WKxZsyb6+/tj9uzZMX78+IHX+vXrazUfANBAqv4oBADgcDwrBABIIywAgDTCAgBIIywAgDTCAgBIIywAgDTCAgBIIywAgDTCAgBIIywAgDTCAgBIIywAgDTCAgBIIywAgDTCAgBIIywAgDTCAgBIIywAgDTCAgBIIywAgDTCAgBIIywAgDTCAgBIIywAgDTCAgBIIywAgDTCAgBIIywAgDTCAgBIIywAgDTCAgBIIywAgDTCAgBIIywAgDTCAgBIIywAgDTCAgBIIywAgDTCAgBIIywAgDTCAgBIIywAgDTCAgBIIywAgDTCAgBIIywAgDTCAgBIIywAgDTCAgBIIywAgDTCAgBIIywAgDTCAgBIIywAgDTCAgBIIywAgDTCAgBIIywAgDTCAgBIIywAgDRVh0VPT0/Mnz8/Oj
o6oqWlJe68884ajAUANKKqw2Lfvn1xxhlnxOrVq2sxDwDQwI6t9g1z586NuXPn1mIWAKDBVR0W1apUKlGpVAaWy+VyrXcJANRJzcOiq6srVqxYUevdQFM6ednd9R4BoCo1/1bI8uXLo7+/f+DV19dX610CAHVS8ysWpVIpSqVSrXcDABwF/B0LACBN1VcsXnrppXjiiScGlnfv3h07duyI0aNHx8SJE1OHAwAaS9VhsX379vjYxz42sLxkyZKIiFiwYEH89Kc/TRsMAGg8VYfF7NmzoyiKWswCADQ491gAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmEBQCQRlgAAGmGFBY/+tGPYtKkSTFy5MiYMWNG3HfffdlzAQANqOqwWL9+fSxevDiuv/76ePDBB+OjH/1ozJ07N55++ulazAcANJCqw+LWW2+NL37xi/GlL30pTj311Pje974XnZ2dsWbNmlrMBwA0kGOr2fiVV16J3t7eWLZs2aD1c+bMib/85S+HfE+lUolKpTKw3N/fHxER5XK52lnf0IHKf9J/JwA0klqcX//39xZF8brbVRUWL7zwQuzfvz9OPPHEQetPPPHEeO655w75nq6urlixYsVB6zs7O6vZNQDwJrR/r7a/f+/evdHe3n7Yn1cVFq9paWkZtFwUxUHrXrN8+fJYsmTJwPKBAwfi3//+d4wZM+aw7xmKcrkcnZ2d0dfXF21tbWm/l8Ec5+HjWA8Px3l4OM7Do5bHuSiK2Lt3b3R0dLzudlWFxbve9a4YMWLEQVcn9uzZc9BVjNeUSqUolUqD1r3jHe+oZrdVaWtr8x/tMHCch49jPTwc5+HhOA+PWh3n17tS8Zqqbt487rjjYsaMGbF58+ZB6zdv3hxnn312ddMBAE2n6o9ClixZEldeeWXMnDkzZs2aFd3d3fH000/H1VdfXYv5AIAGUnVYfPazn41//etfcdNNN8Wzzz4b06ZNi9/97ndx0kkn1WK+N61UKsUNN9xw0Mcu5HKch49jPTwc5+HhOA+Po+E4txRv9L0RAIA3ybNCAIA0wgIASCMsAIA0wgIASNM0YeFR7rXV1dUVZ511VrS2tsbYsWPjkksuiUcffbTeYzW9rq6uaGlpicWLF9d7lKbzz3/+M6644ooYM2ZMHH/88fGBD3wgent76z1WU3n11Vfjm9/8ZkyaNClGjRoVp5xyStx0001x4MCBeo/W8Hp6emL+/PnR0dERLS0tceeddw76eVEUceONN0ZHR0eMGjUqZs+eHbt27RqW2ZoiLDzKvfa2bNkSCxcujK1bt8bmzZvj1VdfjTlz5sS+ffvqPVrT2rZtW3R3d8f06dPrPUrTefHFF+Occ86Jt73tbXHPPffEww8/HN/97ndr+leB34q+/e1vx2233RarV6+ORx55JG655Zb4zne+Ez/4wQ/qPVrD27dvX5xxxhmxevXqQ/78lltuiVtvvTVWr14d27Zti3HjxsVFF10Ue/furf1wRRP40Ic+VFx99dWD1k2ZMqVYtmxZnSZqfnv27CkiotiyZUu9R2lKe/fuLSZPnlxs3ry5OP/884tFixbVe6SmsnTp0uLcc8+t9xhNb968ecVVV101aN0nP/nJ4oorrqjTRM0pIooNGzYMLB84cKAYN25csWrVqoF1L7/8ctHe3l7cdtttNZ+n4a9YvPYo9zlz5g
xa/3qPcufI9ff3R0TE6NGj6zxJc1q4cGHMmzcvLrzwwnqP0pQ2btwYM2fOjE9/+tMxduzYOPPMM+P222+v91hN59xzz40//vGP8dhjj0VExN/+9re4//774xOf+ESdJ2tuu3fvjueee27QebFUKsX5558/LOfFIT3d9GgylEe5c2SKooglS5bEueeeG9OmTav3OE1n3bp10dvbG9u3b6/3KE3rySefjDVr1sSSJUviuuuuiwceeCC+8pWvRKlUis9//vP1Hq9pLF26NPr7+2PKlCkxYsSI2L9/f9x8881x2WWX1Xu0pvbaue9Q58Wnnnqq5vtv+LB4TTWPcufIXHPNNfHQQw/F/fffX+9Rmk5fX18sWrQofv/738fIkSPrPU7TOnDgQMycOTNWrlwZERFnnnlm7Nq1K9asWSMsEq1fvz5+/vOfx9q1a2Pq1KmxY8eOWLx4cXR0dMSCBQvqPV7Tq9d5seHDYiiPcmforr322ti4cWP09PTEhAkT6j1O0+nt7Y09e/bEjBkzBtbt378/enp6YvXq1VGpVGLEiBF1nLA5jB8/Pk477bRB60499dT49a9/XaeJmtPXv/71WLZsWVx66aUREXH66afHU089FV1dXcKihsaNGxcR/71yMX78+IH1w3VebPh7LDzKfXgURRHXXHNN/OY3v4k//elPMWnSpHqP1JQuuOCC2LlzZ+zYsWPgNXPmzLj88stjx44doiLJOeecc9DXpR977LG6P0yx2fznP/+JY44ZfJoZMWKEr5vW2KRJk2LcuHGDzouvvPJKbNmyZVjOiw1/xSLCo9yHw8KFC2Pt2rVx1113RWtr68AVovb29hg1alSdp2sera2tB923csIJJ8SYMWPcz5Loq1/9apx99tmxcuXK+MxnPhMPPPBAdHd3R3d3d71Hayrz58+Pm2++OSZOnBhTp06NBx98MG699da46qqr6j1aw3vppZfiiSeeGFjevXt37NixI0aPHh0TJ06MxYsXx8qVK2Py5MkxefLkWLlyZRx//PHxuc99rvbD1fx7J8Pkhz/8YXHSSScVxx13XPHBD37Q1yCTRcQhX3fccUe9R2t6vm5aG7/97W+LadOmFaVSqZgyZUrR3d1d75GaTrlcLhYtWlRMnDixGDlyZHHKKacU119/fVGpVOo9WsP785//fMh/kxcsWFAUxX+/cnrDDTcU48aNK0qlUnHeeecVO3fuHJbZPDYdAEjT8PdYAABHD2EBAKQRFgBAGmEBAKQRFgBAGmEBAKQRFgBAGmEBAKQRFgBAGmEBAKQRFgBAGmEBAKT5Pw1zelRjpGgFAAAAAElFTkSuQmCC",
"image/svg+xml": "\r\n\r\n\r\n",
"text/plain": [
"
"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.hist(sample)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Analyzing Real Data\n",
"\n",
"Mean and variance are very important when analyzing real-world data. Let's load the data about baseball players from [SOCR MLB Height/Weight Data](http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_MLB_HeightsWeights)"
]
},
{
"cell_type": "code",
"execution_count": 215,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
Name
\n",
"
Team
\n",
"
Role
\n",
"
Height
\n",
"
Weight
\n",
"
Age
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
Adam_Donachie
\n",
"
BAL
\n",
"
Catcher
\n",
"
74
\n",
"
180.0
\n",
"
22.99
\n",
"
\n",
"
\n",
"
1
\n",
"
Paul_Bako
\n",
"
BAL
\n",
"
Catcher
\n",
"
74
\n",
"
215.0
\n",
"
34.69
\n",
"
\n",
"
\n",
"
2
\n",
"
Ramon_Hernandez
\n",
"
BAL
\n",
"
Catcher
\n",
"
72
\n",
"
210.0
\n",
"
30.78
\n",
"
\n",
"
\n",
"
3
\n",
"
Kevin_Millar
\n",
"
BAL
\n",
"
First_Baseman
\n",
"
72
\n",
"
210.0
\n",
"
35.43
\n",
"
\n",
"
\n",
"
4
\n",
"
Chris_Gomez
\n",
"
BAL
\n",
"
First_Baseman
\n",
"
73
\n",
"
188.0
\n",
"
35.71
\n",
"
\n",
"
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
\n",
"
\n",
"
1029
\n",
"
Brad_Thompson
\n",
"
STL
\n",
"
Relief_Pitcher
\n",
"
73
\n",
"
190.0
\n",
"
25.08
\n",
"
\n",
"
\n",
"
1030
\n",
"
Tyler_Johnson
\n",
"
STL
\n",
"
Relief_Pitcher
\n",
"
74
\n",
"
180.0
\n",
"
25.73
\n",
"
\n",
"
\n",
"
1031
\n",
"
Chris_Narveson
\n",
"
STL
\n",
"
Relief_Pitcher
\n",
"
75
\n",
"
205.0
\n",
"
25.19
\n",
"
\n",
"
\n",
"
1032
\n",
"
Randy_Keisler
\n",
"
STL
\n",
"
Relief_Pitcher
\n",
"
75
\n",
"
190.0
\n",
"
31.01
\n",
"
\n",
"
\n",
"
1033
\n",
"
Josh_Kinney
\n",
"
STL
\n",
"
Relief_Pitcher
\n",
"
73
\n",
"
195.0
\n",
"
27.92
\n",
"
\n",
" \n",
"
\n",
"
1034 rows × 6 columns
\n",
"
"
],
"text/plain": [
" Name Team Role Height Weight Age\n",
"0 Adam_Donachie BAL Catcher 74 180.0 22.99\n",
"1 Paul_Bako BAL Catcher 74 215.0 34.69\n",
"2 Ramon_Hernandez BAL Catcher 72 210.0 30.78\n",
"3 Kevin_Millar BAL First_Baseman 72 210.0 35.43\n",
"4 Chris_Gomez BAL First_Baseman 73 188.0 35.71\n",
"... ... ... ... ... ... ...\n",
"1029 Brad_Thompson STL Relief_Pitcher 73 190.0 25.08\n",
"1030 Tyler_Johnson STL Relief_Pitcher 74 180.0 25.73\n",
"1031 Chris_Narveson STL Relief_Pitcher 75 205.0 25.19\n",
"1032 Randy_Keisler STL Relief_Pitcher 75 190.0 31.01\n",
"1033 Josh_Kinney STL Relief_Pitcher 73 195.0 27.92\n",
"\n",
"[1034 rows x 6 columns]"
]
},
"execution_count": 215,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv(\"../../data/SOCR_MLB.tsv\",sep='\\t',header=None,names=['Name','Team','Role','Height','Weight','Age'])\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> We are using a package called **Pandas** here for data analysis. We will talk more about Pandas and working with data in Python later in this course.\n",
"\n",
"Let's compute average values for age, height and weight:"
]
},
{
"cell_type": "code",
"execution_count": 216,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Age 28.736712\n",
"Height 73.697292\n",
"Weight 201.689255\n",
"dtype: float64"
]
},
"execution_count": 216,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[['Age','Height','Weight']].mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's focus on height, and compute standard deviation and variance: "
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[180.0, 215.0, 210.0, 210.0, 188.0, 176.0, 209.0, 200.0, 231.0, 180.0, 188.0, 180.0, 185.0, 160.0, 180.0, 185.0, 197.0, 189.0, 185.0, 219.0]\n"
]
}
],
"source": [
"print(list(df['Height'])[:20])"
]
},
{
"cell_type": "code",
"execution_count": 218,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Mean = 73.6972920696325\n",
"Variance = 5.316798081118081\n",
"Standard Deviation = 2.305818310517566\n"
]
}
],
"source": [
"mean = df['Height'].mean()\n",
"var = df['Height'].var()\n",
"std = df['Height'].std()\n",
"print(f\"Mean = {mean}\\nVariance = {var}\\nStandard Deviation = {std}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In addition to mean, it makes sense to look at median value and quartiles. They can be visualized using **box plot**:"
]
},
{
"cell_type": "code",
"execution_count": 217,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"image/svg+xml": "\r\n\r\n\r\n",
"text/plain": [
"
"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(10,2))\n",
"plt.boxplot(df['Height'],vert=False,showmeans=True)\n",
"plt.grid(color='gray',linestyle='dotted')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also make box plots of subsets of our dataset, for example, grouped by player role."
]
},
{
"cell_type": "code",
"execution_count": 210,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"image/svg+xml": "\r\n\r\n\r\n",
"text/plain": [
"
"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df.boxplot(column='Height',by='Role')\n",
"plt.xticks(rotation='vertical')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
 **Note**: This diagram">
"> **Note**: This diagram suggests that, on average, first basemen are taller than second basemen. Later we will learn how to test this hypothesis more formally, and how to show that the difference is statistically significant.\n",
"\n",
"Age, height and weight are all continuous random variables. What do you think their distribution is? A good way to find out is to plot a histogram of the values:"
]
},
{
"cell_type": "code",
"execution_count": 211,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjsAAAHgCAYAAABDx6wqAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8rg+JYAAAACXBIWXMAAA9hAAAPYQGoP6dpAABGB0lEQVR4nO3de3xU1b3///fkNhBIAiHmVkJAFFqBolzEA2gSIdxBRCtKVbBYORVoEdB6KRKsCsWKKBRse7gpRqj+AKl4wAAJF4EKQSpQi6gBFIKUiwkQHIZk/f7wmzkMuYckk6y8no/HPGDWXnvv9VnJbN7s2XvGYYwxAgAAsJSfrwcAAABQnQg7AADAaoQdAABgNcIOAACwGmEHAABYjbADAACsRtgBAABWI+wAAACrEXYAAIDVCDuoNd599105HA4tX768yLKOHTvK4XBo3bp1RZa1bt1anTp1qtC+Ro0apZYtW1ZqnCkpKXI4HDp58mSZfV988UWtWrWqUvspdOjQITkcDi1evLjIGCoiLy9PKSkpysjIqNB6xe2rZcuWGjRoUIW2U5bU1FTNnj272GUOh0MpKSlVur+qtmHDBnXp0kWNGjWSw+Eo8ede+PMsraZf/OIXnj6XS0xMVPv27UsdR+HPq/Dh5+enmJgYDRgwQB999FG5amnZsqXXNho3bqxu3brpjTfeKDKexMTEcm0T8CXCDmqNxMREORwOpaene7WfPn1ae/fuVaNGjYos++abb/TVV18pKSmpQvuaMmWKVq5cedVjLktVhJ3iPPzww9q+fXuF1snLy9O0adMqHHYqs6/KKC3sbN++XQ8//HC1j6GyjDG65557FBgYqNWrV2v79u1KSEgodZ2QkBAtXrxYBQUFXu3nzp3TO++8o9DQ0Ksa09q1a7V9+3Zt3bpVr7zyio4fP67ExETt3r27XOv36NFD27dv1/bt27V48WI5HA6NHDlS8+fPv6pxAb4Q4OsBAIUiIiLUvn37Iv8Yb9q0SQEBARo9enSRsFP4vKJhp3Xr1lc1Vl9r3ry5mjdvXq37yMvLU3BwcI3sqyy33HKLT/dflmPHjun06dO688471atXr3KtM3z4cP3P//yPNmzYoOTkZE/78uXLlZ+fr6FDh2rp0qWVHlPnzp0VEREhSerevbtuvvlmtW7dWu+++265zoQ2adLEa9579+6t+Ph4zZo1S7/61a8qPa6aVPg7DHBmB7VKUlKSDhw4oOzsbE9bRkaGunbtqgEDBigzM1Nnz571Wubv769bb71V0g//w543b55uvPFGNWzYUE2bNtXdd9+tr776yms/xb2N9d1332n06NEKDw9X48aNNXDgQH311Vclvt3w7bff6r777lNYWJiioqL0i1/8Qjk5OZ7lDodD58+f15IlSzxvB5R1yv/YsWO65557FBISorCwMA0fPlzHjx8v0q+4t5Y2btyoxMRENWvWTA0bNlSLFi101113KS8vT4cOHdI111wjSZo2bZpnPKNGjfLa3u7du3X33XeradOmnkBY2ltmK1eu1E9/+lM1aNBA1157rV577TWv5YVnBA4dOuTVnpGRIYfD4Qm2iYmJWrNmjQ4fPuz19snlc3nlz2Dfvn2644471LRpUzVo0EA33nijlixZUux+3n77bT3zzDOKjY1VaGioevfurQMHDhRb05W2bt2qXr16KSQkRMHBwerevbvWrFnjWZ6SkuIJg7/97W/lcDjK9RZp27Zt1b17dy1cuNCrfeHChRo2bJjCwsLKNb7yKtxeYGBgpdZv0qSJ2rZtq8OHD5fab9q0aerWrZvCw8MVGhqqTp06acGCBbr8O6cLX2d5eXlF1r/99tvVrl07z/PyvqYL3+LbvHmzunfvruDgYP3iF7+QVPprA/UDYQe1SuEZmsvP7qSnpyshIUE9evSQw+HQli1bvJZ16tTJcyAfM2aMJkyYoN69e2vVqlWaN2+e9u/fr+7du+vbb78tcb8FBQ
UaPHiwUlNT9dvf/lYrV65Ut27d1K9fvxLXueuuu9SmTRv9f//f/6cnn3xSqampeuyxxzzLt2/froYNG2rAgAGetwPmzZtX4vYuXLig3r1768MPP9T06dP1zjvvKDo6WsOHDy9z3g4dOqSBAwcqKChICxcu1Nq1azVjxgw1atRIFy9eVExMjNauXSvph39oCsczZcoUr+0MGzZM1113nd555x29/vrrpe5zz549mjBhgh577DGtXLlS3bt3129+8xv98Y9/LHO8V5o3b5569Oih6Ohoz9hKe+vswIED6t69u/bv36/XXntNK1as0A033KBRo0Zp5syZRfo//fTTOnz4sP7nf/5Hf/nLX3Tw4EENHjxY+fn5pY5r06ZNuv3225WTk6MFCxbo7bffVkhIiAYPHuy5tuzhhx/WihUrJEnjx4/X9u3by/0W6ejRo7Vq1SqdOXPGU9e2bds0evTocq1fmvz8fF26dEkXL17UF198obFjx8rpdOruu++u1PbcbrcOHz7sCc0lOXTokMaMGaO//e1vWrFihYYNG6bx48fr97//vafPb37zG505c0apqale6/7rX/9Senq6xo4d62mryGs6Oztb999/v0aMGKEPPvhAjz76aJmvDdQTBqhFTp8+bfz8/MwjjzxijDHm5MmTxuFwmLVr1xpjjLn55pvN5MmTjTHGHDlyxEgyTzzxhDHGmO3btxtJ5uWXX/ba5tdff20aNmzo6WeMMSNHjjTx8fGe52vWrDGSzPz5873WnT59upFkpk6d6mmbOnWqkWRmzpzp1ffRRx81DRo0MAUFBZ62Ro0amZEjR5ar9vnz5xtJ5r333vNq/+Uvf2kkmUWLFhUZQ6F3333XSDJ79uwpcfv/+c9/itRy5faeffbZEpddLj4+3jgcjiL7S05ONqGhoeb8+fPGGGMWLVpkJJmsrCyvfunp6UaSSU9P97QNHDjQ62dyuSvHfe+99xqn02mOHDni1a9///4mODjYfPfdd177GTBggFe/v/3tb0aS2b59e7H7K3TLLbeYyMhIc/bsWU/bpUuXTPv27U3z5s09P+usrCwjybz00kulbu/KvmfPnjWNGzc2c+fONcYY8/jjj5tWrVqZgoICM3bs2CLznpCQYNq1a1fq9gt/Xlc+QkNDzYoVK8ocnzE//HwHDBhg3G63cbvdJisry4wcOdJIMo8//rjXeBISEkrcTn5+vnG73ea5554zzZo183ptJCQkmBtvvNGr/69+9SsTGhrqme+KvKYTEhKMJLNhwwavvuV5bcB+nNlBrdK0aVN17NjRc2Zn06ZN8vf3V48ePSRJCQkJnut0rrxe5/3335fD4dD999+vS5cueR7R0dFe2yzOpk2bJEn33HOPV/t9991X4jpDhgzxev7Tn/5U33//vU6cOFH+gi+Tnp6ukJCQItsdMWJEmeveeOONCgoK0iOPPKIlS5YUOcVfXnfddVe5+7Zr104dO3b0ahsxYoRyc3PLfRFsZW3cuFG9evVSXFycV/uoUaOUl5dX5KxQcT8rSaW+JXP+/Hn94x//0N13363GjRt72v39/fXAAw/om2++KfdbYSVp3Lixfvazn2nhwoW6dOmS3njjDT300EMVvtOuOOvXr9fOnTv18ccf6/3331fv3r117733lvus0wcffKDAwEAFBgaqVatW+tvf/qbx48fr+eefL3W9jRs3qnfv3goLC5O/v78CAwP17LPP6tSpU16vjd/85jfas2eP5w6x3Nxcvfnmmxo5cqRnviv6mm7atKluv/12r7aqem2gbiPsoNZJSkrS559/rmPHjik9PV2dO3f2HPwSEhL0ySefKCcnR+np6QoICFDPnj0l/XANjTFGUVFRnoN04WPHjh2l3ip+6tQpBQQEKDw83Ks9KiqqxHWaNWvm9dzpdEr64e2oyjh16lSx+4uOji5z3datW2v9+vWKjIzU2LFj1bp1a7Vu3VqvvvpqhcYQExNT7r7Fjauw7dSpUxXab0
WdOnWq2LHGxsYWu//K/KzOnDkjY0yF9lMZo0eP1u7du/XCCy/oP//5j+c6qqvVsWNHdenSRV27dtXAgQP1zjvv6LrrrvN6i6g0PXv21M6dO7Vr1y7961//0nfffafXXntNQUFBJa7z8ccfq0+fPpKkv/71r/roo4+0c+dOPfPMM5K85/uOO+5Qy5Yt9ac//UnSD9d3nT9/3mt8FX1NF/ezqqrXBuo27sZCrZOUlKRZs2YpIyNDGRkZGjBggGdZYbDZvHmz58LlwiAUERHhuaan8B+zyxXXVqhZs2a6dOmSTp8+7RV4irs4uLo0a9ZMH3/8cZH28o7h1ltv1a233qr8/Hzt2rVLc+bM0YQJExQVFaV77723XNuoyBmF4sZV2FYYLho0aCBJcrlcXv3K8xlFpWnWrJnXReyFjh07Jkmeu5CuRtOmTeXn51ft++nRo4fatm2r5557TsnJyUXOVlUVPz8/tWvXTu+8845OnDihyMjIUvuHhYWpS5cuFdrHsmXLFBgYqPfff9/zs5dU7Mcv+Pn5aezYsXr66af18ssva968eerVq5fatm3r6VPR13RJv79V8dpA3caZHdQ6t912m/z9/fXuu+9q//79XncwhYWFee66OXTokNct54MGDZIxRkePHlWXLl2KPDp06FDiPgs/E+XKDzRctmzZVdXidDrLfaYnKSlJZ8+e1erVq73ar7yIsyz+/v7q1q2b53/MhW8pXe2Zpyvt379f//znP73aUlNTFRIS4rm1ufCupE8//dSr35U1Fo6vvGPr1auXNm7c6Akdhd544w0FBwdXya3qjRo1Urdu3bRixQqvcRUUFGjp0qVq3ry52rRpc9X7kaTf/e53Gjx4sCZNmlQl2ytOfn6+9u7dK6fTedWf4VMSh8OhgIAA+fv7e9ouXLigN998s9j+Dz/8sIKCgvTzn/9cBw4c0Lhx47yWX81rujglvTZgP87soNYpvF111apV8vPz81yvUyghIcHz4XOXh50ePXrokUce0UMPPaRdu3bptttuU6NGjZSdna2tW7eqQ4cOJX4+SL9+/dSjRw9NmjRJubm56ty5s7Zv3+75xFg/v8r9v6BDhw7KyMjQ3//+d8XExCgkJMTrf66Xe/DBB/XKK6/owQcf1AsvvKDrr79eH3zwQbGfGn2l119/XRs3btTAgQPVokULff/9955bmnv37i3phw+xi4+P13vvvadevXopPDxcERERlf4k6djYWA0ZMkQpKSmKiYnR0qVLlZaWpj/84Q+ezzbp2rWr2rZtq8mTJ+vSpUtq2rSpVq5cqa1btxY7VytWrND8+fPVuXNn+fn5lXhmYerUqXr//feVlJSkZ599VuHh4Xrrrbe0Zs0azZw5s8pu254+fbqSk5OVlJSkyZMnKygoSPPmzdO+ffv09ttvV8m1NZJ0//336/777y9X39zcXL377rtF2q+55hqvDzLMzMz0zMO3336rhQsX6t///rcee+wxr7MuVWngwIGaNWuWRowYoUceeUSnTp3SH//4xxLPqjZp0kQPPvig5s+fr/j4eA0ePNhr+dW8pguV57WBesCnl0cDJXjiiSeMJNOlS5ciy1atWmUkmaCgIM9dP5dbuHCh6datm2nUqJFp2LChad26tXnwwQfNrl27PH2uvBvLmB/uBHvooYdMkyZNTHBwsElOTjY7duwwksyrr77q6Vd4t8t//vMfr/WLu/Noz549pkePHiY4ONhIKvXOFWOM+eabb8xdd91lGjdubEJCQsxdd91ltm3bVubdWNu3bzd33nmniY+PN06n0zRr1swkJCSY1atXe21//fr15qabbjJOp9NI8twpVlJNxe3LmB/u1hk4cKB59913Tbt27UxQUJBp2bKlmTVrVpH1P//8c9OnTx8TGhpqrrnmGjN+/HjP3W+X3411+vRpc/fdd5smTZoYh8PhtU8VcxfZ3r17zeDBg01YWJgJCgoyHTt29JojY/7vbqx33nnHq73wjqgr+x
dny5Yt5vbbb/f8Pt1yyy3m73//e7Hbq+jdWKUp6W4sFXOn1eW/W8XdjRUeHm66detmFi5caPLz88scY+HPtyzF3Y21cOFC07ZtW+N0Os21115rpk+fbhYsWFDsXXnGGJORkWEkmRkzZpS4n/K8pku6U628rw3YzWHMZZ/0BMBLamqqfv7zn+ujjz5S9+7dfT0cwDqTJk3S/Pnz9fXXXxe5kByoKryNBfw/b7/9to4ePaoOHTrIz89PO3bs0EsvvaTbbruNoANUsR07dujzzz/XvHnzNGbMGIIOqhVndoD/5/3331dKSoq++OILnT9/XjExMRo6dKief/75arugE6ivHA6HgoODNWDAAC1atMjrs4yAqkbYAQAAVuPWcwAAYDXCDgAAsBphBwAAWI2wAwAArEbYAQAAViPsAAAAqxF2AACA1Qg7AADAaoQdAABgNcIOAACwGmEHAABYjbADAACsRtgBAABWI+wAAACrEXYAAIDVCDsAAMBqhB0AAGA1wg4AALAaYQcAAFiNsAMAAKxG2AEAAFYj7AAAAKsRdgAAgNUIOwAAwGqEHQAAYDXCDgAAsBphBwAAWI2wAwAArEbYAQAAViPsAAAAqwX4egC1QUFBgY4dO6aQkBA5HA5fDwcAAJSDMUZnz55VbGys/PxKPn9D2JF07NgxxcXF+XoYAACgEr7++ms1b968xOWEHUkhISGSfpis0NBQH4+marndbn344Yfq06ePAgMDfT2cGkf99bt+iTmo7/VLzIHN9efm5iouLs7z73hJCDuS562r0NBQK8NOcHCwQkNDrfslLw/qr9/1S8xBfa9fYg7qQ/1lXYLCBcoAAMBqhB0AAGA1n4ad6dOnq2vXrgoJCVFkZKSGDh2qAwcOePUxxiglJUWxsbFq2LChEhMTtX//fq8+LpdL48ePV0REhBo1aqQhQ4bom2++qclSAABALeXTsLNp0yaNHTtWO3bsUFpami5duqQ+ffro/Pnznj4zZ87UrFmzNHfuXO3cuVPR0dFKTk7W2bNnPX0mTJiglStXatmyZdq6davOnTunQYMGKT8/3xdlAQCAWsSnFyivXbvW6/miRYsUGRmpzMxM3XbbbTLGaPbs2XrmmWc0bNgwSdKSJUsUFRWl1NRUjRkzRjk5OVqwYIHefPNN9e7dW5K0dOlSxcXFaf369erbt2+N1wUAAGqPWnU3Vk5OjiQpPDxckpSVlaXjx4+rT58+nj5Op1MJCQnatm2bxowZo8zMTLndbq8+sbGxat++vbZt21Zs2HG5XHK5XJ7nubm5kn64Yt3tdldLbb5SWI9tdZUX9dfv+iXmoL7XLzEHNtdf3ppqTdgxxmjixInq2bOn2rdvL0k6fvy4JCkqKsqrb1RUlA4fPuzpExQUpKZNmxbpU7j+laZPn65p06YVaf/www8VHBx81bXURmlpab4egk9Rf/2uX2IO6nv9EnNgY/15eXnl6ldrws64ceP06aefauvWrUWWXXn/vDGmzHvqS+vz1FNPaeLEiZ7nhR9K1KdPHys/ZyctLU3JycnWfr5Caai/ftcvMQf1vX6JObC5/sJ3ZspSK8LO+PHjtXr1am3evNnr456jo6Ml/XD2JiYmxtN+4sQJz9me6OhoXbx4UWfOnPE6u3PixAl179692P05nU45nc4i7YGBgdb9IhSyubbyoP76Xb/EHNT3+iXmwMb6y1uPT+/GMsZo3LhxWrFihTZu3KhWrVp5LW/VqpWio6O9Tr1dvHhRmzZt8gSZzp07KzAw0KtPdna29u3bV2LYAQAA9YdPz+yMHTtWqampeu+99xQSEuK5xiYsLEwNGzaUw+HQhAkT9OKLL+r666/X9ddfrxdffFHBwcEaMWKEp+/o0aM1adIkNWvWTOHh4Zo8ebI6dOjguTsLAADUXz4NO/Pnz5ckJSYmerUvWrRIo0aNkiQ98cQTunDhgh599FGdOXNG3bp104cffuj1pV+vvPKKAgICdM899+jChQvq1a
uXFi9eLH9//5oqBQAA1FI+DTvGmDL7OBwOpaSkKCUlpcQ+DRo00Jw5czRnzpwqHB0AALAB340FAACsRtgBAABWqxW3ngOouJZPrimzj9PfaObNUvuUdXLll/7ZVOV1aMbAKtkOANQUzuwAAACrEXYAAIDVCDsAAMBqhB0AAGA1wg4AALAaYQcAAFiNsAMAAKxG2AEAAFYj7AAAAKsRdgAAgNUIOwAAwGqEHQAAYDXCDgAAsBphBwAAWI2wAwAArEbYAQAAViPsAAAAqxF2AACA1Qg7AADAaoQdAABgNcIOAACwGmEHAABYjbADAACsRtgBAABWI+wAAACrEXYAAIDVCDsAAMBqhB0AAGA1wg4AALAaYQcAAFiNsAMAAKwW4Mudb968WS+99JIyMzOVnZ2tlStXaujQoZ7lDoej2PVmzpypxx9/XJKUmJioTZs2eS0fPny4li1bVm3jBi7X8sk1vh4CAKAUPj2zc/78eXXs2FFz584tdnl2drbXY+HChXI4HLrrrru8+v3yl7/06vfnP/+5JoYPAADqAJ+e2enfv7/69+9f4vLo6Giv5++9956SkpJ07bXXerUHBwcX6QsAACD5OOxUxLfffqs1a9ZoyZIlRZa99dZbWrp0qaKiotS/f39NnTpVISEhJW7L5XLJ5XJ5nufm5kqS3G633G531Q/ehwrrsa2u8qqJ+p3+ptq2fbWcfsbrz6pQ136XeA3U7/ol5sDm+stbk8MYUyuO1A6Ho8g1O5ebOXOmZsyYoWPHjqlBgwae9r/+9a9q1aqVoqOjtW/fPj311FO67rrrlJaWVuK+UlJSNG3atCLtqampCg4OvupaAABA9cvLy9OIESOUk5Oj0NDQEvvVmbDz4x//WMnJyZozZ06p28nMzFSXLl2UmZmpTp06FdunuDM7cXFxOnnyZKmTVRe53W6lpaUpOTlZgYGBvh5OjauJ+tunrKuW7VYFp5/R77sUaMouP7kKir/gv6L2pfStku3UFF4D9bt+iTmwuf7c3FxFRESUGXbqxNtYW7Zs0YEDB7R8+fIy+3bq1EmBgYE6ePBgiWHH6XTK6XQWaQ8MDLTuF6GQzbWVR3XW78qvmhBRnVwFjiobZ139PeI1UL/rl5gDG+svbz114nN2FixYoM6dO6tjx45l9t2/f7/cbrdiYmJqYGQAAKC28+mZnXPnzumLL77wPM/KytKePXsUHh6uFi1aSPrhFNU777yjl19+ucj6X375pd566y0NGDBAERER+te//qVJkybppptuUo8ePWqsDgAAUHv5NOzs2rVLSUlJnucTJ06UJI0cOVKLFy+WJC1btkzGGN13331F1g8KCtKGDRv06quv6ty5c4qLi9PAgQM1depU+fv710gNAACgdvNp2ElMTFRZ10c/8sgjeuSRR4pdFhcXV+TTkwEAAC5XJ67ZAQAAqCzCDgAAsBphBwAAWI2wAwAArEbYAQAAViPsAAAAqxF2AACA1Qg7AADAaoQdAABgNcIOAACwGmEHAABYjbADAACsRtgBAABWI+wAAACrEXYAAIDVCDsAAMBqhB0AAGA1wg4AALAaYQcAAFiNsAMAAKxG2AEAAFYj7AAAAKsRdgAAgNUIOwAAwGqEHQAAYDXCDgAAsBphBwAAWI2wAwAArEbYAQAAViPsAAAAqxF2AACA1Qg7AADAaoQdAABgNcIOAACwWoAvd75582a99NJLyszMVHZ2tlauXKmhQ4d6lo8aNUpLlizxWqdbt27asWOH57nL5dLkyZP19ttv68KFC+rVq5fmzZun5s2b11QZQL3S8sk1PtnvoRkDfbJfAHWfT8/snD9/Xh07dtTcuXNL7NOvXz9lZ2d7Hh988IHX8gkTJmjlypVatmyZtm7dqnPnzmnQoEHKz8+v7uEDAIA6wKdndvr376/+/fuX2sfpdCo6OrrYZTk5OVqwYIHefPNN9e7dW5K0dOlSxcXFaf369erbt2+VjxkAANQtPg075ZGRkaHIyE
g1adJECQkJeuGFFxQZGSlJyszMlNvtVp8+fTz9Y2Nj1b59e23btq3EsONyueRyuTzPc3NzJUlut1tut7saq6l5hfXYVld51UT9Tn9Tbdu+Wk4/4/VnXVbZnyGvgfpdv8Qc2Fx/eWtyGGNqxVHQ4XAUuWZn+fLlaty4seLj45WVlaUpU6bo0qVLyszMlNPpVGpqqh566CGv4CJJffr0UatWrfTnP/+52H2lpKRo2rRpRdpTU1MVHBxcpXUBAIDqkZeXpxEjRignJ0ehoaEl9qvVZ3aGDx/u+Xv79u3VpUsXxcfHa82aNRo2bFiJ6xlj5HA4Slz+1FNPaeLEiZ7nubm5iouLU58+fUqdrLrI7XYrLS1NycnJCgwM9PVwalxN1N8+ZV21bLcqOP2Mft+lQFN2+clVUPJroi7Yl1K5t6V5DdTv+iXmwOb6C9+ZKUutDjtXiomJUXx8vA4ePChJio6O1sWLF3XmzBk1bdrU0+/EiRPq3r17idtxOp1yOp1F2gMDA637RShkc22FirtLyOlvNPNm6aYXNsqVX13/2Nf+EOEqcFRj/TXjan9/68NroDT1vX6JObCx/vLWU6c+Z+fUqVP6+uuvFRMTI0nq3LmzAgMDlZaW5umTnZ2tffv2lRp2AABA/eHTMzvnzp3TF1984XmelZWlPXv2KDw8XOHh4UpJSdFdd92lmJgYHTp0SE8//bQiIiJ05513SpLCwsI0evRoTZo0Sc2aNVN4eLgmT56sDh06eO7OAgAA9ZtPw86uXbuUlJTkeV54Hc3IkSM1f/587d27V2+88Ya+++47xcTEKCkpScuXL1dISIhnnVdeeUUBAQG65557PB8quHjxYvn7+9d4PQAAoPbxadhJTExUaTeDrVtX9oWfDRo00Jw5czRnzpyqHBoAALBEnbpmBwAAoKIIOwAAwGqEHQAAYDXCDgAAsBphBwAAWI2wAwAArEbYAQAAViPsAAAAqxF2AACA1Qg7AADAaoQdAABgNcIOAACwGmEHAABYjbADAACsRtgBAABWI+wAAACrEXYAAIDVCDsAAMBqhB0AAGA1wg4AALAaYQcAAFiNsAMAAKxG2AEAAFYj7AAAAKsRdgAAgNUIOwAAwGqEHQAAYDXCDgAAsBphBwAAWI2wAwAArEbYAQAAViPsAAAAqxF2AACA1Qg7AADAaj4NO5s3b9bgwYMVGxsrh8OhVatWeZa53W799re/VYcOHdSoUSPFxsbqwQcf1LFjx7y2kZiYKIfD4fW49957a7gSAABQW/k07Jw/f14dO3bU3LlziyzLy8vT7t27NWXKFO3evVsrVqzQ559/riFDhhTp+8tf/lLZ2dmex5///OeaGD4AAKgDAny58/79+6t///7FLgsLC1NaWppX25w5c3TzzTfryJEjatGihac9ODhY0dHR1TpWAABQN/k07FRUTk6OHA6HmjRp4tX+1ltvaenSpYqKilL//v01depUhYSElLgdl8sll8vleZ6bmyvph7fO3G53tYzdVwrrsa2u4jj9TdE2P+P1Z31jU/2V/R2uT6+B4tT3+iXmwOb6y1uTwxhTK46CDodDK1eu1NChQ4td/v3336tnz5768Y9/rKVLl3ra//rXv6pVq1aKjo7Wvn379NRTT+m6664rclbocikpKZo2bVqR9tTUVAUHB191LQAAoPrl5eVpxIgRysnJUWhoaIn96kTYcbvd+tnPfqYjR44oIyOj1IIyMzPVpUsXZWZmqlOnTsX2Ke7MTlxcnE6ePFnqtusit9uttLQ0JScnKzAw0NfDqVbtU9YVaXP6Gf2+S4Gm7PKTq8Dhg1H5lk3170vpW6n16tNroDj1vX6JObC5/tzcXEVERJQZdmr921hut1v33HOPsrKytHHjxjLDSKdOnRQYGKiDBw+WGHacTqecTmeR9sDAQOt+EQrZXFshV37J/5i7ChylLredDfVf7e9vfXgNlKa+1y8xBzbWX956anXYKQw6Bw8eVHp6upo1a1
bmOvv375fb7VZMTEwNjBAAANR2Pg07586d0xdffOF5npWVpT179ig8PFyxsbG6++67tXv3br3//vvKz8/X8ePHJUnh4eEKCgrSl19+qbfeeksDBgxQRESE/vWvf2nSpEm66aab1KNHD1+VBQAAahGfhp1du3YpKSnJ83zixImSpJEjRyolJUWrV6+WJN14441e66WnpysxMVFBQUHasGGDXn31VZ07d05xcXEaOHCgpk6dKn9//xqrAwAA1F4+DTuJiYkq7frosq6djouL06ZNm6p6WAAAwCJ8NxYAALAaYQcAAFiNsAMAAKxG2AEAAFYj7AAAAKsRdgAAgNUIOwAAwGqEHQAAYDXCDgAAsBphBwAAWI2wAwAArEbYAQAAViPsAAAAqxF2AACA1Qg7AADAaoQdAABgtUqFnWuvvVanTp0q0v7dd9/p2muvvepBAQAAVJVKhZ1Dhw4pPz+/SLvL5dLRo0evelAAAABVJaAinVevXu35+7p16xQWFuZ5np+frw0bNqhly5ZVNjgAAICrVaGwM3ToUEmSw+HQyJEjvZYFBgaqZcuWevnll6tscAAAAFerQmGnoKBAktSqVSvt3LlTERER1TIoAACAqlKhsFMoKyurqscBAABQLSoVdiRpw4YN2rBhg06cOOE541No4cKFVz0wAACAqlCpsDNt2jQ999xz6tKli2JiYuRwOKp6XAAAAFWiUmHn9ddf1+LFi/XAAw9U9XgAAACqVKU+Z+fixYvq3r17VY8FAACgylUq7Dz88MNKTU2t6rEAAABUuUq9jfX999/rL3/5i9avX6+f/vSnCgwM9Fo+a9asKhkcAADA1apU2Pn000914403SpL27dvntYyLlQEAQG1SqbCTnp5e1eMAAACoFpW6ZgcAAKCuqNSZnaSkpFLfrtq4cWOlBwQAAFCVKhV2Cq/XKeR2u7Vnzx7t27evyBeEAgAA+FKlws4rr7xSbHtKSorOnTt3VQMCAACoSlV6zc7999/P92IBAIBapUrDzvbt29WgQYNy99+8ebMGDx6s2NhYORwOrVq1ymu5MUYpKSmKjY1Vw4YNlZiYqP3793v1cblcGj9+vCIiItSoUSMNGTJE33zzTVWUAwAALFCpt7GGDRvm9dwYo+zsbO3atUtTpkwp93bOnz+vjh076qGHHtJdd91VZPnMmTM1a9YsLV68WG3atNHzzz+v5ORkHThwQCEhIZKkCRMm6O9//7uWLVumZs2aadKkSRo0aJAyMzPl7+9fmfIA1EItn1xTqfWc/kYzb5bap6yTK7/inwN2aMbASu0XQO1RqbATFhbm9dzPz09t27bVc889pz59+pR7O/3791f//v2LXWaM0ezZs/XMM894wtWSJUsUFRWl1NRUjRkzRjk5OVqwYIHefPNN9e7dW5K0dOlSxcXFaf369erbt2+x23a5XHK5XJ7nubm5kn640Nrtdpd7/HVBYT221VUcp78p2uZnvP6sb+p7/dLVz0Fdf+3Up2NASer7HNhcf3lrchhjasVR0OFwaOXKlRo6dKgk6auvvlLr1q21e/du3XTTTZ5+d9xxh5o0aaIlS5Zo48aN6tWrl06fPq2mTZt6+nTs2FFDhw7VtGnTit1XSkpKsctSU1MVHBxctYUBAIBqkZeXpxEjRignJ0ehoaEl9qvUmZ1CmZmZ+uyzz+RwOHTDDTd4hZKrdfz4cUlSVFSUV3tUVJQOHz7s6RMUFOQVdAr7FK5fnKeeekoTJ070PM/NzVVcXJz69OlT6mTVRW63W2lpaUpOTi7yHWa2aZ+yrkib08/o910KNGWXn1wF9e+rTOp7/dLVz8G+lOLPENcV9ekYUJL6Pgc211/4zkxZKhV2Tpw4oXvvvVcZGRlq0qSJjDHKyclRUlKSli1bpmuuuaYymy3WlR9eaIwp8/u3yurjdDrldDqLtAcGBlr3i1DI5toKlXY9hqvAUanrNWxR3+uXKj8Htrxu6sMxoCz1fQ5srL+89VTqbq
zx48crNzdX+/fv1+nTp3XmzBnt27dPubm5+vWvf12ZTRYRHR0tSUXO0Jw4ccJztic6OloXL17UmTNnSuwDAADqt0qFnbVr12r+/Pn6yU9+4mm74YYb9Kc//Un/+7//WyUDa9WqlaKjo5WWluZpu3jxojZt2qTu3btLkjp37qzAwECvPtnZ2dq3b5+nDwAAqN8q9TZWQUFBsaeOAgMDVVBQUO7tnDt3Tl988YXneVZWlvbs2aPw8HC1aNFCEyZM0Isvvqjrr79e119/vV588UUFBwdrxIgRkn64K2z06NGaNGmSmjVrpvDwcE2ePFkdOnTw3J0FAADqt0qFndtvv12/+c1v9Pbbbys2NlaSdPToUT322GPq1atXubeza9cuJSUleZ4XXjQ8cuRILV68WE888YQuXLigRx99VGfOnFG3bt304Ycfej5jR/rhqysCAgJ0zz336MKFC+rVq5cWL17MZ+wAAABJlQw7c+fO1R133KGWLVsqLi5ODodDR44cUYcOHbR06dJybycxMVGl3fnucDiUkpKilJSUEvs0aNBAc+bM0Zw5cypSAgAAqCcqFXbi4uK0e/dupaWl6d///reMMbrhhht46wgAANQ6FbpAeePGjbrhhhs897UnJydr/Pjx+vWvf62uXbuqXbt22rJlS7UMFAAAoDIqFHZmz56tX/7yl8V+8F5YWJjGjBmjWbNmVdngAAAArlaFws4///lP9evXr8Tlffr0UWZm5lUPCgAAoKpUKOx8++23pX5aYUBAgP7zn/9c9aAAAACqSoXCzo9+9CPt3bu3xOWffvqpYmJirnpQAAAAVaVCYWfAgAF69tln9f333xdZduHCBU2dOlWDBg2qssEBAABcrQrdev673/1OK1asUJs2bTRu3Di1bdtWDodDn332mf70pz8pPz9fzzzzTHWNFQAAoMIqFHaioqK0bds2/epXv9JTTz3l+UBAh8Ohvn37at68eXwBJwAAqFUq/KGC8fHx+uCDD3TmzBl98cUXMsbo+uuvV9OmTatjfAAAAFelUp+gLElNmzZV165dq3IsAAAAVa5CFygDAADUNYQdAABgNcIOAACwGmEHAABYjbADAACsRtgBAABWI+wAAACrEXYAAIDVCDsAAMBqhB0AAGA1wg4AALAaYQcAAFiNsAMAAKxG2AEAAFYj7AAAAKsRdgAAgNUIOwAAwGqEHQAAYDXCDgAAsBphBwAAWI2wAwAArEbYAQAAViPsAAAAq9X6sNOyZUs5HI4ij7Fjx0qSRo0aVWTZLbfc4uNRAwCA2iLA1wMoy86dO5Wfn+95vm/fPiUnJ+tnP/uZp61fv35atGiR53lQUFCNjhEAANRetT7sXHPNNV7PZ8yYodatWyshIcHT5nQ6FR0dXe5tulwuuVwuz/Pc3FxJktvtltvtvsoR1y6F9dhWV3Gc/qZom5/x+rO+qe/1S1c/B3X9tVOfjgElqe9zYHP95a3JYYypM0fBixcvKjY2VhMnTtTTTz8t6Ye3sVatWqWgoCA1adJECQkJeuGFFxQZGVnidlJSUjRt2rQi7ampqQoODq628QMAgKqTl5enESNGKCcnR6GhoSX2q1Nh529/+5tGjBihI0eOKDY2VpK0fPlyNW7cWPHx8crKytKUKVN06dIlZWZmyul0Frud4s7sxMXF6eTJk6VOVl3kdruVlpam5ORkBQYG+no41ap9yroibU4/o993KdCUXX5yFTh8MCrfqu/1S1c/B/tS+lbDqGpOfToGlKS+z4HN9efm5ioiIqLMsFPr38a63IIFC9S/f39P0JGk4cOHe/7evn17denSRfHx8VqzZo2GDRtW7HacTmexQSgwMNC6X4RCNtdWyJVf8j9krgJHqcttV9/rlyo/B9dP+bAaRlO2QzMGVun26sMxoCz1fQ5srL+89dSZsHP48GGtX79eK1asKLVfTEyM4uPjdfDgwRoaGQAAqM3qTNhZtGiRIiMjNXBg6f/bOXXqlL7++mvFxM
TU0MhwpZZPrvH1EAAA8Kj1n7MjSQUFBVq0aJFGjhypgID/y2fnzp3T5MmTtX37dh06dEgZGRkaPHiwIiIidOedd/pwxAAAoLaoE2d21q9fryNHjugXv/iFV7u/v7/27t2rN954Q999951iYmKUlJSk5cuXKyQkxEejBQAAtUmdCDt9+vRRcTeNNWzYUOvWFb0DBwAAoFCdeBsLAACgsgg7AADAaoQdAABgNcIOAACwGmEHAABYjbADAACsRtgBAABWI+wAAACrEXYAAIDVCDsAAMBqhB0AAGA1wg4AALAaYQcAAFiNsAMAAKxG2AEAAFYj7AAAAKsRdgAAgNUIOwAAwGqEHQAAYDXCDgAAsBphBwAAWI2wAwAArEbYAQAAViPsAAAAqxF2AACA1Qg7AADAaoQdAABgNcIOAACwGmEHAABYjbADAACsRtgBAABWI+wAAACrEXYAAIDVCDsAAMBqtTrspKSkyOFweD2io6M9y40xSklJUWxsrBo2bKjExETt37/fhyMGAAC1Ta0OO5LUrl07ZWdnex579+71LJs5c6ZmzZqluXPnaufOnYqOjlZycrLOnj3rwxEDAIDaJMDXAyhLQECA19mcQsYYzZ49W88884yGDRsmSVqyZImioqKUmpqqMWPGlLhNl8sll8vleZ6bmytJcrvdcrvdVVyBbxXWU5N1Of1Nje2rLE4/4/VnfVPf65fq7hxU1WvWF8eA2qa+z4HN9Ze3JocxptYeAVJSUvTSSy8pLCxMTqdT3bp104svvqhrr71WX331lVq3bq3du3frpptu8qxzxx13qEmTJlqyZEmp2502bVqR9tTUVAUHB1dLLQAAoGrl5eVpxIgRysnJUWhoaIn9anXY+d///V/l5eWpTZs2+vbbb/X888/r3//+t/bv368DBw6oR48eOnr0qGJjYz3rPPLIIzp8+LDWrVtX4naLO7MTFxenkydPljpZdZHb7VZaWpqSk5MVGBhYI/tsn1Ly3Nc0p5/R77sUaMouP7kKHL4eTo2r7/VLdXcO9qX0rZLt+OIYUNvU9zmwuf7c3FxFRESUGXZq9dtY/fv39/y9Q4cO+q//+i+1bt1aS5Ys0S233CJJcji8D17GmCJtV3I6nXI6nUXaAwMDrftFKFSTtbnya98/KK4CR60cV02p7/VLdW8Oqvr1avPxrbzq+xzYWH9566n1FyhfrlGjRurQoYMOHjzouY7n+PHjXn1OnDihqKgoXwwPAADUQnUq7LhcLn322WeKiYlRq1atFB0drbS0NM/yixcvatOmTerevbsPRwkAAGqTWv021uTJkzV48GC1aNFCJ06c0PPPP6/c3FyNHDlSDodDEyZM0Isvvqjrr79e119/vV588UUFBwdrxIgRvh46AACoJWp12Pnmm29033336eTJk7rmmmt0yy23aMeOHYqPj5ckPfHEE7pw4YIeffRRnTlzRt26ddOHH36okJAQH48cAADUFrU67CxbtqzU5Q6HQykpKUpJSamZAQEAgDqnTl2zAwAAUFGEHQAAYDXCDgAAsBphBwAAWI2wAwAArEbYAQAAViPsAAAAqxF2AACA1Qg7AADAaoQdAABgNcIOAACwWq3+biwAqK9aPrmmSrbj9DeaebPUPmWdXPmOMvsfmjGwSvYL1Cac2QEAAFYj7AAAAKsRdgAAgNUIOwAAwGqEHQAAYDXCDgAAsBphBwAAWI2wAwAArEbYAQAAViPsAAAAqxF2AACA1Qg7AADAaoQdAABgNcIOAACwGmEHAABYjbADAACsRtgBAABWC/D1AFB9Wj65Rk5/o5k3S+1T1smV7/D1kAAAqHGc2QEAAFYj7AAAAKsRdgAAgNUIOwAAwGq1OuxMnz5dXbt2VUhIiCIjIzV06FAdOHDAq8+oUaPkcDi8HrfccouPRgwAAGqbWh12Nm3apLFjx2rHjh1KS0vTpUuX1KdPH50/f96rX79+/ZSdne15fPDBBz4aMQAAqG1q9a3na9eu9X
q+aNEiRUZGKjMzU7fddpun3el0Kjo6uqaHBwAA6oBaHXaulJOTI0kKDw/3as/IyFBkZKSaNGmihIQEvfDCC4qMjCxxOy6XSy6Xy/M8NzdXkuR2u+V2u6th5L7h9Ddy+pkf/v7//qxvqL9+1y8xBxWt36ZjYKHCmmysrTxsrr+8NTmMMXXiCGCM0R133KEzZ85oy5Ytnvbly5ercePGio+PV1ZWlqZMmaJLly4pMzNTTqez2G2lpKRo2rRpRdpTU1MVHBxcbTUAAICqk5eXpxEjRignJ0ehoaEl9qszYWfs2LFas2aNtm7dqubNm5fYLzs7W/Hx8Vq2bJmGDRtWbJ/izuzExcXp5MmTpU5WXdM+ZZ2cfka/71KgKbv85Cqof5+gTP31u36JOaho/ftS+tbAqGqW2+1WWlqakpOTFRgY6Ovh1Dib68/NzVVERESZYadOvI01fvx4rV69Wps3by416EhSTEyM4uPjdfDgwRL7OJ3OYs/6BAYGWvWLcPnXQ7gKHPX66yKov37XLzEH5a3fpmPglWw7xleUjfWXt55aHXaMMRo/frxWrlypjIwMtWrVqsx1Tp06pa+//loxMTE1MEIAsEvLJ9f4ZL+HZgz0yX5RP9TqW8/Hjh2rpUuXKjU1VSEhITp+/LiOHz+uCxcuSJLOnTunyZMna/v27Tp06JAyMjI0ePBgRURE6M477/Tx6AEAQG1Qq8/szJ8/X5KUmJjo1b5o0SKNGjVK/v7+2rt3r9544w199913iomJUVJSkpYvX66QkBAfjBgAANQ2tTrslHXtdMOGDbVu3boaGg0AAKiLavXbWAAAAFeLsAMAAKxG2AEAAFYj7AAAAKsRdgAAgNUIOwAAwGqEHQAAYDXCDgAAsBphBwAAWI2wAwAArEbYAQAAViPsAAAAqxF2AACA1Qg7AADAaoQdAABgNcIOAACwGmEHAABYjbADAACsRtgBAABWI+wAAACrEXYAAIDVCDsAAMBqhB0AAGC1AF8PwHYtn1zj6yEAAFCvcWYHAABYjTM7AACfq86z4E5/o5k3S+1T1smV7/BadmjGwGrbL2oPzuwAAACrEXYAAIDVCDsAAMBqhB0AAGA1wg4AALAaYQcAAFiNsAMAAKxG2AEAAFaz5kMF582bp5deeknZ2dlq166dZs+erVtvvdXXwwIAoIia/Cqhyz9U8cALg2psv7WJFWd2li9frgkTJuiZZ57RJ598oltvvVX9+/fXkSNHfD00AADgY1ac2Zk1a5ZGjx6thx9+WJI0e/ZsrVu3TvPnz9f06dN9PDoAQG1V376s2Vf1+vprOep82Ll48aIyMzP15JNPerX36dNH27ZtK3Ydl8sll8vleZ6TkyNJOn36tNxud5WOL+DS+SrdXoX3X2CUl1egALef8gscZa9gGeqv3/VLzEF9r19iDmpD/adOnaqW7Z49e1aSZIwpvaOp444ePWokmY8++sir/YUXXjBt2rQpdp2pU6caSTx48ODBgwcPCx5ff/11qVmhzp/ZKeRweKdVY0yRtkJPPfWUJk6c6HleUFCg06dPq1mzZiWuU1fl5uYqLi5OX3/9tUJDQ309nBpH/fW7fok5qO/1S8yBzfUbY3T27FnFxsaW2q/Oh52IiAj5+/vr+PHjXu0nTpxQVFRUses4nU45nU6vtiZNmlTXEGuF0NBQ637JK4L663f9EnNQ3+uXmANb6w8LCyuzT52/GysoKEidO3dWWlqaV3taWpq6d+/uo1EBAIDaos6f2ZGkiRMn6oEHHlCXLl30X//1X/rLX/6iI0eO6L//+799PTQAAOBjVoSd4cOH69SpU3ruueeUnZ2t9u3b64MPPlB8fLyvh+ZzTqdTU6dOLfK2XX1B/fW7fok5qO/1S8xBfa9fkhzGlHW/FgAAQN1V56/ZAQAAKA1hBwAAWI2wAwAArEbYAQAAViPs1EGbN2/W4MGDFRsbK4fDoVWrVhXp89lnn2nIkCEKCwtTSEiIbrnlFq9vgX
e5XBo/frwiIiLUqFEjDRkyRN98800NVlF5ZdV/7tw5jRs3Ts2bN1fDhg31k5/8RPPnz/fqU5frnz59urp27aqQkBBFRkZq6NChOnDggFcfY4xSUlIUGxurhg0bKjExUfv37/fqY/McuN1u/fa3v1WHDh3UqFEjxcbG6sEHH9SxY8e8tlNX56A8vwOXGzNmjBwOh2bPnu3Vbnv9Nh8HyzMHth8LK4KwUwedP39eHTt21Ny5c4td/uWXX6pnz5768Y9/rIyMDP3zn//UlClT1KBBA0+fCRMmaOXKlVq2bJm2bt2qc+fOadCgQcrPz6+pMiqtrPofe+wxrV27VkuXLtVnn32mxx57TOPHj9d7773n6VOX69+0aZPGjh2rHTt2KC0tTZcuXVKfPn10/vz/fenszJkzNWvWLM2dO1c7d+5UdHS0kpOTPV+aJ9k9B3l5edq9e7emTJmi3bt3a8WKFfr88881ZMgQr+3U1Tkoz+9AoVWrVukf//hHsR+nb3P9th8HyzMHth8LK6QqvowTviPJrFy50qtt+PDh5v777y9xne+++84EBgaaZcuWedqOHj1q/Pz8zNq1a6trqNWiuPrbtWtnnnvuOa+2Tp06md/97nfGGLvqN8aYEydOGElm06ZNxhhjCgoKTHR0tJkxY4anz/fff2/CwsLM66+/boyxfw6K8/HHHxtJ5vDhw8YYu+agpPq/+eYb86Mf/cjs27fPxMfHm1deecWzzPb669Nx0Jji56C+HQtLw5kdyxQUFGjNmjVq06aN+vbtq8jISHXr1s3rrZ7MzEy53W716dPH0xYbG6v27dtr27ZtPhh11erZs6dWr16to0ePyhij9PR0ff755+rbt68k++rPycmRJIWHh0uSsrKydPz4ca/6nE6nEhISPPXZPgcl9XE4HJ7vwbNpDoqrv6CgQA888IAef/xxtWvXrsg6NtdfH4+Dxf0O1LdjYWkIO5Y5ceKEzp07pxkzZqhfv3768MMPdeedd2rYsGHatGmTJOn48eMKCgpS06ZNvdaNiooq8oWqddFrr72mG264Qc2bN1dQUJD69eunefPmqWfPnpLsqt8Yo4kTJ6pnz55q3769JHlquPKLcC+vz/Y5uNL333+vJ598UiNGjPB8EaItc1BS/X/4wx8UEBCgX//618WuZ3P99e04WNLvQH06FpbFiq+LwP8pKCiQJN1xxx167LHHJEk33nijtm3bptdff10JCQklrmuMkcPhqJFxVqfXXntNO3bs0OrVqxUfH6/Nmzfr0UcfVUxMjHr37l3ienWx/nHjxunTTz/V1q1biyy7spby1GfbHEg/XKx87733qqCgQPPmzStze3VtDoqrPzMzU6+++qp2795d4VpsqL++HQdLeg3Up2NhWTizY5mIiAgFBATohhtu8Gr/yU9+4rkLITo6WhcvXtSZM2e8+pw4caLI2YC65sKFC3r66ac1a9YsDR48WD/96U81btw4DR8+XH/84x8l2VP/+PHjtXr1aqWnp6t58+ae9ujoaEkq8j+zy+uzfQ4Kud1u3XPPPcrKylJaWprnrI5kxxyUVP+WLVt04sQJtWjRQgEBAQoICNDhw4c1adIktWzZUpLd9den42BJc1CfjoXlQdixTFBQkLp27VrkFsTPP//c88WonTt3VmBgoNLS0jzLs7OztW/fPnXv3r1Gx1vV3G633G63/Py8f7X9/f09/9ur6/UbYzRu3DitWLFCGzduVKtWrbyWt2rVStHR0V71Xbx4UZs2bfLUZ/scSP8XdA4ePKj169erWbNmXsvr8hyUVf8DDzygTz/9VHv27PE8YmNj9fjjj2vdunWS7K6/PhwHy5qD+nAsrJCavR4aVeHs2bPmk08+MZ988omRZGbNmmU++eQTz10mK1asMIGBgeYvf/mLOXjwoJkzZ47x9/c3W7Zs8Wzjv//7v03z5s3N+vXrze7du83tt99uOnbsaC5duuSrssqtrPoTEhJMu3btTHp6uvnqq6
/MokWLTIMGDcy8efM826jL9f/qV78yYWFhJiMjw2RnZ3seeXl5nj4zZswwYWFhZsWKFWbv3r3mvvvuMzExMSY3N9fTx+Y5cLvdZsiQIaZ58+Zmz549Xn1cLpdnO3V1DsrzO3ClK+/GMsbu+m0/DpZnDmw/FlYEYacOSk9PN5KKPEaOHOnps2DBAnPdddeZBg0amI4dO5pVq1Z5bePChQtm3LhxJjw83DRs2NAMGjTIHDlypIYrqZyy6s/OzjajRo0ysbGxpkGDBqZt27bm5ZdfNgUFBZ5t1OX6i6tdklm0aJGnT0FBgZk6daqJjo42TqfT3HbbbWbv3r1e27F5DrKyskrsk56e7tlOXZ2D8vwOXKm4sGN7/TYfB8szB7YfCyvCYYwxVX++CAAAoHbgmh0AAGA1wg4AALAaYQcAAFiNsAMAAKxG2AEAAFYj7AAAAKsRdgAAgNUIOwAAwGqEHQBWWrx4sZo0aVKhdUaNGqWhQ4dWy3gA+A5hB4DPvf766woJCdGlS5c8befOnVNgYKBuvfVWr75btmyRw+HQ559/Xuo2hw8fXmafymjZsqVmz55d5dsFUH0IOwB8LikpSefOndOuXbs8bVu2bFF0dLR27typvLw8T3tGRoZiY2PVpk2bUrfZsGFDRUZGVtuYAdQdhB0APte2bVvFxsYqIyPD05aRkaE77rhDrVu31rZt27zak5KSdPHiRT3xxBP60Y9+pEaNGqlbt25e6xf3Ntbzzz+vyMhIhYSE6OGHH9aTTz6pG2+8sch4/vjHPyomJkbNmjXT2LFj5Xa7JUmJiYk6fPiwHnvsMTkcDjkcjqqcBgDVhLADoFZITExUenq653l6eroSExOVkJDgab948aK2b9+upKQkPfTQQ/roo4+0bNkyffrpp/rZz36mfv366eDBg8Vu/6233tILL7ygP/zhD8rMzFSLFi00f/78Iv3S09P15ZdfKj09XUuWLNHixYu1ePFiSdKKFSvUvHlzPffcc8rOzlZ2dnbVTwSAKkfYAVArJCYm6qOPPtKlS5d09uxZffLJJ7rtttuUkJDgOWOzY8cOXbhwQYmJiXr77bf1zjvv6NZbb1Xr1q01efJk9ezZU4sWLSp2+3PmzNHo0aP10EMPqU2bNnr22WfVoUOHIv2aNm2quXPn6sc//rEGDRqkgQMHasOGDZKk8PBw+fv7KyQkRNHR0YqOjq62+QBQdQg7AGqFpKQknT9/Xjt37tSWLVvUpk0bRUZGKiEhQTt37tT58+eVkZGhFi1aaPfu3TLGqE2bNmrcuLHnsWnTJn355ZfFbv/AgQO6+eabvdqufC5J7dq1k7+/v+d5TEyMTpw4UbXFAqhRAb4eAABI0nXXXafmzZsrPT1dZ86cUUJCgiQpOjparVq10kcffaT09HTdfvvtKigokL+/vzIzM72CiSQ1bty4xH1ceY2NMaZIn8DAwCLrFBQUVLYsALUAZ3YA1BpJSUnKyMhQRkaGEhMTPe0JCQlat26dduzYoaSkJN10003Kz8/XiRMndN1113k9SnprqW3btvr444+92i6/+6u8goKClJ+fX+H1APgOYQdArZGUlKStW7dqz549njM70g9h569//au+//57JSUlqU2bNvr5z3+uBx98UCtWrFBWVpZ27typP/zhD/rggw+K3fb48eO1YMECLVmyRAcPHtTzzz+vTz/9tMJ3VLVs2VKbN2/W0aNHdfLkyauqF0DNIOwAqDWSkpJ04cIFXXfddYqKivK0JyQk6OzZs2rdurXi4uIkSYsWLdKDDz6oSZMmqW3bthoyZIj+8Y9/eJZf6ec//7meeuopTZ48WZ06dVJWVpZGjRqlBg0aVGiMzz33nA4dOqTWrVvrmmuuqXyxAGqMwxT3pjUA1APJycmKjo7Wm2++6euhAKhGXKAMoF7Iy8vT66+/rr59+8rf319vv/221q9fr7S0NF8PDUA148wOgHrhwoULGjx4sHbv3i2Xy6W2bdvqd7/7nY
YNG+broQGoZoQdAABgNS5QBgAAViPsAAAAqxF2AACA1Qg7AADAaoQdAABgNcIOAACwGmEHAABYjbADAACs9v8D3R0KHIuDgGgAAAAASUVORK5CYII=",
"image/svg+xml": "\r\n\r\n\r\n",
"text/plain": [
"
"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df['Weight'].hist(bins=15)\n",
"plt.suptitle('Weight distribution of MLB Players')\n",
"plt.xlabel('Weight')\n",
"plt.ylabel('Count')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Normal Distribution\n",
"\n",
"Let's create an artificial sample of weights that follows a normal distribution with the same mean and variance as the real data:"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([187.05660174, 181.77292853, 183.09148457, 198.30703945,\n",
" 201.51640234, 213.21564624, 221.00562653, 218.30263433,\n",
" 234.16968198, 187.40138853, 199.34286071, 205.52705493,\n",
" 251.03651986, 189.64156046, 222.23536452, 211.37502445,\n",
" 205.07287496, 207.90248813, 180.66579133, 226.86092236])"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"generated = np.random.normal(mean,std,1000)\n",
"generated[:20]"
]
},
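{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, the sketch below draws a normal sample and compares its statistics with the requested parameters. The values `target_mean` and `target_std` are illustrative assumptions standing in for the notebook's `mean` and `std`, and the seeded `default_rng` generator is used only to make the run reproducible. It also checks the rule of thumb that roughly 68% of a normal sample lies within one standard deviation of the mean."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Illustrative stand-ins for the notebook's `mean` and `std` (assumed values)\n",
"target_mean, target_std = 200.0, 20.0\n",
"\n",
"rng = np.random.default_rng(0)  # seeded so the run is reproducible\n",
"sample = rng.normal(target_mean, target_std, 1000)\n",
"\n",
"# Sample statistics should be close to, but not exactly equal to, the targets\n",
"print(f'mean = {sample.mean():.2f}, std = {sample.std():.2f}')\n",
"\n",
"# Share of values that fall within one standard deviation of the mean\n",
"within = np.mean(np.abs(sample - target_mean) < target_std)\n",
"print(f'share within one std: {within:.2f}')"
]
},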
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
],
"source": [
"plt.hist(np.random.normal(0,1,50000),bins=300)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Many real-life quantities are approximately normally distributed, so a uniform random number generator is a poor choice for generating realistic sample data. Here is what happens if we try to generate weights with a uniform distribution (generated by `np.random.rand`):"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [
],
"source": [
"wrong_sample = np.random.rand(1000)*2*std+mean-std\n",
"plt.hist(wrong_sample)\n",
"plt.show()"
]
},
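{
"cell_type": "markdown",
"metadata": {},
"source": [
"The mismatch can also be seen numerically. A uniform distribution on $[a, b]$ has standard deviation $(b-a)/\\sqrt{12}$, so a sample spread over $[mean - std,\\ mean + std]$ has standard deviation $std/\\sqrt{3} \\approx 0.58 \\cdot std$ rather than $std$. The sketch below checks this with illustrative values of `mean` and `std` (assumptions, not the values computed from the dataset)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"mean, std = 200.0, 20.0  # illustrative values, not the dataset's\n",
"rng = np.random.default_rng(0)  # seeded so the run is reproducible\n",
"\n",
"# Same construction as wrong_sample: uniform on [mean - std, mean + std]\n",
"uniform_sample = rng.random(100000) * 2 * std + mean - std\n",
"\n",
"print(f'theoretical std of the uniform sample: {std / np.sqrt(3):.2f}')\n",
"print(f'measured std of the uniform sample:    {uniform_sample.std():.2f}')"
]
},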
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Confidence Intervals\n",
"\n",
"Let's now calculate confidence intervals for the weights and heights of baseball players. We will use the code [from this Stack Overflow discussion](https://stackoverflow.com/questions/15033511/compute-a-confidence-interval-from-sample-data):"
]
},
{
"cell_type": "code",
"execution_count": 181,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"p=0.85, mean = 201.73±0.94\n",
"p=0.90, mean = 201.73±1.08\n",
"p=0.95, mean = 201.73±1.28\n"
]
}
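],
"source": [
"import scipy.stats\n",
"\n",
"def mean_confidence_interval(data, confidence=0.95):\n",
"    # Return the sample mean and the half-width of its confidence interval\n",
"    a = 1.0 * np.array(data)\n",
"    n = len(a)\n",
"    m, se = np.mean(a), scipy.stats.sem(a)  # sem = standard error of the mean\n",
"    # Half-width = standard error times the two-sided Student t quantile (n-1 degrees of freedom)\n",
"    h = se * scipy.stats.t.ppf((1 + confidence) / 2., n-1)\n",
"    return m, h\n",
"\n",
"for p in [0.85, 0.9, 0.95]:\n",
"    # Forward-fill missing weights; .ffill() replaces the deprecated fillna(method='pad')\n",
"    m, h = mean_confidence_interval(df['Weight'].ffill(),p)\n",
"    print(f\"p={p:.2f}, mean = {m:.2f}±{h:.2f}\")"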
]
},
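{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the formula behind the interval explicit, the following sketch computes the half-width by hand as $h = t_{(1+c)/2,\\,n-1} \\cdot s/\\sqrt{n}$ and compares the result with `scipy.stats.t.interval`. The data here is synthetic (an assumption made so that the cell runs on its own), not the baseball dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import scipy.stats\n",
"\n",
"rng = np.random.default_rng(0)\n",
"data = rng.normal(200, 20, 500)  # synthetic stand-in for a column of weights\n",
"\n",
"n = len(data)\n",
"m = data.mean()\n",
"se = data.std(ddof=1) / np.sqrt(n)  # standard error, the quantity scipy.stats.sem returns\n",
"h = se * scipy.stats.t.ppf((1 + 0.95) / 2, n - 1)  # half-width at 95% confidence\n",
"\n",
"# The same interval computed directly from the t distribution\n",
"low, high = scipy.stats.t.interval(0.95, n - 1, loc=m, scale=se)\n",
"print(f'manual: {m - h:.2f}..{m + h:.2f}')\n",
"print(f'scipy:  {low:.2f}..{high:.2f}')"
]
},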
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Hypothesis Testing\n",
"\n",
"Let's explore different roles in our baseball players dataset:"
]
},
{
"cell_type": "code",
"execution_count": 175,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th>Role</th><th>Height</th><th>Weight</th><th>Count</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>Catcher</th><td>72.723684</td><td>204.328947</td><td>76</td></tr>\n",
"    <tr><th>Designated_Hitter</th><td>74.222222</td><td>220.888889</td><td>18</td></tr>\n",
"    <tr><th>First_Baseman</th><td>74.000000</td><td>213.109091</td><td>55</td></tr>\n",
"    <tr><th>Outfielder</th><td>73.010309</td><td>199.113402</td><td>194</td></tr>\n",
"    <tr><th>Relief_Pitcher</th><td>74.374603</td><td>203.517460</td><td>315</td></tr>\n",
"    <tr><th>Second_Baseman</th><td>71.362069</td><td>184.344828</td><td>58</td></tr>\n",
"    <tr><th>Shortstop</th><td>71.903846</td><td>182.923077</td><td>52</td></tr>\n",
"    <tr><th>Starting_Pitcher</th><td>74.719457</td><td>205.163636</td><td>221</td></tr>\n",
"    <tr><th>Third_Baseman</th><td>73.044444</td><td>200.955556</td><td>45</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
" Height Weight Count\n",
"Role \n",
"Catcher 72.723684 204.328947 76\n",
"Designated_Hitter 74.222222 220.888889 18\n",
"First_Baseman 74.000000 213.109091 55\n",
"Outfielder 73.010309 199.113402 194\n",
"Relief_Pitcher 74.374603 203.517460 315\n",
"Second_Baseman 71.362069 184.344828 58\n",
"Shortstop 71.903846 182.923077 52\n",
"Starting_Pitcher 74.719457 205.163636 221\n",
"Third_Baseman 73.044444 200.955556 45"
]
},
"execution_count": 175,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby('Role').agg({ 'Height' : 'mean', 'Weight' : 'mean', 'Age' : 'count'}).rename(columns={ 'Age' : 'Count'})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's test the hypothesis that first basemen are taller than second basemen. The simplest way to do it is to compare the confidence intervals of the mean height for the two groups:"
]
},
{
"cell_type": "code",
"execution_count": 188,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Conf=0.85, 1st basemen height: 73.62..74.38, 2nd basemen height: 71.04..71.69\n",
"Conf=0.90, 1st basemen height: 73.56..74.44, 2nd basemen height: 70.99..71.73\n",
"Conf=0.95, 1st basemen height: 73.47..74.53, 2nd basemen height: 70.92..71.81\n"
]
}
],
"source": [
"first = df.loc[df['Role']=='First_Baseman','Height']\n",
"second = df.loc[df['Role']=='Second_Baseman','Height']\n",
"for p in [0.85,0.9,0.95]:\n",
"    m1, h1 = mean_confidence_interval(first,p)\n",
"    m2, h2 = mean_confidence_interval(second,p)\n",
"    print(f'Conf={p:.2f}, 1st basemen height: {m1-h1:.2f}..{m1+h1:.2f}, 2nd basemen height: {m2-h2:.2f}..{m2+h2:.2f}')"
]
},
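{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that the intervals do not overlap at any of the three confidence levels, which suggests that first basemen are taller on average.\n",
"\n",
"A more statistically rigorous way to test the hypothesis is to use **Student's t-test** (here in Welch's variant, which does not assume that the two groups have equal variances):"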
]
},
{
"cell_type": "code",
"execution_count": 200,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"T-value = 7.65\n",
"P-value: 9.137321189738925e-12\n"
]
}
],
"source": [
"from scipy.stats import ttest_ind\n",
"\n",
"# equal_var=False selects Welch's t-test, which does not assume equal variances\n",
"first = df.loc[df['Role']=='First_Baseman','Height']\n",
"second = df.loc[df['Role']=='Second_Baseman','Height']\n",
"tval, pval = ttest_ind(first, second, equal_var=False)\n",
"print(f\"T-value = {tval:.2f}\\nP-value: {pval}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `ttest_ind` function returns two values:\n",
"* The p-value is the probability of observing a difference in sample means at least as large as the one we measured, assuming that the two populations actually have the same mean. In our case it is very low, which is strong evidence that first basemen are taller.\n",
"* The t-value is the normalized difference between the sample means (the difference divided by its standard error). The t-test compares it against a threshold value determined by the chosen confidence level."
]
},
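{
"cell_type": "markdown",
"metadata": {},
"source": [
"To illustrate what the t-value measures, the sketch below computes Welch's statistic by hand, $t = (\\bar{a} - \\bar{b}) / \\sqrt{s_a^2/n_a + s_b^2/n_b}$, and compares it with the value returned by `ttest_ind(..., equal_var=False)`. The two samples are synthetic; their sizes and parameters are assumptions chosen only to loosely resemble the two groups of players."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from scipy.stats import ttest_ind\n",
"\n",
"rng = np.random.default_rng(0)\n",
"a = rng.normal(74.0, 2.0, 55)  # synthetic heights, first group\n",
"b = rng.normal(71.4, 1.8, 58)  # synthetic heights, second group\n",
"\n",
"# Welch's t: difference of sample means divided by its standard error\n",
"se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))\n",
"t_manual = (a.mean() - b.mean()) / se\n",
"\n",
"tval, pval = ttest_ind(a, b, equal_var=False)\n",
"print(f'manual t = {t_manual:.4f}, scipy t = {tval:.4f}, p = {pval:.3g}')"
]
},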
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Simulating Normal Distribution with Central Limit Theorem\n",
"\n",
"Python's pseudo-random generator is designed to give us a uniform distribution. If we want to create a generator for the normal distribution, we can use the central limit theorem: the mean of a sufficiently large sample of independent uniform values is approximately normally distributed. To get a normally distributed value, we will just compute the mean of a uniform-generated sample."
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"image/svg+xml": "\r\n\r\n\r\n",
"text/plain": [
"