Data-Science-For-Beginners/1-Introduction/04-stats-and-probability/notebook.ipynb

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction to Probability and Statistics\n",
"\n",
"In this notebook, we will play around with some of the concepts we have previously discussed. Many concepts from probability and statistics are well-represented in major libraries for data processing in Python, such as `numpy` and `pandas`."
]
},
{
"cell_type": "code",
"execution_count": 212,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import random\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## Random Variables and Distributions\n",
"\n",
"Let's start by drawing a sample of 30 values from a discrete uniform distribution over the integers 0 to 10 (`random.randint(0,10)` includes both endpoints). We will also compute the mean and variance of the sample."
]
},
{
"cell_type": "code",
"execution_count": 213,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sample: [1, 1, 0, 5, 6, 3, 7, 5, 1, 6, 5, 6, 7, 0, 3, 6, 2, 4, 2, 8, 1, 5, 7, 10, 8, 5, 7, 10, 6, 8]\n",
"Mean = 4.833333333333333\n",
"Variance = 7.938888888888889\n"
]
}
],
"source": [
"sample = [ random.randint(0,10) for _ in range(30) ]\n",
"print(f\"Sample: {sample}\")\n",
"print(f\"Mean = {np.mean(sample)}\")\n",
"print(f\"Variance = {np.var(sample)}\")"
]
},
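{
"cell_type": "markdown",
"metadata": {},
"source": [
"The variance printed above comes from `np.var`, which by default divides by $n$ (the population variance, `ddof=0`), whereas the pandas `.var()` method used later in this notebook divides by $n-1$ by default (the sample variance, `ddof=1`). The cell below is a minimal sketch of that difference using only Python's standard `statistics` module. It reuses the 30 values printed above so that it runs on its own, and the names `values`, `pop_var` and `sample_var` are illustrative, not part of the original notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import statistics\n",
"\n",
"# The 30 values printed by the sampling cell above, copied so this cell is self-contained\n",
"values = [1, 1, 0, 5, 6, 3, 7, 5, 1, 6, 5, 6, 7, 0, 3, 6, 2, 4, 2, 8,\n",
"          1, 5, 7, 10, 8, 5, 7, 10, 6, 8]\n",
"\n",
"n = len(values)\n",
"pop_var = statistics.pvariance(values)    # divides by n, like np.var(values)\n",
"sample_var = statistics.variance(values)  # divides by n - 1, like np.var(values, ddof=1)\n",
"\n",
"print(f'n = {n}')\n",
"print(f'Population variance (divide by n)     = {pop_var}')\n",
"print(f'Sample variance     (divide by n - 1) = {sample_var}')\n",
"print(f'Ratio sample / population             = {sample_var / pop_var}')"
]
},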
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see how often each value occurs in the sample, we can plot a **histogram**:"
]
},
{
"cell_type": "code",
"execution_count": 214,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.hist(sample)\n",
"plt.show()"
]
},
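{
"cell_type": "markdown",
"metadata": {},
"source": [
"With its default of 10 bins, `plt.hist` splits the range from 0 to 10 into 10 equal intervals, so the last bar covers both 9 and 10 and the plot cannot show 11 separate integer values. An exact tally avoids that ambiguity. The cell below is a small sketch using `collections.Counter` from the standard library; it again copies the 30 values printed earlier, and the names `values` and `counts` are illustrative assumptions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from collections import Counter\n",
"\n",
"# The 30 values printed by the sampling cell, copied so this cell is self-contained\n",
"values = [1, 1, 0, 5, 6, 3, 7, 5, 1, 6, 5, 6, 7, 0, 3, 6, 2, 4, 2, 8,\n",
"          1, 5, 7, 10, 8, 5, 7, 10, 6, 8]\n",
"\n",
"counts = Counter(values)  # maps each value to the number of times it occurs\n",
"\n",
"# Text histogram with one row per possible value; a missing value counts as 0\n",
"for value in range(0, 11):\n",
"    print(f'{value:2d}: ' + '#' * counts[value])\n",
"\n",
"print(f'Distinct values in the sample: {len(counts)}')"
]
},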
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Analyzing Real Data\n",
"\n",
"Mean and variance are very important when analyzing real-world data. Let's load the data about baseball players from [SOCR MLB Height/Weight Data](http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_MLB_HeightsWeights)."
]
},
{
"cell_type": "code",
"execution_count": 215,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Name</th>\n",
" <th>Team</th>\n",
" <th>Role</th>\n",
" <th>Height</th>\n",
" <th>Weight</th>\n",
" <th>Age</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Adam_Donachie</td>\n",
" <td>BAL</td>\n",
" <td>Catcher</td>\n",
" <td>74</td>\n",
" <td>180.0</td>\n",
" <td>22.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Paul_Bako</td>\n",
" <td>BAL</td>\n",
" <td>Catcher</td>\n",
" <td>74</td>\n",
" <td>215.0</td>\n",
" <td>34.69</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Ramon_Hernandez</td>\n",
" <td>BAL</td>\n",
" <td>Catcher</td>\n",
" <td>72</td>\n",
" <td>210.0</td>\n",
" <td>30.78</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Kevin_Millar</td>\n",
" <td>BAL</td>\n",
" <td>First_Baseman</td>\n",
" <td>72</td>\n",
" <td>210.0</td>\n",
" <td>35.43</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Chris_Gomez</td>\n",
" <td>BAL</td>\n",
" <td>First_Baseman</td>\n",
" <td>73</td>\n",
" <td>188.0</td>\n",
" <td>35.71</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1029</th>\n",
" <td>Brad_Thompson</td>\n",
" <td>STL</td>\n",
" <td>Relief_Pitcher</td>\n",
" <td>73</td>\n",
" <td>190.0</td>\n",
" <td>25.08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1030</th>\n",
" <td>Tyler_Johnson</td>\n",
" <td>STL</td>\n",
" <td>Relief_Pitcher</td>\n",
" <td>74</td>\n",
" <td>180.0</td>\n",
" <td>25.73</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1031</th>\n",
" <td>Chris_Narveson</td>\n",
" <td>STL</td>\n",
" <td>Relief_Pitcher</td>\n",
" <td>75</td>\n",
" <td>205.0</td>\n",
" <td>25.19</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1032</th>\n",
" <td>Randy_Keisler</td>\n",
" <td>STL</td>\n",
" <td>Relief_Pitcher</td>\n",
" <td>75</td>\n",
" <td>190.0</td>\n",
" <td>31.01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1033</th>\n",
" <td>Josh_Kinney</td>\n",
" <td>STL</td>\n",
" <td>Relief_Pitcher</td>\n",
" <td>73</td>\n",
" <td>195.0</td>\n",
" <td>27.92</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1034 rows × 6 columns</p>\n",
"</div>"
],
"text/plain": [
" Name Team Role Height Weight Age\n",
"0 Adam_Donachie BAL Catcher 74 180.0 22.99\n",
"1 Paul_Bako BAL Catcher 74 215.0 34.69\n",
"2 Ramon_Hernandez BAL Catcher 72 210.0 30.78\n",
"3 Kevin_Millar BAL First_Baseman 72 210.0 35.43\n",
"4 Chris_Gomez BAL First_Baseman 73 188.0 35.71\n",
"... ... ... ... ... ... ...\n",
"1029 Brad_Thompson STL Relief_Pitcher 73 190.0 25.08\n",
"1030 Tyler_Johnson STL Relief_Pitcher 74 180.0 25.73\n",
"1031 Chris_Narveson STL Relief_Pitcher 75 205.0 25.19\n",
"1032 Randy_Keisler STL Relief_Pitcher 75 190.0 31.01\n",
"1033 Josh_Kinney STL Relief_Pitcher 73 195.0 27.92\n",
"\n",
"[1034 rows x 6 columns]"
]
},
"execution_count": 215,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv(\"../../data/SOCR_MLB.tsv\",sep='\\t',header=None,names=['Name','Team','Role','Height','Weight','Age'])\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> We are using a package called **Pandas** here for data analysis. We will talk more about Pandas and working with data in Python later in this course.\n",
"\n",
"Let's compute average values for age, height and weight:"
]
},
{
"cell_type": "code",
"execution_count": 216,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Age 28.736712\n",
"Height 73.697292\n",
"Weight 201.689255\n",
"dtype: float64"
]
},
"execution_count": 216,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[['Age','Height','Weight']].mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's focus on height and compute its variance and standard deviation:"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
"print(list(df['Height'])[:20])"
]
},
{
"cell_type": "code",
"execution_count": 218,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Mean = 73.6972920696325\n",
"Variance = 5.316798081118081\n",
"Standard Deviation = 2.305818310517566\n"
]
}
],
"source": [
"mean = df['Height'].mean()\n",
"var = df['Height'].var()\n",
"std = df['Height'].std()\n",
"print(f\"Mean = {mean}\\nVariance = {var}\\nStandard Deviation = {std}\")"
]
},
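{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three numbers above are tied together by the usual definitions. As a sketch, assuming the pandas defaults (`ddof=1` for both `.var()` and `.std()`), with $x_i$ the individual heights, $n$ their count and $\\bar{x}$ their mean:\n",
"\n",
"$$s^2 = \\frac{1}{n-1}\\sum_{i=1}^{n}(x_i-\\bar{x})^2, \\qquad s = \\sqrt{s^2}$$\n",
"\n",
"A quick check against the printed output: $\\sqrt{5.3168} \\approx 2.3058$, which agrees with the reported standard deviation."
]
},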
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In addition to the mean, it makes sense to look at the median and quartiles. They can be visualized using a **box plot**:"
]
},
{
"cell_type": "code",
"execution_count": 217,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1000x200 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(10,2))\n",
"plt.boxplot(df['Height'],vert=False,showmeans=True)\n",
"plt.grid(color='gray',linestyle='dotted')\n",
"plt.show()"
]
},
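{
"cell_type": "markdown",
"metadata": {},
"source": [
"In a box plot, the box spans the first to the third quartile and the line inside it marks the median; `showmeans=True` adds a marker for the mean. To make those quantities concrete, the cell below computes them for a tiny subset. It is only a sketch: it uses the ten heights visible in the table preview above rather than the full dataset, relies on the standard `statistics` module, and the name `preview_heights` and the choice of `method='inclusive'` are assumptions made for illustration. The results describe these ten players, not all 1034."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import statistics\n",
"\n",
"# Heights of the ten players shown in the table preview (rows 0-4 and 1029-1033)\n",
"preview_heights = [74, 74, 72, 72, 73, 73, 74, 75, 75, 73]\n",
"\n",
"median = statistics.median(preview_heights)\n",
"\n",
"# n=4 asks for the three cut points that split the data into quarters;\n",
"# method='inclusive' interpolates between the smallest and largest observed values\n",
"q1, q2, q3 = statistics.quantiles(preview_heights, n=4, method='inclusive')\n",
"\n",
"print(f'Median = {median}')\n",
"print(f'Quartiles: Q1 = {q1}, Q2 = {q2}, Q3 = {q3}')\n",
"print(f'Interquartile range = {q3 - q1}')"
]
},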
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also make box plots of subsets of our dataset, for example, grouped by player role."
]
},
{
"cell_type": "code",
"execution_count": 210,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df.boxplot(column='Height',by='Role')\n",
"plt.xticks(rotation='vertical')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> **Note**: This diagram suggests that, on average, first basemen are taller than second basemen. Later we will learn how to test this hypothesis more formally, and how to show that the difference observed in our data is statistically significant.\n",
"\n",
"Age, height and weight are all continuous random variables. What do you think their distribution is? A good way to find out is to plot a histogram of the values:"
]
},
{
"cell_type": "code",
"execution_count": 211,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df['Weight'].hist(bins=15)\n",
"plt.suptitle('Weight distribution of MLB Players')\n",
"plt.xlabel('Weight')\n",
"plt.ylabel('Count')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Normal Distribution\n",
"\n",
"Let's create an artificial sample of weights that follows a normal distribution with the same mean and variance as the real data:"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([187.05660174, 181.77292853, 183.09148457, 198.30703945,\n",
" 201.51640234, 213.21564624, 221.00562653, 218.30263433,\n",
" 234.16968198, 187.40138853, 199.34286071, 205.52705493,\n",
" 251.03651986, 189.64156046, 222.23536452, 211.37502445,\n",
" 205.07287496, 207.90248813, 180.66579133, 226.86092236])"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"generated = np.random.normal(mean,std,1000)\n",
"generated[:20]"
]
},
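The cell above relies on `mean` and `std` having been defined earlier in the notebook. As a minimal, self-contained sketch of the same idea, the snippet below uses a small made-up array of weights (hypothetical stand-in values, not the MLB data), estimates its mean and standard deviation, and draws a normal sample with those parameters. The seeded `np.random.default_rng` is an assumption made only so the sketch is reproducible; the notebook itself calls `np.random.normal` directly.

```python
import numpy as np

# Hypothetical stand-in for the real weight column (values are made up)
weights = np.array([180, 215, 210, 188, 176, 209, 200, 231, 180, 188], dtype=float)

mean = weights.mean()
std = weights.std()  # population standard deviation (ddof=0)

# Fixed seed, assumed here only to make the sketch reproducible
rng = np.random.default_rng(0)
generated = rng.normal(mean, std, 1000)

# With 1000 draws, the sample statistics should land close to the targets
print(f"target mean = {mean:.2f}, sample mean = {generated.mean():.2f}")
print(f"target std  = {std:.2f}, sample std  = {generated.std():.2f}")
```

The generated sample will not reproduce the target mean and standard deviation exactly, but with 1000 draws the difference is expected to be small.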
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.hist(generated,bins=15)\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.hist(np.random.normal(0,1,50000),bins=300)\n",
"plt.show()"
]
},
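One way to read the histogram produced by the cell above is to count how much of the sample falls within one, two, and three standard deviations of the mean. The sketch below does this for a standard normal sample of the same size; the seed and the variable names are illustrative assumptions, and the printed fractions will shift slightly if the seed is changed.

```python
import numpy as np

# Fixed seed, assumed here only to make the sketch reproducible
rng = np.random.default_rng(42)
sample = rng.normal(0, 1, 50000)

# Fraction of draws whose absolute value is at most k standard deviations
fractions = {}
for k in (1, 2, 3):
    fractions[k] = np.mean(np.abs(sample) <= k)
    print(f"within {k} std: {fractions[k]:.3f}")
```

For a normal distribution these fractions are expected to be roughly 0.68, 0.95 and 0.997.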
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since many real-life quantities are approximately normally distributed, we should not use a uniform random number generator to generate sample data. Here is what happens if we try to generate weights with a uniform distribution (generated by `np.random.rand`):"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"wrong_sample = np.random.rand(1000)*2*std+mean-std\n",
"plt.hist(wrong_sample)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Confidence Intervals\n",
"\n",
"Let's now calculate confidence intervals for the weights and heights of baseball players. We will use the code [from this stackoverflow discussion](https://stackoverflow.com/questions/15033511/compute-a-confidence-interval-from-sample-data):"
]
},
{
"cell_type": "code",
"execution_count": 181,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"p=0.85, mean = 201.73±0.94\n",
"p=0.90, mean = 201.73±1.08\n",
"p=0.95, mean = 201.73±1.28\n"
]
}
],
"source": [
"import scipy.stats\n",
"\n",
"def mean_confidence_interval(data, confidence=0.95):\n",
" a = 1.0 * np.array(data)\n",
" n = len(a)\n",
" m, se = np.mean(a), scipy.stats.sem(a)\n",
" h = se * scipy.stats.t.ppf((1 + confidence) / 2., n-1)\n",
" return m, h\n",
"\n",
"for p in [0.85, 0.9, 0.95]:\n",
" m, h = mean_confidence_interval(df['Weight'].fillna(method='pad'),p)\n",
" print(f\"p={p:.2f}, mean = {m:.2f}±{h:.2f}\")"
]
},
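{
"cell_type": "markdown",
"metadata": {},
"source": [
"The helper above can be cross-checked against SciPy's own interval method. The following is a minimal sketch on synthetic data: the `weights` array is made up for illustration and is not the baseball dataset, and the sketch assumes `scipy.stats.t.interval` takes the confidence level as its first positional argument. Both printed intervals should agree, since they use the same t quantile and the same standard error."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import scipy.stats\n",
"\n",
"# Hypothetical sample, generated only for illustration\n",
"rng = np.random.default_rng(0)\n",
"weights = rng.normal(loc=200, scale=20, size=500)\n",
"\n",
"n = len(weights)\n",
"m = np.mean(weights)\n",
"se = scipy.stats.sem(weights)\n",
"\n",
"# Half-width computed the same way as in mean_confidence_interval\n",
"h = se * scipy.stats.t.ppf((1 + 0.95) / 2, n - 1)\n",
"\n",
"# The same interval requested from SciPy directly\n",
"lo, hi = scipy.stats.t.interval(0.95, n - 1, loc=m, scale=se)\n",
"print(f'manual: {m-h:.2f}..{m+h:.2f}, scipy: {lo:.2f}..{hi:.2f}')"
]
},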
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Hypothesis Testing\n",
"\n",
"Let's explore different roles in our baseball players dataset:"
]
},
{
"cell_type": "code",
"execution_count": 175,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Height</th>\n",
" <th>Weight</th>\n",
" <th>Count</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Role</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Catcher</th>\n",
" <td>72.723684</td>\n",
" <td>204.328947</td>\n",
" <td>76</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Designated_Hitter</th>\n",
" <td>74.222222</td>\n",
" <td>220.888889</td>\n",
" <td>18</td>\n",
" </tr>\n",
" <tr>\n",
" <th>First_Baseman</th>\n",
" <td>74.000000</td>\n",
" <td>213.109091</td>\n",
" <td>55</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Outfielder</th>\n",
" <td>73.010309</td>\n",
" <td>199.113402</td>\n",
" <td>194</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Relief_Pitcher</th>\n",
" <td>74.374603</td>\n",
" <td>203.517460</td>\n",
" <td>315</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Second_Baseman</th>\n",
" <td>71.362069</td>\n",
" <td>184.344828</td>\n",
" <td>58</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Shortstop</th>\n",
" <td>71.903846</td>\n",
" <td>182.923077</td>\n",
" <td>52</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Starting_Pitcher</th>\n",
" <td>74.719457</td>\n",
" <td>205.163636</td>\n",
" <td>221</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Third_Baseman</th>\n",
" <td>73.044444</td>\n",
" <td>200.955556</td>\n",
" <td>45</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Height Weight Count\n",
"Role \n",
"Catcher 72.723684 204.328947 76\n",
"Designated_Hitter 74.222222 220.888889 18\n",
"First_Baseman 74.000000 213.109091 55\n",
"Outfielder 73.010309 199.113402 194\n",
"Relief_Pitcher 74.374603 203.517460 315\n",
"Second_Baseman 71.362069 184.344828 58\n",
"Shortstop 71.903846 182.923077 52\n",
"Starting_Pitcher 74.719457 205.163636 221\n",
"Third_Baseman 73.044444 200.955556 45"
]
},
"execution_count": 175,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby('Role').agg({ 'Height' : 'mean', 'Weight' : 'mean', 'Age' : 'count'}).rename(columns={ 'Age' : 'Count'})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's test the hypothesis that First Basemen are higher then Second Basemen. The simplest way to do it is to test the confidence intervals:"
]
},
{
"cell_type": "code",
"execution_count": 188,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Conf=0.85, 1st basemen height: 73.62..74.38, 2nd basemen height: 71.04..71.69\n",
"Conf=0.90, 1st basemen height: 73.56..74.44, 2nd basemen height: 70.99..71.73\n",
"Conf=0.95, 1st basemen height: 73.47..74.53, 2nd basemen height: 70.92..71.81\n"
]
}
],
"source": [
"for p in [0.85,0.9,0.95]:\n",
" m1, h1 = mean_confidence_interval(df.loc[df['Role']=='First_Baseman',['Height']],p)\n",
" m2, h2 = mean_confidence_interval(df.loc[df['Role']=='Second_Baseman',['Height']],p)\n",
" print(f'Conf={p:.2f}, 1st basemen height: {m1-h1[0]:.2f}..{m1+h1[0]:.2f}, 2nd basemen height: {m2-h2[0]:.2f}..{m2+h2[0]:.2f}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that intervals do not overlap.\n",
"\n",
"More statistically correct way to prove the hypothesis is to use **Student t-test**:"
]
},
{
"cell_type": "code",
"execution_count": 200,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"T-value = 7.65\n",
"P-value: 9.137321189738925e-12\n"
]
}
],
"source": [
"from scipy.stats import ttest_ind\n",
"\n",
"tval, pval = ttest_ind(df.loc[df['Role']=='First_Baseman',['Height']], df.loc[df['Role']=='Second_Baseman',['Height']],equal_var=False)\n",
"print(f\"T-value = {tval[0]:.2f}\\nP-value: {pval[0]}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Two values returned by the `ttest_ind` functions are:\n",
"* p-value can be considered as the probability of two distributions having the same mean. In our case, it is very low, meaning that there is strong evidence supporting that first basemen are taller\n",
"* t-value is the intermediate value of normalized mean difference that is used in t-test, and it is compared against threshold value for a given confidence value "
]
},
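{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the t-value less abstract, here is a minimal sketch that computes Welch's t statistic by hand and compares it with `ttest_ind`. The two samples `a` and `b` are synthetic and only loosely resemble the height columns; they are assumptions for illustration, not the baseball data. The statistic is the difference of the sample means divided by the standard error of that difference."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from scipy.stats import ttest_ind\n",
"\n",
"# Hypothetical height samples (inches), generated only for illustration\n",
"rng = np.random.default_rng(1)\n",
"a = rng.normal(loc=74.0, scale=2.0, size=55)\n",
"b = rng.normal(loc=71.4, scale=1.8, size=58)\n",
"\n",
"# Standard error of the difference of means, without assuming equal variances\n",
"se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))\n",
"t_manual = (a.mean() - b.mean()) / se\n",
"\n",
"# equal_var=False selects Welch's test, matching the manual formula\n",
"tval, pval = ttest_ind(a, b, equal_var=False)\n",
"print(f't (manual) = {t_manual:.3f}, t (scipy) = {tval:.3f}, p = {pval:.3g}')"
]
},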
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Simulating Normal Distribution with Central Limit Theorem\n",
"\n",
"Pseudo-random generator in Python is designed to give us uniform distribution. If we want to create a generator for normal distribution, we can use central limit theorem. To get a normally distributed value we will just compute a mean of a uniform-generated sample."
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAiwAAAGdCAYAAAAxCSikAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8rg+JYAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAm2klEQVR4nO3df3RU5Z3H8c9AYELdZCRqkglECBwEAZdi0EAEBNFgoFRaWrC0gNtuLWt0wRxOTVpZoLsloLbLoSAeLT90aYF2w49sQ9uEI0lAoiuauP2BGGogVDNLcSUDWCcEnv3Dw9QhP8gkc8kz8f06557jvfd5nnzvY7jzOc/czLiMMUYAAAAW69HVBQAAAFwNgQUAAFiPwAIAAKxHYAEAANYjsAAAAOsRWAAAgPUILAAAwHoEFgAAYL2Yri4gUi5duqT3339fcXFxcrlcXV0OAABoB2OMzp49q5SUFPXo0fo6SrcJLO+//75SU1O7ugwAANABJ0+eVP/+/Vs9320CS1xcnKRPLjg+Pr6LqwEAAO3h9/uVmpoafB1vTbcJLJffBoqPjyewAAAQZa72OAcP3QIAAOsRWAAAgPUILAAAwHoEFgAAYD0CCwAAsB6BBQAAWI/AAgAArEdgAQAA1iOwAAAA6xFYAACA9cIKLAUFBbrjjjsUFxenxMREzZw5U0ePHg1pY4zR8uXLlZKSoj59+mjSpEn6wx/+cNWxCwsLNXz4cLndbg0fPly7du0K70oAAEC3FVZgKS8vV05Ojl599VWVlpaqqalJWVlZOn/+fLDNU089pR//+Mdat26dXn/9dSUnJ+u+++7T2bNnWx23srJSc+bM0bx58/TWW29p3rx5mj17tl577bWOXxkAAOg2XMYY09HOf/nLX5SYmKjy8nJNnDhRxhilpKRo8eLFeuKJJyRJgUBASUlJWr16tb7zne+0OM6cOXPk9/v161//Onjs/vvvV9++fbVt27Z21eL3++XxeNTQ0MCXHwIAECXa+/rdqWdYGhoaJEkJCQmSpNraWvl8PmVlZQXbuN1u3X333Tp06FCr41RWVob0kaSpU6e22ScQCMjv94dsAACge4rpaEdjjHJzczV+/HiNHDlSkuTz+SRJSUlJIW2TkpJ04sSJVsfy+Xwt9rk8XksKCgq0YsWKjpYPIMoMzCvu6hLCdnzV9K4uAeg2OrzC8uijj+p//ud/WnzLxuVyhewbY5od62yf/Px8NTQ0BLeTJ0+GUT0AAIgmHVpheeyxx1RUVKSKigr1798/eDw5OVnSJysmXq83ePzUqVPNVlA+LTk5udlqytX6uN1uud3ujpQPAACiTFgrLMYYPfroo9q5c6defvllpaWlhZxPS0tTcnKySktLg8caGxtVXl6uzMzMVscdN25cSB9JKikpabMPAAD47AhrhSUnJ0c///nPtWfPHsXFxQVXRTwej/r06SOXy6XFixdr5cqVGjJkiIYMGaKVK1fqc5/7nObOnRscZ/78+erXr58KCgokSYsWLdLEiRO1evVqPfDAA9qzZ4/27dungwcPRvBSAQBAtAorsGzYsEGSNGnSpJDjmzdv1kMPPSRJ+u53v6u//vWveuSRR/Thhx8qIyNDJSUliouLC7avq6tTjx5/W9zJzMzU9u3b9eSTT2rp0qUaPHiwduzYoYyMjA5eFgAA6E469TksNuFzWIDujb8SArqna/I5LAAAANcCgQUAAFiPwAIAAKxHYAEAANYjsAAAAOsRWAAAgPUILAAAwHoEFgAAYD0CCwAAsB6BBQAAWI/AAgAArEdgAQAA1iOwAAAA6xFYAACA9QgsAADAegQWAABgPQILAACwHoEFAABYj8ACAACsR2ABAADWI7AAAADrEVgAAID1CCwAAMB6BBYAAGA9AgsAALAegQUAAFiPwAIAAKxHYAEAANYjsAAAAOsRWAAAgPUILAAAwHoEFgAAYD0CCwAAsF7YgaWiokIzZsxQSkqKXC6Xdu
/eHXLe5XK1uD399NOtjrlly5YW+3z88cdhXxAAAOh+wg4s58+f16hRo7Ru3boWz9fX14dsmzZtksvl0qxZs9ocNz4+vlnf2NjYcMsDAADdUEy4HbKzs5Wdnd3q+eTk5JD9PXv2aPLkyRo0aFCb47pcrmZ9AQAAJIefYfnf//1fFRcX61vf+tZV2547d04DBgxQ//799YUvfEFVVVVttg8EAvL7/SEbAADonhwNLC+++KLi4uL05S9/uc12w4YN05YtW1RUVKRt27YpNjZWd911l2pqalrtU1BQII/HE9xSU1MjXT4AALCEo4Fl06ZN+vrXv37VZ1HGjh2rb3zjGxo1apQmTJigX/ziF7rlllv0k5/8pNU++fn5amhoCG4nT56MdPkAAMASYT/D0l4HDhzQ0aNHtWPHjrD79ujRQ3fccUebKyxut1tut7szJQIAgCjh2ArLxo0blZ6erlGjRoXd1xij6upqeb1eByoDAADRJuwVlnPnzunYsWPB/draWlVXVyshIUE333yzJMnv9+uXv/ylfvSjH7U4xvz589WvXz8VFBRIklasWKGxY8dqyJAh8vv9Wrt2raqrq7V+/fqOXBMAAOhmwg4shw8f1uTJk4P7ubm5kqQFCxZoy5YtkqTt27fLGKOvfe1rLY5RV1enHj3+trhz5swZPfzww/L5fPJ4PBo9erQqKip05513hlseAADohlzGGNPVRUSC3++Xx+NRQ0OD4uPju7ocABE2MK+4q0sI2/FV07u6BMB67X395ruEAACA9QgsAADAegQWAABgPQILAACwHoEFAABYj8ACAACsR2ABAADWI7AAAADrEVgAAID1CCwAAMB6BBYAAGA9AgsAALAegQUAAFiPwAIAAKxHYAEAANYjsAAAAOsRWAAAgPUILAAAwHoEFgAAYD0CCwAAsB6BBQAAWI/AAgAArEdgAQAA1iOwAAAA6xFYAACA9QgsAADAegQWAABgPQILAACwHoEFAABYj8ACAACsR2ABAADWI7AAAADrEVgAAID1wg4sFRUVmjFjhlJSUuRyubR79+6Q8w899JBcLlfINnbs2KuOW1hYqOHDh8vtdmv48OHatWtXuKUBAIBuKuzAcv78eY0aNUrr1q1rtc3999+v+vr64LZ37942x6ysrNScOXM0b948vfXWW5o3b55mz56t1157LdzyAABANxQTbofs7GxlZ2e32cbtdis5ObndY65Zs0b33Xef8vPzJUn5+fkqLy/XmjVrtG3btnBLBAAA3Ywjz7CUlZUpMTFRt9xyi7797W/r1KlTbbavrKxUVlZWyLGpU6fq0KFDrfYJBALy+/0hGwAA6J4iHliys7P1s5/9TC+//LJ+9KMf6fXXX9c999yjQCDQah+fz6ekpKSQY0lJSfL5fK32KSgokMfjCW6pqakRuwYAAGCXsN8Supo5c+YE/3vkyJEaM2aMBgwYoOLiYn35y19utZ/L5QrZN8Y0O/Zp+fn5ys3NDe77/X5CCwAA3VTEA8uVvF6vBgwYoJqamlbbJCcnN1tNOXXqVLNVl09zu91yu90RqxMAANjL8c9h+eCDD3Ty5El5vd5W24wbN06lpaUhx0pKSpSZmel0eQAAIAqEvcJy7tw5HTt2LLhfW1ur6upqJSQkKCEhQcuXL9esWbPk9Xp1/Phxfe9739ONN96oL33pS8E+8+fPV79+/VRQUCBJWrRokSZOnKjVq1frgQce0J49e7Rv3z4dPHgwApcIAACiXdiB5fDhw5o8eXJw//JzJAsWLNCGDRv0u9/9Ti+99JLOnDkjr9eryZMna8eOHYqLiwv2qaurU48ef1vcyczM1Pbt2/Xkk09q6dKlGjx4sHbs2KGMjIzOXBsAAOgmXMYY09VFRILf75fH41FDQ4Pi4+O7uhwAETYwr7irSwjb8VXTu7oEwHrtff3mu4QAAID1CCwAAMB6BBYAAGA9AgsAALAegQUAAFiPwAIAAKxHYAEAANYjsAAAAOsRWA
AAgPUILAAAwHoEFgAAYL2wv/wQQCi+4wat4XcDiBxWWAAAgPUILAAAwHoEFgAAYD0CCwAAsB6BBQAAWI/AAgAArEd
"image/svg+xml": "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"no\"?>\r\n<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\r\n \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\r\n<svg height=\"297.190125pt\" version=\"1.1\" viewBox=\"0 0 400.785625 297.190125\" width=\"400.785625pt\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\r\n <metadata>\r\n <rdf:RDF xmlns:cc=\"http://creativecommons.org/ns#\" xmlns:dc=\"http://purl.org/dc/elements/1.1/\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\r\n <cc:Work>\r\n <dc:type rdf:resource=\"http://purl.org/dc/dcmitype/StillImage\"/>\r\n <dc:date>2021-08-16T16:31:32.588708</dc:date>\r\n <dc:format>image/svg+xml</dc:format>\r\n <dc:creator>\r\n <cc:Agent>\r\n <dc:title>Matplotlib v3.4.2, https://matplotlib.org/</dc:title>\r\n </cc:Agent>\r\n </dc:creator>\r\n </cc:Work>\r\n </rdf:RDF>\r\n </metadata>\r\n <defs>\r\n <style type=\"text/css\">*{stroke-linecap:butt;stroke-linejoin:round;}</style>\r\n </defs>\r\n <g id=\"figure_1\">\r\n <g id=\"patch_1\">\r\n <path d=\"M 0 297.190125 \r\nL 400.785625 297.190125 \r\nL 400.785625 0 \r\nL 0 0 \r\nz\r\n\" style=\"fill:#ffffff;\"/>\r\n </g>\r\n <g id=\"axes_1\">\r\n <g id=\"patch_2\">\r\n <path d=\"M 36.465625 273.312 \r\nL 393.585625 273.312 \r\nL 393.585625 7.2 \r\nL 36.465625 7.2 \r\nz\r\n\" style=\"fill:#ffffff;\"/>\r\n </g>\r\n <g id=\"patch_3\">\r\n <path clip-path=\"url(#p30ea2c5606)\" d=\"M 52.698352 273.312 \r\nL 85.163807 273.312 \r\nL 85.163807 222.624 \r\nL 52.698352 222.624 \r\nz\r\n\" style=\"fill:#1f77b4;\"/>\r\n </g>\r\n <g id=\"patch_4\">\r\n <path clip-path=\"url(#p30ea2c5606)\" d=\"M 85.163807 273.312 \r\nL 117.629261 273.312 \r\nL 117.629261 197.28 \r\nL 85.163807 197.28 \r\nz\r\n\" style=\"fill:#1f77b4;\"/>\r\n </g>\r\n <g id=\"patch_5\">\r\n <path clip-path=\"url(#p30ea2c5606)\" d=\"M 117.629261 273.312 \r\nL 150.094716 273.312 \r\nL 150.094716 133.92 \r\nL 117.629261 133.92 \r\nz\r\n\" 
style=\"fill:#1f77b4;\"/>\r\n </g>\r\n <g id=\"patch_6\">\r\n <path clip-path=\"url(#p30ea2c5606)\" d=\"M 150.094716 273.312 \r\nL 182.56017 273.312 \r\nL 182.56017 95.904 \r\nL 150.094716 95.904 \r\nz\r\n\" style=\"fill:#1f77b4;\"/>\r\n </g>\r\n <g id=\"patch_7\">\r\n <path clip-path=\"url(#p30ea2c5606)\" d=\"M 182.56017 273.312 \r\nL 215.025625 273.312 \r\nL 215.025625 108.576 \r\nL 182.56017 108.576 \r\nz\r\n\" style=\"fill:#1f77b4;\"/>\r\n </g>\r\n <g id=\"patch_8\">\r\n <path clip-path=\"url(#p30ea2c5606)\" d=\"M 215.025625 273.312 \r\nL 247.49108 273.312 \r\nL 247.49108 19.872 \r\nL 215.025625 19.872 \r\nz\r\n\" style=\"fill:#1f77b4;\"/>\r\n </g>\r\n <g id=\"patch_9\">\r\n <path clip-path=\"url(#p30ea2c5606)\" d=\"M 247.49108 273.312 \r\nL 279.956534 273.312 \r\nL 279.956534 95.904 \r\nL 247.49108 95.904 \r\nz\r\n\" style=\"fill:#1f77b4;\"/>\r\n </g>\r\n <g id=\"patch_10\">\r\n <path clip-path=\"url(#p30ea2c5606)\" d=\"M 279.956534 273.312 \r\nL 312.421989 273.312 \r\nL 312.421989 133.92 \r\nL 279.956534 133.92 \r\nz\r\n\" style=\"fill:#1f77b4;\"/>\r\n </g>\r\n <g id=\"patch_11\">\r\n <path clip-path=\"url(#p30ea2c5606)\" d=\"M 312.421989 273.312 \r\nL 344.887443 273.312 \r\nL 344.887443 197.28 \r\nL 312.421989 197.28 \r\nz\r\n\" style=\"fill:#1f77b4;\"/>\r\n </g>\r\n <g id=\"patch_12\">\r\n <path clip-path=\"url(#p30ea2c5606)\" d=\"M 344.887443 273.312 \r\nL 377.352898 273.312 \r\nL 377.352898 260.64 \r\nL 344.887443 260.64 \r\nz\r\n\" style=\"fill:#1f77b4;\"/>\r\n </g>\r\n <g id=\"matplotlib.axis_1\">\r\n <g id=\"xtick_1\">\r\n <g id=\"line2d_1\">\r\n <defs>\r\n <path d=\"M 0 0 \r\nL 0 3.5 \r\n\" id=\"m12aa72ebce\" style=\"stroke:#000000;stroke-width:0.8;\"/>\r\n </defs>\r\n <g>\r\n <use style=\"stroke:#000000;stroke-width:0.8;\" x=\"73.172678\" xlink:href=\"#m12aa72ebce\" y=\"273.312\"/>\r\n </g>\r\n </g>\r\n <g id=\"text_1\">\r\n <!-- 0.44 -->\r\n <g transform=\"translate(62.039865 287.
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"def normal_random(sample_size=100):\n",
" sample = [random.uniform(0,1) for _ in range(sample_size) ]\n",
" return sum(sample)/sample_size\n",
"\n",
"sample = [normal_random() for _ in range(100)]\n",
"plt.hist(sample)\n",
"plt.show()"
]
},
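{
"cell_type": "markdown",
"metadata": {},
"source": [
"The values produced by `normal_random` cluster around 0.5 with a small spread. As a minimal sketch, they can be rescaled to an approximately standard normal variable, using the fact that the mean of `n` independent uniform(0,1) draws has mean 0.5 and variance 1/(12n). The function name `standard_normal_clt` is hypothetical and introduced only for this illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"import random\n",
"\n",
"def standard_normal_clt(sample_size=100):\n",
"    # Mean of sample_size uniform(0,1) draws: mean 0.5, variance 1/(12*sample_size)\n",
"    mean = sum(random.uniform(0, 1) for _ in range(sample_size)) / sample_size\n",
"    # Subtract the mean and divide by the standard deviation\n",
"    return (mean - 0.5) * math.sqrt(12 * sample_size)\n",
"\n",
"values = [standard_normal_clt() for _ in range(10000)]\n",
"avg = sum(values) / len(values)\n",
"var = sum((v - avg) ** 2 for v in values) / len(values)\n",
"# Both estimates should be close to 0 and 1 respectively\n",
"print(f'mean ~ {avg:.3f}, variance ~ {var:.3f}')"
]
},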
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Correlation and Evil Baseball Corp\n",
"\n",
"Correlation allows us to find inner connection between data sequences. In our toy example, let's pretend there is an evil baseball corporation that pays it's players according to their height - the taller the player is, the more money he/she gets. Suppose there is a base salary of $1000, and an additional bonus from $0 to $100, depending on height. We will take the real players from MLB, and compute their imaginary salaries:"
]
},
{
"cell_type": "code",
"execution_count": 99,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[(74, 1075.2469071629068), (74, 1075.2469071629068), (72, 1053.7477908306478), (72, 1053.7477908306478), (73, 1064.4973489967772), (69, 1021.4991163322591), (69, 1021.4991163322591), (71, 1042.9982326645181), (76, 1096.746023495166), (71, 1042.9982326645181)]\n"
]
}
],
"source": [
"heights = df['Height']\n",
"salaries = 1000+(heights-heights.min())/(heights.max()-heights.mean())*100\n",
"print(list(zip(heights,salaries))[:10])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's now compute covariance and correlation of those sequences. `np.cov` will give us so-called **covariance matrix**, which is an extension of covariance to multiple variables. The element $M_{ij}$ of the covariance matrix $M$ is a correlation between input variables $X_i$ and $X_j$, and diagonal values $M_{ii}$ is the variance of $X_{i}$. Similarly, `np.corrcoef` will give us **correlation matrix**."
]
},
{
"cell_type": "code",
"execution_count": 100,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Covariance matrix:\n",
"[[ 5.31679808 57.15323023]\n",
" [ 57.15323023 614.37197275]]\n",
"Covariance = 57.15323023054467\n",
"Correlation = 1.0\n"
]
}
],
"source": [
"print(f\"Covariance matrix:\\n{np.cov(heights,salaries)}\")\n",
"print(f\"Covariance = {np.cov(heights,salaries)[0,1]}\")\n",
"print(f\"Correlation = {np.corrcoef(heights,salaries)[0,1]}\")"
]
},
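{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a cross-check of what `np.cov` and `np.corrcoef` return, here is a minimal sketch that computes the covariance and the Pearson correlation directly from their definitions. The arrays `x` and `y` are small made-up sequences, not the baseball data. Note that `np.cov` uses the n-1 denominator by default, so the manual version does the same."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Small made-up sequences, used only to illustrate the definitions\n",
"x = np.array([67.0, 70.0, 72.0, 74.0, 76.0])\n",
"y = np.array([1010.0, 1025.0, 1050.0, 1060.0, 1090.0])\n",
"\n",
"# Sample covariance with the n-1 denominator (the np.cov default)\n",
"cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)\n",
"\n",
"# Pearson correlation: covariance divided by the product of sample standard deviations\n",
"corr_xy = cov_xy / (x.std(ddof=1) * y.std(ddof=1))\n",
"\n",
"print(f'cov: {cov_xy:.4f} vs {np.cov(x, y)[0, 1]:.4f}')\n",
"print(f'corr: {corr_xy:.4f} vs {np.corrcoef(x, y)[0, 1]:.4f}')"
]
},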
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Correlation equal to 1 means that there is a strong **linear relation** between two variables. We can visually see the linear relation by plotting one value against the other:"
]
},
{
"cell_type": "code",
"execution_count": 101,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjEAAAGdCAYAAADjWSL8AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8rg+JYAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAxI0lEQVR4nO3de3hUVZ7v/08RoCCaVBMuKUoTkrZpmotCUJubNjCjgQhkPI7TMGigRyZyDtPKRUVyxIbgEMDTrfZMxob2hiMw8tgCPxQmLd1HiQ4EEChbLqJxKqAmEZtgVQKYoFnnD36ppnIvUknVrrxfz7Ofh73Xqp3vIlj1ce1da9uMMUYAAAAW0yXcBQAAAFwJQgwAALAkQgwAALAkQgwAALAkQgwAALAkQgwAALAkQgwAALAkQgwAALCkruEuoL3U1taqtLRUcXFxstls4S4HAAC0gjFGlZWVcrlc6tKl+bmWqA0xpaWlSkpKCncZAADgCnz22We69tprm+0TtSEmLi5O0qW/hPj4+DBXAwAAWsPn8ykpKcn/Od6cqA0xdZeQ4uPjCTEAAFhMa24F4cZeAABgSYQYAABgSYQYAABgSUGHmMLCQk2bNk0ul0s2m03btm0LaN+yZYsmTZqkPn36yGazye12B7SXlJTIZrM1ur322mv+fikpKQ3alyxZckWDBAAA0SfoEHPu3DkNHz5c+fn5TbaPGzdOq1evbrQ9KSlJZWVlAVtubq6uuuoqZWRkBPRdsWJFQL+lS5cGWy4AAIhSQX87KSMjo0HYuFxWVpakSzMujYmJiZHT6Qw4tnXrVk2fPl1XX311wPG4uLgGfQEAAKQIuCfm4MGDcrvdmjNnToO2NWvWqHfv3hoxYoRWrlypmpqaMFQIAAAiUdjXiXnhhRc0ePBgjR07NuD4/PnzNXLkSPXq1Uv79+9XTk6OPB6Pnn/++UbPU11drerqav++z+dr17oBAEB4hTXEXLhwQZs2bdLjjz/eoG3hwoX+P99www3q1auX7r77bv/sTH2rVq1Sbm5uu9YLAACk72qN9nsqdLryG/WL66EfpyYopkvHP6cwrCHmd7/7nc6fP69Zs2a12Hf06NGSpOLi4kZDTE5OjhYtWuTfr1u2GAAAhE7BkTLlvnFMZd5v/Mf6O3po2bQhmjysf4fWEtZ7Yl544QVlZmaqb9++LfY9fPiwJKl//8b/gux2u/8RAzxqAACA0Cs4Uqb/teFQQICRpHLvN/pfGw6p4EhZh9YT9ExMVVWViouL/fsej0dut1sJCQlKTk5WRUWFTp06pdLSUknSiRMnJElOpzPgm0bFxcUqLCzUzp07G/yMvXv3qqioSBMnTpTD4dCBAwe0cOFCZWZmKjk5OehBAgCAtvmu1ij3jWMyjbQZSTZJuW8c0+1DnB12aSnomZj3339faWlpSktLkyQtWrRIaWlp+sUvfiFJ2r59u9LS0jRlyhRJ0owZM5SWlqa1a9cGnOfFF1/UNddco/T09AY/w263a/PmzZowYYKGDBmiX/ziF8rOztZ//Md/BD1AAADQdvs9FQ1mYC5nJJV5v9F+T0WH1WQzxjQWqizP5/PJ4XDI6/VyaQkAgDb6/9xfaP6r7hb7/XrGCP3NiGuu+OcE8/kd9nViAABA5OsX1yOk/UKBEAMAAFr049QE9Xf0UFN3u9h06VtKP05N6LCaCDEAAKBFMV1sWjZtiCQ1CDJ1+8umDenQ9WIIMQAAoFUmD+uv39w7Uk5H4CUjp6OHfnPvyA5fJybsjx0AAADWMXlYf90+xMmKvQAAwHpiutg05rqGq+d3NC4nAQAASyLEAAAASyLEAAAASyLEAAAASyLEAAAASyLEAAAASyLEAAAASyLEAAAASyLEAAAAS2LFXgAALKzm21q9srdEJyvOa0BCrLLGpKh7184xR0GIAQDAolbtPKbn3vWo1vzl2Mqdx5V9a6py7hgSvsI6CCEGAAALWrXzmNYVehocrz
XyH4/2INM55psAAIgiNd/W6rl3GwaYyz33rkc139Z2UEXhQYgBAMBiXtlbEnAJqTG15lK/aEaIAQDAYk5WnA9pP6sixAAAYDEDEmJD2s+qCDEAAFhM1pgUdbE136eL7VK/aEaIAQDAYrp37aLsW1Ob7ZN9a2rUrxfDV6wBALCguq9P118npotNnWadGJsxpoX7m63J5/PJ4XDI6/UqPj4+3OUAANAuom3F3mA+v5mJAQDAwrp37aI5t34/3GWEhXWjGgAA6NQIMQAAwJIIMQAAwJIIMQAAwJIIMQAAwJIIMQAAwJIIMQAAwJIIMQAAwJIIMQAAwJJYsRcAgBCKtscARDJCDAAAIbJq57EGD2RcufN4p3kgY0cLOhoWFhZq2rRpcrlcstls2rZtW0D7li1bNGnSJPXp00c2m01ut7vBOSZMmCCbzRawzZgxI6DP2bNnlZWVJYfDIYfDoaysLH399dfBlgsAQIdYtfOY1hUGBhhJqjXSukKPVu08Fp7ColjQIebcuXMaPny48vPzm2wfN26cVq9e3ex5srOzVVZW5t/WrVsX0D5z5ky53W4VFBSooKBAbrdbWVlZwZYLAEC7q/m2Vs+962m2z3PvelTzbW0HVdQ5BH05KSMjQxkZGU221wWNkpKSZs8TGxsrp9PZaNvx48dVUFCgoqIijRo1SpL03HPPacyYMTpx4oQGDRoUbNkAALSbV/aWNJiBqa/WXOrXWZ843R7CdqfRxo0b1adPHw0dOlQPP/ywKisr/W179+6Vw+HwBxhJGj16tBwOh/bs2dPo+aqrq+Xz+QI2AAA6wsmK8yHth9YJy42999xzj1JTU+V0OnXkyBHl5OTogw8+0K5duyRJ5eXl6tevX4PX9evXT+Xl5Y2ec9WqVcrNzW3XugEAaMyAhNiQ9kPrhCXEZGdn+/88bNgwDRw4UDfddJMOHTqkkSNHSpJsNluD1xljGj0uSTk5OVq0aJF/3+fzKSkpKcSVAwDQUNaYFK3cebzZS0pdbJf6IXQi4ovrI0eOVLdu3fTJJ59IkpxOp7788ssG/b766islJiY2eg673a74+PiADQCAjtC9axdl35rabJ/sW1NZLybEIuJv8+jRo7p48aL69+8vSRozZoy8Xq/279/v77Nv3z55vV6NHTs2XGUCANCknDuGaO5PUtWl3gWDLjZp7k9YJ6Y9BH05qaqqSsXFxf59j8cjt9uthIQEJScnq6KiQqdOnVJpaakk6cSJE5Iuza44nU59+umn2rhxo+644w716dNHx44d00MPPaS0tDSNGzdOkjR48GBNnjxZ2dnZ/q9e33///Zo6dSrfTAIARKycO4boofQfsWJvRzFBevvtt42kBtvs2bONMca89NJLjbYvW7bMGGPMqVOnzE9+8hOTkJBgunfvbq677jrz4IMPmjNnzgT8nDNnzph77rnHxMXFmbi4OHPPPfeYs2fPtrpOr9drJBmv1xvsEAEAQJgE8/ltM8a08M12a/L5fHI4HPJ6vdwfAwCARQTz+c38FgAAsCRCDAAAsCRCDAAAsCRCDAAAsCRCDAAAsCRCDAAAsCRCDAAAsCRCDAAAsKSwPMUaAICO4j1/Ufet369S7zdyOXroxZ/9WI7YbuEuCyFAiAEARK3x/+f/6uSZC/79Mu83Gr7iLQ3o3VO7H/mrMFaGUOByEgAgKtUPMJc7eeaCxv+f/9vBFSHUCDEAgKjjPX+xyQBT5+SZC/Kev9hBFaE9EGIAAFHnvvX7Q9oPkYkQAwCIOqXeb0LaD5GJEAMAiDouR4+Q9kNkIsQAAKLOiz/7cUj7ITIRYgAAUccR200Devdsts+A3j1ZL8biCDEAgKi0+5G/ajLIsE5MdGCxOwBA1Nr9yF+xYm8UI8QAAKKaI7abXp83LtxloB1wOQkAAFgSIQYAAFgSIQYAAFgSIQYAAFgSIQYAAFgSIQYAAFgSIQYAAFgSIQYAAFgSIQYAAFgSK/YCADrUsc99mpr/rmp16f
+k3/z5rRpybXy4y4IFEWIAAB0mZcmOgP1aSXfkvytJKlk9JQwVwcq4nAQA6BD1A0yw7UB9hBgAQLs79rkvpP0AiRA
"image/svg+xml": "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"no\"?>\r\n<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\r\n \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\r\n<svg height=\"297.190125pt\" version=\"1.1\" viewBox=\"0 0 403.97 297.190125\" width=\"403.97pt\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\r\n <metadata>\r\n <rdf:RDF xmlns:cc=\"http://creativecommons.org/ns#\" xmlns:dc=\"http://purl.org/dc/elements/1.1/\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\r\n <cc:Work>\r\n <dc:type rdf:resource=\"http://purl.org/dc/dcmitype/StillImage\"/>\r\n <dc:date>2021-08-16T23:50:38.483752</dc:date>\r\n <dc:format>image/svg+xml</dc:format>\r\n <dc:creator>\r\n <cc:Agent>\r\n <dc:title>Matplotlib v3.4.2, https://matplotlib.org/</dc:title>\r\n </cc:Agent>\r\n </dc:creator>\r\n </cc:Work>\r\n </rdf:RDF>\r\n </metadata>\r\n <defs>\r\n <style type=\"text/css\">*{stroke-linecap:butt;stroke-linejoin:round;}</style>\r\n </defs>\r\n <g id=\"figure_1\">\r\n <g id=\"patch_1\">\r\n <path d=\"M 0 297.190125 \r\nL 403.97 297.190125 \r\nL 403.97 0 \r\nL 0 0 \r\nz\r\n\" style=\"fill:#ffffff;\"/>\r\n </g>\r\n <g id=\"axes_1\">\r\n <g id=\"patch_2\">\r\n <path d=\"M 39.65 273.312 \r\nL 396.77 273.312 \r\nL 396.77 7.2 \r\nL 39.65 7.2 \r\nz\r\n\" style=\"fill:#ffffff;\"/>\r\n </g>\r\n <g id=\"PathCollection_1\">\r\n <defs>\r\n <path d=\"M 0 3 \r\nC 0.795609 3 1.55874 2.683901 2.12132 2.12132 \r\nC 2.683901 1.55874 3 0.795609 3 0 \r\nC 3 -0.795609 2.683901 -1.55874 2.12132 -2.12132 \r\nC 1.55874 -2.683901 0.795609 -3 0 -3 \r\nC -0.795609 -3 -1.55874 -2.683901 -2.12132 -2.12132 \r\nC -2.683901 -1.55874 -3 -0.795609 -3 0 \r\nC -3 0.795609 -2.683901 1.55874 -2.12132 2.12132 \r\nC -1.55874 2.683901 -0.795609 3 0 3 \r\nz\r\n\" id=\"mfa55a1398c\" style=\"stroke:#1f77b4;\"/>\r\n </defs>\r\n <g clip-path=\"url(#p22cbd6f660)\">\r\n <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"197.919091\" 
xlink:href=\"#mfa55a1398c\" y=\"155.376\"/>\r\n <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"197.919091\" xlink:href=\"#mfa55a1398c\" y=\"155.376\"/>\r\n <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"157.337273\" xlink:href=\"#mfa55a1398c\" y=\"185.616\"/>\r\n <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"157.337273\" xlink:href=\"#mfa55a1398c\" y=\"185.616\"/>\r\n <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"177.628182\" xlink:href=\"#mfa55a1398c\" y=\"170.496\"/>\r\n <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"96.464545\" xlink:href=\"#mfa55a1398c\" y=\"230.976\"/>\r\n <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"96.464545\" xlink:href=\"#mfa55a1398c\" y=\"230.976\"/>\r\n <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"137.046364\" xlink:href=\"#mfa55a1398c\" y=\"200.736\"/>\r\n <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"238.500909\" xlink:href=\"#mfa55a1398c\" y=\"125.136\"/>\r\n <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"137.046364\" xlink:href=\"#mfa55a1398c\" y=\"200.736\"/>\r\n <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"177.628182\" xlink:href=\"#mfa55a1398c\" y=\"170.496\"/>\r\n <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"177.628182\" xlink:href=\"#mfa55a1398c\" y=\"170.496\"/>\r\n <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"197.919091\" xlink:href=\"#mfa55a1398c\" y=\"155.376\"/>\r\n <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"197.919091\" xlink:href=\"#mfa55a1398c\" y=\"155.376\"/>\r\n <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"96.464545\" xlink:href=\"#mfa55a1398c\" y=\"230.976\"/>\r\n <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"116.755455\" xlink:href=\"#mfa55a1398c\" y=\"215.856\"/>\r\n <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"157.337273\" xlink:href=\"#mfa55a1398c\" y=\"185.616\"/>\r\n <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"177.628182\" xlink:href=\"#mfa55a1398c\" y=\"170.496\"/>\r\n <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"218.21\" xlink:href=\"#mfa55a13
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.scatter(heights,salaries)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's see what happens if the relation is not linear. Suppose that our corporation decided to hide the obvious linear dependency between heights and salaries, and introduced some non-linearity into the formula, such as `sin`:"
]
},
{
"cell_type": "code",
"execution_count": 91,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Correlation = 0.9835304456670811\n"
]
}
],
"source": [
"salaries = 1000+np.sin((heights-heights.min())/(heights.max()-heights.mean()))*100\n",
"print(f\"Correlation = {np.corrcoef(heights,salaries)[0,1]}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this case, the correlation is slightly smaller, but it is still quite high. Now, to make the relation even less obvious, we might want to add some extra randomness by adding some random variable to the salary. Let's see what happens:"
]
},
{
"cell_type": "code",
"execution_count": 102,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Correlation = 0.9384710733057905\n"
]
}
],
"source": [
"salaries = 1000+np.sin((heights-heights.min())/(heights.max()-heights.mean()))*100+np.random.random(size=len(heights))*20-10\n",
"print(f\"Correlation = {np.corrcoef(heights,salaries)[0,1]}\")"
]
},
{
"cell_type": "code",
"execution_count": 104,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjEAAAGdCAYAAADjWSL8AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8rg+JYAAAACXBIWXMAAA9hAAAPYQGoP6dpAABKgklEQVR4nO3de3xU9Z0//teZmdwThiQkmQwCRo2AaCGiholuwSoEFKlf/VpbMKZbF/HHtinagrLaBWsXXG3BdbOI9dEVtoTVbRF+VLuJsbVSzZAAMhYjNzUqQi5ckgkJuTAzn+8fcUYmt/OZ8Mnc8no+Hnn4mDMvzucckJk353PThBACRERERBHGEOoLICIiIhoKFjFEREQUkVjEEBERUURiEUNEREQRiUUMERERRSQWMURERBSRWMQQERFRRGIRQ0RERBHJFOoLGC4ejwcnTpxASkoKNE0L9eUQERGRBCEEzp49C6vVCoNh8GctUVvEnDhxAuPGjQv1ZRAREdEQHDt2DJdccsmgmagtYlJSUgD0/CaMGjUqxFdDREREMlpbWzFu3Djf9/hgoraI8XYhjRo1ikUMERFRhJEZCsKBvURERBSRWMQQERFRRGIRQ0RERBGJRQwRERFFJBYxREREFJFYxBAREVFEYhFDREREEYlFDBEREUWkqF3sjoiIaCRwu1w4VF2BjubjSEgdi0n5hTCaRsbX+8i4SyIioii0v2IzrPYnMQWnfccaK9NxwrYKeYXFIbyy4GARQ0REFIH2V2zG1KqSnhcXrNCfIU4jo6oE+4GoL2Q4JoaIiCjCuF0uWO1PAgAMvbYY8r7Otj8Jt8sV5CsLLhYxREREEeZQdQWycLpPAeNl0AALTuNQdUVwLyzIWMQQERFFmI7m40pzkYpFDBERUYRJSB2rNBepOLCXiIiiWjROQZ6UX4jGynRkiP67lDwCaNLSMSm/MPgXF0SR/adIREQ0iGidgmw0mXDCtgoZVSXwCP/BvR7R89962ypYIrxY08PuJCIiikreKcgZ4rTf8QxxGlOrSrC/YnOIrkyNvMJifFDwPE5q6X7Hm7R0fFDwfEQXabI0IYQI9UUMh9bWVpjNZjidTowaNSrUl0NERF/p7uzEvm3PQmuug0jNwfS7lyM2Pl5pG26XC6d+caVud0vGE0civmsp2rrLAvn+ZhFDRERBY9+4FDfUb4VR+/qrxy001GQvhO2hDcraqX3vDUypXKifm70VU268XVm7dPEC+f5mdxIREQWFfeNSzKgvgwH+/3Y2QGBGfRnsG5cqa2skTUF2u1yofe8N7H3916h9742oX+DuQpH7vImIiCJGd2cnbqjfCgDQenXvaBogBHB9/X+ju3Odkq6lkTIFOVoHLsvikxgiIhp2+7Y9C6Mm+hQwXpoGmDQP9m17Vkl7k/IL0Yh030yd3jwCaEBkT0GO9oHLMljEEBHRsNOa65Tm9HinIAPoU8hcOAU5UgfAcu+kHixiiIho2InUHKU5GdE8BZl7J/WIzBKUiIgiyvS7l8O99lcwoP8uJSEANwyYfvdype3mFRbDfcsi1Paagjyci8AFY8rzSBq4PBgWMURENOxi4+Nhz16IGfVlEMJ/cK93oY892d+DTfF6MUBP11KwplEHa6DtSBm4rIfdSUREFBS2hzZgd/YieOD/KMYNA3ZnL1K6TkwoBHOg7UgYuCyDi90REY1wwV7xNRgr9l4oGPcXihWCvUUT0P/eSZE67ieQ7292JxERjWA93R+rMQVnfMcaK9NwwrZ62L4AjSYTRl16LTrMWUhIHTusBVOwuncOVVf0tKEz0La2ukJZ11ZeYTH2A7Dan0TWBffXpKWjfoSsE8MihohohPL+S14D/L58M8UZZFSVYD+g/IswmEXThU8qLry/DHFa+f2FaqBtKAYuh5ORcZdEROTH7XLhUvs/QUP/K+hCAJfa/wnuWxYp7/4IRtGkt46KR3y1joqi+wvlQNtgDlwONxzYS0
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.scatter(heights, salaries)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> Can you guess why the dots line up into vertical lines like this?\n",
"\n",
"We have observed a correlation between an artificially engineered quantity, salary, and the observed variable *height*. Let's also check whether two observed variables, height and weight, correlate:"
]
},
{
"cell_type": "code",
"execution_count": 111,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 1., nan],\n",
" [nan, nan]])"
]
},
"execution_count": 111,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.corrcoef(df['Height'],df['Weight'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Unfortunately, we did not get a meaningful result, only some strange `nan` values. This happens because some of the values in our series are undefined (represented as `nan`), which makes the result of the operation undefined as well. Looking at the matrix, we can see that `Weight` is the problematic column, because the self-correlation of the `Height` values was computed correctly.\n",
"\n",
"> This example shows the importance of **data preparation** and **cleaning**. Without proper data we cannot compute anything.\n",
"\n",
"Let's use the `fillna` method to fill in the missing values (here with `method='pad'`, which carries the last valid value forward), and then compute the correlation:"
]
},
{
"cell_type": "code",
"execution_count": 114,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[1. , 0.52959196],\n",
" [0.52959196, 1. ]])"
]
},
"execution_count": 114,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.corrcoef(df['Height'],df['Weight'].fillna(method='pad'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There is indeed a correlation (about 0.53), but it is not as strong as in our artificial example. If we look at the scatter plot of one value against the other, the relationship is much less obvious:"
]
},
{
"cell_type": "code",
"execution_count": 117,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.scatter(df['Height'],df['Weight'])\n",
"plt.xlabel('Height')\n",
"plt.ylabel('Weight')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"In this notebook, we have learnt how to perform basic operations on data to compute statistical functions. We now know how to use the sound apparatus of math and statistics to test hypotheses, and how to compute confidence intervals for a random variable given a data sample."
]
}
],
"metadata": {
"interpreter": {
"hash": "86193a1ab0ba47eac1c69c1756090baa3b420b3eea7d4aafab8b85f8b312f0c5"
},
"kernelspec": {
"display_name": "Python 3.8.8 64-bit (conda)",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}