diff --git a/2-Regression/1-Tools/Picture1.png b/2-Regression/1-Tools/Picture1.png new file mode 100644 index 00000000..b891ae77 Binary files /dev/null and b/2-Regression/1-Tools/Picture1.png differ diff --git a/2-Regression/1-Tools/assignment.ipynb b/2-Regression/1-Tools/assignment.ipynb new file mode 100644 index 00000000..2d550fb5 --- /dev/null +++ b/2-Regression/1-Tools/assignment.ipynb @@ -0,0 +1,675 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 2.1 - Assignment" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "OK, so we're using the Linnerud dataset to learn something about fitness of middle-aged men. Might even be useful :)\n", + "\n", + "Looking for the relationship between waistline and number of situps. So number of situps is the 'feature' and waistline is the 'target' (not in the usual sense of a target waistline 😁)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Order of tasks\n", + "\n", + "Let's see if I can write out what we're going to do from memory (no way I'm going to remember the actual code)\n", + "\n", + "1. Load the dataset from the SciKit library, with X and y separated. According to the assignment text, there might be more than one y here, so it could get tricky.\n", + "1. Split the dataset into train and test data, 2/3-1/3. Using the sampling tools module of scikit (can't remember the name)\n", + "1. Initialise a new LinearRegression model (name might be wrong?)\n", + "1. Train the model (`model.fit()`) using the training data\n", + "1. Predict y for the test X (`model.predict`, I think?)\n", + "1. Plot it using matplotlib (is that the right name?). 
Plot a line for the predicted y and dots for the actual y" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Ok, let's go" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "from sklearn import datasets, linear_model, model_selection\n", + "import matplotlib.pyplot as plt" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "(had to look these up)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Import the dataset" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "ename": "TypeError", + "evalue": "'tuple' object is not callable", + "output_type": "error", + "traceback": [ + "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)", + "\u001b[1;32m~\\AppData\\Local\\Temp/ipykernel_22452/1864794356.py\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[0mX\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0my\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mdatasets\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mload_linnerud\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mreturn_X_y\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;32mTrue\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m \u001b[0mX\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 3\u001b[0m \u001b[0my\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", + "\u001b[1;31mTypeError\u001b[0m: 'tuple' object is not callable" + ] + } + ], + "source": [ + "X, y = datasets.load_linnerud(return_X_y = True)\n", + 
"X.shape()\n", + "y.shape()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "hmm, got that wrong. need to look it up" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(20, 3)\n", + "(20, 3)\n" + ] + } + ], + "source": [ + "print(X.shape)\n", + "print(y.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "I guess `shape` is an attribute (a tuple), not a method, which is why calling it failed.\n", + "\n", + "Don't know if the `print` is needed or not." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(20, 3)" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "X.shape\n", + "y.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "ah, without `print` the cell only displays the value of its last expression. Onward." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "OK, so both X and y have 3 columns (each is 20 rows by 3), so we'll need to pull a single column out of each.\n", + "\n", + "According to the [docs](https://scikit-learn.org/stable/datasets/toy_dataset.html#linnerrud-dataset),\n", + "\n", + "X = chins, situps, jumps\n", + "\n", + "y = weight, waist, pulse\n", + "\n", + "\n", + "So we need to get the second one from each." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "X_situps = X[:, np.newaxis, 1]\n", + "y_waist = y[:, np.newaxis, 1]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's make sure we took the right values" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[ 5. 162. 60.] [191. 36. 50.]\n", + "[162.] 
[36.]\n" + ] + } + ], + "source": [ + "print(X[0], y[0])\n", + "print(X_situps[0], y_waist[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Great, we took the middle one from each." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Split into training and test data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "X_train, X_test, y_train, y_test = model_selection.train_test_split(X_situps, y_waist)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This was from memory (with help from autocomplete), let's see if I got the order of the return values right... yup looks good" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Initialise the model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "model = linear_model.LinearRegression()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Train the model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "LinearRegression()" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "model.fit(X_train, y_train)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Predict test data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y_predict = model.predict(y_test)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Plot x_test against y_test and y_predict\n", + "\n", + "I need to look this one up 😳" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": 
"[base64-encoded PNG data omitted]", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "plt.scatter(X_test, y_test, color='black')\n", + "plt.plot(X_test, y_predict, color='green')\n", + "plt.scatter(X_test, y_predict, color='blue')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Something weird here. Why are there only 5 data points? The original dataset had 20, I think.\n", + "Also, the results look totally wrong. They're all way above what they should be" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "orig (20, 1) (20, 1)\n", + "trai (15, 1) (15, 1)\n", + "test (5, 1) (5, 1)\n" + ] + } + ], + "source": [ + "print('orig', X_situps.shape, y_waist.shape) #lol\n", + "print('trai', X_train.shape, y_train.shape)\n", + "print('test', X_test.shape, y_test.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "15 train, 5 test.\n", + "\n", + "Ah, maybe I needed to tell it how to split it better. 
Let's start over" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "[base64-encoded PNG data omitted]", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# Split, this time specifying how much\n", + "X_train, X_test, y_train, y_test = model_selection.train_test_split(X_situps, y_waist, test_size=0.33)\n", + "\n", + "# Init, train and predict\n", + "model = linear_model.LinearRegression()\n", + "model.fit(X_train, y_train)\n", + "y_predict = model.predict(y_test)\n", + "\n", + "# Plot\n", + "plt.scatter(X_test, y_test, color='black')\n", + "plt.plot(X_test, y_predict, color='green')\n", + "plt.scatter(X_test, y_predict, color='blue')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Ok, now we have 7 test points. But the result is interesting. The predictions are pretty much a flat line! I.e. it makes no difference how many situps you do, your waistline will always be 38 inches. Yay! 😄\n", + "\n", + "Also, it seems like the one outlier changed the average for everything?\n", + "\n", + "This looks very weird, I need to look at the raw data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[225.]\n", + " [ 70.]\n", + " [155.]\n", + " [200.]\n", + " [215.]\n", + " [110.]\n", + " [ 50.]]\n", + "[[37.80851123]\n", + " [37.74363001]\n", + " [37.76525709]\n", + " [37.85176538]\n", + " [37.78688416]\n", + " [37.80851123]\n", + " [37.52735928]]\n" + ] + } + ], + "source": [ + "print(X_test)\n", + "print(y_predict)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "(I got curious, is there a way to combine them in NumPy? Looks like hstack might do it)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[225. , 37.80851123],\n", + " [ 70. , 37.74363001],\n", + " [155. , 37.76525709],\n", + " [200. , 37.85176538],\n", + " [215. , 37.78688416],\n", + " [110. 
, 37.80851123],\n", + " [ 50. , 37.52735928]])" + ] + }, + "execution_count": 40, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.hstack((X_test, y_predict))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "niiiice :)\n", + "\n", + "anyway. Looks like the model is always giving us something in the range 37.5-37.9, whether the input is 50 situps or 225. Why? I have no idea, the training data surely didn't look like that, did it?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[101., 38.],\n", + " [ 60., 37.],\n", + " [210., 37.],\n", + " [210., 33.],\n", + " [230., 32.],\n", + " [120., 34.],\n", + " [162., 36.],\n", + " [101., 36.],\n", + " [110., 37.],\n", + " [105., 35.],\n", + " [125., 34.],\n", + " [101., 38.],\n", + " [251., 33.]])" + ] + }, + "execution_count": 41, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.hstack((X_train, y_train))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "...so no, it didn't. 
So why does the model think the answer is always 37?\n", + "\n", + "Maybe, if there's no correlation, it just gives up and uses the average?\n", + "\n", + "Let's see if the _training_ data has what looks like a correlation to my eyeballs" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 42, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "[base64-encoded PNG data omitted]", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "plt.scatter(X_train, y_train)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Hard to say, it _looks_ like there is a trend: waistline going down as situps go up, which is what I'd expect (sadly). If I as a humble human were drawing a line it wouldn't be straight across.\n", + "\n", + "Fun idea, let's ask Excel!\n", + "![Excel graph](attachment:Picture1.png)\n", + "\n", + "Excel agrees with me. Not sure why the model doesn't. Maybe we should try again from scratch?" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[105. 40.90121617]\n", + " [215. 40.94564911]\n", + " [110. 40.99008206]\n", + " [210. 40.99008206]\n", + " [ 60. 40.81235027]\n", + " [162. 40.85678322]\n", + " [ 70. 40.85678322]]\n" + ] + } + ], + "source": [ + "X, y = datasets.load_linnerud(return_X_y=True)\n", + "\n", + "X_situps = X[:, np.newaxis, 1]\n", + "y_waist = y[:, np.newaxis, 1]\n", + "\n", + "X_train, X_test, y_train, y_test = model_selection.train_test_split(X_situps, y_waist, test_size=0.33)\n", + "\n", + "model = linear_model.LinearRegression()\n", + "model.fit(X_train, y_train)\n", + "y_predict = model.predict(y_test)\n", + "\n", + "print(np.hstack((X_test, y_predict)))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Bizarre. Same again, but a different number this time.\n", + "\n", + "I mean, a different number is not bizarre because it splits train and test randomly. But why does it always think everyone is the same?\n", + "\n", + "I really really hope it's not because of some stupid mistake I made..." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "About to give up on this, but just for fun let's see if it gets my waistline correct:" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[42.45636929]])" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "model.predict([[0]])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "...no comment 😉" + ] + } + ], + "metadata": { + "interpreter": { + "hash": "c7d6cb708d9496164cad24676295f59deddd15f42781117113af2b6c8d53f583" + }, + "kernelspec": { + "display_name": "Python 3.9.7 64-bit (windows store)", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.7" + }, + "orig_nbformat": 4 + }, + "nbformat": 4, + "nbformat_minor": 2 +}
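A possible explanation for the flat predictions in the notebook above, offered as a sketch rather than a confirmed diagnosis: every prediction cell calls `model.predict(y_test)`, so the fitted line seems to be evaluated on waist values (roughly 31 to 46) instead of on situps. With a slope this small, any input in that range lands close to the intercept. The snippet below tries this on the 13 training rows and 7 test situps values printed in the notebook. It uses a hand-rolled least-squares fit as a stand-in for `LinearRegression`, and the helper names `fit_line` and `predict` are made up for this example.

```python
# Training pairs (situps, waist), copied from the notebook's
# np.hstack((X_train, y_train)) output.
train = [(101, 38), (60, 37), (210, 37), (210, 33), (230, 32), (120, 34),
         (162, 36), (101, 36), (110, 37), (105, 35), (125, 34), (101, 38),
         (251, 33)]

# Situps for the held-out rows, copied from the notebook's print(X_test) output.
situps_test = [225, 70, 155, 200, 215, 110, 50]


def fit_line(pairs):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(slope, intercept, xs):
    """Evaluate the fitted line at each value in xs."""
    return [intercept + slope * x for x in xs]


slope, intercept = fit_line(train)

# Evaluating the line on situps, the feature it was fitted on.
from_situps = predict(slope, intercept, situps_test)

# Evaluating the line on a waist-sized number instead (33 is an assumed,
# illustrative input; the notebook never prints y_test for this run).
at_33 = predict(slope, intercept, [33])[0]

print(f"slope={slope:.5f} intercept={intercept:.3f}")
for situps, waist in zip(situps_test, from_situps):
    print(f"{situps:>4} situps -> predicted waist {waist:.2f}")
print(f"input 33 -> {at_33:.5f}")
```

On this data an input of 33 comes out at about 37.81, in line with the 37.80851123 the notebook printed, while the situps inputs give a visibly downward-sloping set of predictions. If that is indeed the cause, the likely one-line change in the notebook would be `y_predict = model.predict(X_test)`.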