{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pumpkin Pricing\n",
    "\n",
    "Load up required libraries and dataset. Convert the data to a dataframe containing a subset of the data: \n",
    "\n",
    "- Only get pumpkins priced by the bushel\n",
    "- Convert the date to a month\n",
    "- Calculate the price to be an average of high and low prices\n",
    "- Convert the price to reflect the pricing by bushel quantity"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "from datetime import datetime\n",
    "\n",
    "pumpkins = pd.read_csv('../data/US-pumpkins.csv')\n",
    "\n",
    "pumpkins.head()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pumpkins = pumpkins[pumpkins['Package'].str.contains('bushel', case=True, regex=True)]\n",
    "\n",
    "columns_to_select = ['Package', 'Variety', 'City Name', 'Low Price', 'High Price', 'Date']\n",
    "pumpkins = pumpkins.loc[:, columns_to_select]\n",
    "\n",
    "price = (pumpkins['Low Price'] + pumpkins['High Price']) / 2\n",
    "\n",
    "month = pd.DatetimeIndex(pumpkins['Date']).month\n",
    "day_of_year = pd.to_datetime(pumpkins['Date']).apply(lambda dt: (dt-datetime(dt.year,1,1)).days)\n",
    "\n",
    "new_pumpkins = pd.DataFrame(\n",
    "    {'Month': month, \n",
    "     'DayOfYear' : day_of_year, \n",
    "     'Variety': pumpkins['Variety'], \n",
    "     'City': pumpkins['City Name'], \n",
    "     'Package': pumpkins['Package'], \n",
    "     'Low Price': pumpkins['Low Price'],\n",
    "     'High Price': pumpkins['High Price'], \n",
    "     'Price': price})\n",
    "\n",
    "new_pumpkins.loc[new_pumpkins['Package'].str.contains('1 1/9'), 'Price'] = price/1.1\n",
    "new_pumpkins.loc[new_pumpkins['Package'].str.contains('1/2'), 'Price'] = price*2\n",
    "\n",
    "new_pumpkins.head()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A basic scatterplot reminds us that we only have month data from August through December. We probably need more data to be able to draw conclusions in a linear fashion."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "plt.scatter('Month','Price',data=new_pumpkins)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "plt.scatter('DayOfYear','Price',data=new_pumpkins)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3-final"
  },
  "orig_nbformat": 2
 },
 "nbformat": 4,
 "nbformat_minor": 2
}