{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pumpkin Pricing\n",
"\n",
"Load the required libraries and the dataset. Then convert the data to a dataframe containing a subset of the data:\n",
"\n",
"- Only keep pumpkins priced by the bushel\n",
"- Convert the date to a month (and a day of the year)\n",
"- Calculate the price as the average of the high and low prices\n",
"- Convert the price to a per-bushel price, since some packages hold 1 1/9 bushels and others 1/2 bushel"
]
},
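{
"cell_type": "markdown",
"metadata": {},
"source": [
"The four steps above can be tried on a handful of rows before loading the real file. The next cell is a minimal sketch, not the notebook's own pipeline: the toy rows are hypothetical, and it assumes that a `1 1/9` package holds 10/9 of a bushel and a `1/2` package holds half a bushel, as the package names suggest. Dates are assumed to use a month/day/two-digit-year format."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy rows shaped like the dataset columns used here; not real market data\n",
"toy = pd.DataFrame({\n",
"    'Package': ['1 1/9 bushel cartons', '1/2 bushel cartons', 'bushel cartons', '24 inch bins'],\n",
"    'Low Price': [15.0, 10.0, 18.0, 150.0],\n",
"    'High Price': [15.0, 12.0, 20.0, 160.0],\n",
"    'Date': ['9/24/16', '10/1/16', '11/5/16', '12/3/16'],\n",
"})\n",
"\n",
"# 1. Keep only rows priced by the bushel (the '24 inch bins' row is dropped)\n",
"toy = toy[toy['Package'].str.contains('bushel')]\n",
"\n",
"# 2. Parse the dates once; take the month and a 0-based day of the year (January 1 is day 0)\n",
"dates = pd.to_datetime(toy['Date'], format='%m/%d/%y')\n",
"month = dates.dt.month\n",
"day_of_year = dates.dt.dayofyear - 1\n",
"\n",
"# 3. Average the low and high prices\n",
"price = (toy['Low Price'] + toy['High Price']) / 2\n",
"\n",
"# 4. Scale to a per-bushel price: 1 1/9 bushels = 10/9 bushel, 1/2 bushel = 0.5 bushel\n",
"per_bushel = price.copy()\n",
"is_1_1_9 = toy['Package'].str.contains('1 1/9')\n",
"is_half = toy['Package'].str.contains('1/2')\n",
"per_bushel.loc[is_1_1_9] = price.loc[is_1_1_9] / (1 + 1/9)\n",
"per_bushel.loc[is_half] = price.loc[is_half] * 2\n",
"\n",
"result = pd.DataFrame({'Month': month, 'DayOfYear': day_of_year, 'Price': per_bushel})\n",
"result"
]
},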
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"from datetime import datetime\n",
"\n",
"pumpkins = pd.read_csv('../data/US-pumpkins.csv')\n",
"\n",
"pumpkins.head()\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Keep only pumpkins sold by the bushel\n",
"pumpkins = pumpkins[pumpkins['Package'].str.contains('bushel', case=True, regex=True)]\n",
"\n",
"columns_to_select = ['Package', 'Variety', 'City Name', 'Low Price', 'High Price', 'Date']\n",
"pumpkins = pumpkins.loc[:, columns_to_select]\n",
"\n",
"# Use the midpoint of the low and high prices as the price for each row\n",
"price = (pumpkins['Low Price'] + pumpkins['High Price']) / 2\n",
"\n",
"# Month number, and the day of the year counted from 0 (January 1 is day 0)\n",
"dates = pd.to_datetime(pumpkins['Date'])\n",
"month = dates.dt.month\n",
"day_of_year = dates.apply(lambda dt: (dt - datetime(dt.year, 1, 1)).days)\n",
"\n",
"new_pumpkins = pd.DataFrame(\n",
" {'Month': month,\n",
" 'DayOfYear': day_of_year,\n",
" 'Variety': pumpkins['Variety'],\n",
" 'City': pumpkins['City Name'],\n",
" 'Package': pumpkins['Package'],\n",
" 'Low Price': pumpkins['Low Price'],\n",
" 'High Price': pumpkins['High Price'],\n",
" 'Price': price})\n",
"\n",
"# Convert to a per-bushel price: a 1 1/9 bushel package holds 10/9 of a bushel, a 1/2 bushel package holds half a bushel\n",
"new_pumpkins.loc[new_pumpkins['Package'].str.contains('1 1/9'), 'Price'] = price / (1 + 1/9)\n",
"new_pumpkins.loc[new_pumpkins['Package'].str.contains('1/2'), 'Price'] = price * 2\n",
"\n",
"new_pumpkins.head()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A basic scatterplot of price against month reminds us that we only have data for August through December. We probably need data covering more of the year before we can draw conclusions about a linear relationship."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"plt.scatter('Month', 'Price', data=new_pumpkins)\n",
"plt.xlabel('Month')\n",
"plt.ylabel('Price per bushel')\n",
"plt.show()"
]
},
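{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to put a number on how close the month/price relationship is to a straight line is the Pearson correlation coefficient, available in pandas as `Series.corr`. The next cell is a hedged sketch: `describe_linear_fit` is a hypothetical helper and the toy rows are made up; on the real data you would call it with `new_pumpkins` instead of `toy`. A value near +1 or -1 indicates a strong linear trend, and a value near 0 indicates little linear trend."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"def describe_linear_fit(df, x, y):\n",
"    # Hypothetical helper: the distinct x values plus the Pearson correlation of x and y\n",
"    return sorted(df[x].unique().tolist()), df[x].corr(df[y])\n",
"\n",
"# Made-up rows for illustration only; pass new_pumpkins here for the real data\n",
"toy = pd.DataFrame({\n",
"    'Month': [9, 9, 10, 10, 11, 12],\n",
"    'Price': [15.0, 18.0, 17.0, 20.0, 16.0, 19.0],\n",
"})\n",
"months, r = describe_linear_fit(toy, 'Month', 'Price')\n",
"print('Months present:', months)\n",
"print('Pearson correlation:', round(r, 3))"
]
},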
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.scatter('DayOfYear', 'Price', data=new_pumpkins)\n",
"plt.xlabel('Day of year (0 = January 1)')\n",
"plt.ylabel('Price per bushel')\n",
"plt.show()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.3-final"
},
"orig_nbformat": 2
},
"nbformat": 4,
"nbformat_minor": 2
}