{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pumpkin Pricing\n",
    "\n",
    "Load the necessary libraries and dataset. Transform the data into a dataframe containing a subset of the information:\n",
    "\n",
    "- Include only pumpkins priced by the bushel\n",
    "- Change the date format to show the month\n",
    "- Compute the price as the average of the high and low prices\n",
    "- Adjust the price to represent the cost per bushel quantity\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "from datetime import datetime\n",
    "\n",
    "pumpkins = pd.read_csv('../data/US-pumpkins.csv')\n",
    "\n",
    "pumpkins.head()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pumpkins = pumpkins[pumpkins['Package'].str.contains('bushel', case=True, regex=True)]\n",
    "\n",
    "columns_to_select = ['Package', 'Variety', 'City Name', 'Low Price', 'High Price', 'Date']\n",
    "pumpkins = pumpkins.loc[:, columns_to_select]\n",
    "\n",
    "price = (pumpkins['Low Price'] + pumpkins['High Price']) / 2\n",
    "\n",
    "month = pd.DatetimeIndex(pumpkins['Date']).month\n",
    "day_of_year = pd.to_datetime(pumpkins['Date']).apply(lambda dt: (dt-datetime(dt.year,1,1)).days)\n",
    "\n",
    "new_pumpkins = pd.DataFrame(\n",
    "    {'Month': month, \n",
    "     'DayOfYear' : day_of_year, \n",
    "     'Variety': pumpkins['Variety'], \n",
    "     'City': pumpkins['City Name'], \n",
    "     'Package': pumpkins['Package'], \n",
    "     'Low Price': pumpkins['Low Price'],\n",
    "     'High Price': pumpkins['High Price'], \n",
    "     'Price': price})\n",
    "\n",
    "new_pumpkins.loc[new_pumpkins['Package'].str.contains('1 1/9'), 'Price'] = price/1.1\n",
    "new_pumpkins.loc[new_pumpkins['Package'].str.contains('1/2'), 'Price'] = price*2\n",
    "\n",
    "new_pumpkins.head()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A basic scatterplot reminds us that we only have month data from August through December. We probably need more data to be able to draw conclusions in a linear fashion.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "plt.scatter('Month','Price',data=new_pumpkins)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "plt.scatter('DayOfYear','Price',data=new_pumpkins)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n---\n\n**Disclaimer**:  \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we aim for accuracy, please note that automated translations may include errors or inaccuracies. The original document in its native language should be regarded as the authoritative source. For critical information, professional human translation is advised. We are not responsible for any misunderstandings or misinterpretations resulting from the use of this translation.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3-final"
  },
  "orig_nbformat": 2,
  "coopTranslator": {
   "original_hash": "b032d371c75279373507f003439a577e",
   "translation_date": "2025-09-06T15:27:57+00:00",
   "source_file": "2-Regression/3-Linear/notebook.ipynb",
   "language_code": "en"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}