assignment and challenge

5 years ago · 78c62ed687
parent 658083d083
commit 78c62ed687
4 changed files with 14 additions and 1775 deletions
--- a/2-Regression/2-Data/README.md
+++ b/2-Regression/2-Data/README.md
@@ -23,12 +23,12 @@ What if you are trying to correlate two points of data - like age to height? You
 But it's not very common to be gifted a dataset that is completely ready to use to create a ML model. In this lesson, you will learn how to prepare a raw dataset using standard Python libraries. You will also learn various techniques to visualize the data.
 ### Preparation

-In this folder you will find a .csv file called `US-pumpkins.csv` which includes 1757 lines of data about the pumpkin market, sorted into groupings by city. This is the raw data extracted from the [Specialty Crops Terminal Markets Standard Reports](https://www.marketnews.usda.gov/mnp/fv-report-config-step1?type=termPrice) distributed by the United States Department of Agriculture. 
+In this folder you will find a .csv file in the root `data` folder called [US-pumpkins.csv](../../data/US-pumpkins.csv) which includes 1757 lines of data about the pumpkin market, sorted into groupings by city. This is raw data extracted from the [Specialty Crops Terminal Markets Standard Reports](https://www.marketnews.usda.gov/mnp/fv-report-config-step1?type=termPrice) distributed by the United States Department of Agriculture.

 This data is in the public domain. It can be downloaded in many separate files, per city, from the USDA web site. To avoid too many separate files we have concatenated all the city data into one spreadsheet. Take a look at this file.
 ## The Pumpkin data

-What do you notice about this data? First, you see that it is a mix of text and numeric data. There are also dates. Second, you see that there's a considerable amount of missing and mixed data. To build a good model, you will need to handle that. 
+What do you notice about this data? First, you see that it is a mix of text and numeric data. There are also dates. Second, you see that there's a considerable amount of missing and mixed data. To build a good model, you will need to handle that.

 What question can you ask of this data, using a Regression technique? What about "Predict the price of a pumpkin for sale during a given month". Looking again at the data, there are some changes you need to make to create the data structure necessary for the task. 
 ### Analyze the Pumpkin Data
@@ -39,7 +39,7 @@ Open the `notebook.ipynb` file in VS Code and import the spreadsheet in to a new

 ```python
 import pandas as pd
-pumpkins = pd.read_csv('US-pumpkins.csv')
+pumpkins = pd.read_csv('../../data/US-pumpkins.csv')
 pumpkins.head()
 ```

@@ -108,7 +108,7 @@ One data visualization libary that works well in Jupyter notebooks is [Matplotli
 > Get more experience with data visualization in [these tutorials](https://docs.microsoft.com/learn/modules/explore-analyze-data-with-python?WT.mc_id=academic-15963-cxa).
 ## Experiment with Matplotlib

-Try to create some simple plots to display the new dataframe you just created. What would a basic line plot show?
+Try to create some basic plots to display the new dataframe you just created. What would a basic line plot show?

 Import Matplotlib at the top of the file, under the Pandas import:

@@ -132,14 +132,12 @@ Add a cell to create a grouped bar chart:
 new_pumpkins.groupby(['Month'])['Price'].mean().plot(kind='bar')
 plt.ylabel("Pumpkin Price")
 ```
-This is a more useful data visualization! It seems to indicate that the highest price for pumpkins occurs in September and October. Does that meet your expectation? Why or why not?
-
-🚀 Challenge: Add a challenge for students to work on collaboratively in class to enhance the project

-Optional: add a screenshot of the completed lesson's UI if appropriate
+This is a more useful data visualization! It seems to indicate that the highest price for pumpkins occurs in September and October. Does that meet your expectation? Why or why not?

-## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/8/)
+🚀 Challenge: Explore the different types of visualization that matplotlib offers. Which types are most appropriate for regression problems?
+## [Post-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/8/)

 ## Review & Self Study

-**Assignment**: [Assignment Name](assignment.md)
+**Assignment**: [Exploring visualization](assignment.md)
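
Review note on the grouped bar chart kept as context in the README hunk above: the following is a minimal, self-contained sketch of the same `groupby(['Month'])['Price'].mean().plot(kind='bar')` call. The small `new_pumpkins` frame is invented illustrative data, not rows from `US-pumpkins.csv`, and the `Agg` backend is only an assumption so the snippet can run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; not needed inside a Jupyter notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the lesson's new_pumpkins dataframe (values are invented).
new_pumpkins = pd.DataFrame({
    "Month": [9, 9, 10, 10, 11],
    "Price": [15.0, 17.0, 18.0, 20.0, 12.0],
})

# Average price per month, then draw one bar per month.
monthly_mean = new_pumpkins.groupby(["Month"])["Price"].mean()
ax = monthly_mean.plot(kind="bar")
ax.set_ylabel("Pumpkin Price")
plt.tight_layout()
```

Holding the grouped series in a variable such as `monthly_mean` makes it easy to inspect the numbers behind the bars before reading anything into the chart.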
--- a/2-Regression/2-Data/US-pumpkins.csv
+++ b/2-Regression/2-Data/US-pumpkins.csv
--- a/2-Regression/2-Data/assignment.md
+++ b/2-Regression/2-Data/assignment.md
@@ -1,9 +1,8 @@
-# [Assignment Name]
-
-## Instructions
+# Exploring Visualizations

+There are several different libraries that are available for data visualization. Create some visualizations using the Pumpkin data in this lesson with matplotlib and seaborn in a sample notebook. Which libraries are easier to work with?
 ## Rubric

 | Criteria | Exemplary | Adequate | Needs Improvement |
 | -------- | --------- | -------- | ----------------- |
-|          |           |          |                   |
+|          | A notebook is submitted with two explorations/visualizations         |   A notebook is submitted with one explorations/visualizations       |  A notebook is not submitted                 |
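
Review note on the new assignment text: one possible exploration a student notebook might start from is a scatter plot of price against month, sketched here with matplotlib only. The `pumpkins_sample` frame and its values are hypothetical, chosen just to make the snippet runnable on its own. The seaborn half of the assignment is not shown.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; not needed inside a Jupyter notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data (invented values, not taken from US-pumpkins.csv).
pumpkins_sample = pd.DataFrame({
    "Month": [9, 9, 10, 10, 11],
    "Price": [15.0, 17.0, 18.0, 20.0, 12.0],
})

# One point per row: month on the x-axis, price on the y-axis.
fig, ax = plt.subplots()
ax.scatter(pumpkins_sample["Month"], pumpkins_sample["Price"])
ax.set_xlabel("Month")
ax.set_ylabel("Price")
fig.tight_layout()
```

A scatter plot keeps every individual observation visible, which can complement the averaged view that a grouped bar chart gives.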
--- a/2-Regression/2-Data/solution/notebook.ipynb
+++ b/2-Regression/2-Data/solution/notebook.ipynb
@@ -24,7 +24,7 @@
 "cells": [
  {
   "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
@@ -64,13 +64,13 @@
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>City Name</th>\n      <th>Type</th>\n      <th>Package</th>\n      <th>Variety</th>\n      <th>Sub Variety</th>\n      <th>Grade</th>\n      <th>Date</th>\n      <th>Low Price</th>\n      <th>High Price</th>\n      <th>Mostly Low</th>\n      <th>...</th>\n      <th>Unit of Sale</th>\n      <th>Quality</th>\n      <th>Condition</th>\n      <th>Appearance</th>\n      <th>Storage</th>\n      <th>Crop</th>\n      <th>Repack</th>\n      <th>Trans Mode</th>\n      <th>Unnamed: 24</th>\n      <th>Unnamed: 25</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>70</th>\n      <td>BALTIMORE</td>\n      <td>NaN</td>\n      <td>1 1/9 bushel cartons</td>\n      <td>PIE TYPE</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>9/24/16</td>\n      <td>15.0</td>\n      <td>15.0</td>\n      <td>15.0</td>\n      <td>...</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>N</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>71</th>\n      <td>BALTIMORE</td>\n      <td>NaN</td>\n      <td>1 1/9 bushel cartons</td>\n      <td>PIE TYPE</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>9/24/16</td>\n      <td>18.0</td>\n      <td>18.0</td>\n      <td>18.0</td>\n      <td>...</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>N</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>72</th>\n      <td>BALTIMORE</td>\n      <td>NaN</td>\n    
  <td>1 1/9 bushel cartons</td>\n      <td>PIE TYPE</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>10/1/16</td>\n      <td>18.0</td>\n      <td>18.0</td>\n      <td>18.0</td>\n      <td>...</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>N</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>73</th>\n      <td>BALTIMORE</td>\n      <td>NaN</td>\n      <td>1 1/9 bushel cartons</td>\n      <td>PIE TYPE</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>10/1/16</td>\n      <td>17.0</td>\n      <td>17.0</td>\n      <td>17.0</td>\n      <td>...</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>N</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>74</th>\n      <td>BALTIMORE</td>\n      <td>NaN</td>\n      <td>1 1/9 bushel cartons</td>\n      <td>PIE TYPE</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>10/8/16</td>\n      <td>15.0</td>\n      <td>15.0</td>\n      <td>15.0</td>\n      <td>...</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>N</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n    </tr>\n  </tbody>\n</table>\n<p>5 rows × 26 columns</p>\n</div>"
     },
     "metadata": {},
-     "execution_count": 8
+     "execution_count": 1
    }
   ],
   "source": [
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
-    "pumpkins = pd.read_csv('../US-pumpkins.csv')\n",
+    "pumpkins = pd.read_csv('../../data/US-pumpkins.csv')\n",
    "\n",
    "pumpkins = pumpkins[pumpkins['Package'].str.contains('bushel', case=True, regex=True)]\n",
    "\n",