diff --git a/3-Data-Visualization/13-visualization-relationships/README.md b/3-Data-Visualization/13-visualization-relationships/README.md
index f9fadc9..8a57b4a 100644
--- a/3-Data-Visualization/13-visualization-relationships/README.md
+++ b/3-Data-Visualization/13-visualization-relationships/README.md
@@ -10,6 +10,88 @@ It will be interesting to visualize the relationship between a given state's pro
 [Pre-lecture quiz]()
+In this lesson, you can use Seaborn, which you have used before, as a good library to visualize relationships between variables. Particularly interesting is Seaborn's `relplot` function, which draws scatter plots and line plots to quickly visualize '[statistical relationships](https://seaborn.pydata.org/tutorial/relational.html?highlight=relationships)', helping the data scientist better understand how variables relate to each other.
+## Scatterplots
+
+Use a scatterplot to show how the price of honey has evolved, year over year, per state. Seaborn, using `relplot`, conveniently groups the state data and displays data points for both categorical and numeric data.
+
+Let's start by importing the data and Seaborn:
+
+```python
+import pandas as pd
+import matplotlib.pyplot as plt
+import seaborn as sns
+honey = pd.read_csv('../../data/honey.csv')
+honey.head()
+```
+Notice that the honey data has several interesting columns, including year and price per pound. Let's explore this data, grouped by U.S. state:
+
+| state | numcol | yieldpercol | totalprod | stocks | priceperlb | prodvalue | year |
+| ----- | ------ | ----------- | --------- | -------- | ---------- | --------- | ---- |
+| AL | 16000 | 71 | 1136000 | 159000 | 0.72 | 818000 | 1998 |
+| AZ | 55000 | 60 | 3300000 | 1485000 | 0.64 | 2112000 | 1998 |
+| AR | 53000 | 65 | 3445000 | 1688000 | 0.59 | 2033000 | 1998 |
+| CA | 450000 | 83 | 37350000 | 12326000 | 0.62 | 23157000 | 1998 |
+| CO | 27000 | 72 | 1944000 | 1594000 | 0.7 | 1361000 | 1998 |
+
+
+Create a basic scatterplot to show the relationship between the price per pound of honey and its U.S. state of origin. Make the `y` axis tall enough to display all the states:
+
+```python
+sns.relplot(x="priceperlb", y="state", data=honey, height=15, aspect=.5);
+```
+![scatterplot 1](images/scatter1.png)
+
+Now, show the same data with a honey color scheme to show how the price evolves over the years. You can do this by adding a `hue` parameter to show the change, year over year:
+
+> ✅ Learn more about the [color palettes you can use in Seaborn](https://seaborn.pydata.org/tutorial/color_palettes.html) - try a beautiful rainbow color scheme!
+
+```python
+sns.relplot(x="priceperlb", y="state", hue="year", palette="YlOrBr", data=honey, height=15, aspect=.5);
+```
+![scatterplot 2](images/scatter2.png)
+
+With this color scheme change, you can see a strong progression over the years in honey price per pound. Indeed, if you look at a sample of the data to verify (pick a given state, Arizona for example), you can see an overall pattern of price increases over the years, with some exceptions:
+
+| state | numcol | yieldpercol | totalprod | stocks | priceperlb | prodvalue | year |
+| ----- | ------ | ----------- | --------- | ------- | ---------- | --------- | ---- |
+| AZ | 55000 | 60 | 3300000 | 1485000 | 0.64 | 2112000 | 1998 |
+| AZ | 52000 | 62 | 3224000 | 1548000 | 0.62 | 1999000 | 1999 |
+| AZ | 40000 | 59 | 2360000 | 1322000 | 0.73 | 1723000 | 2000 |
+| AZ | 43000 | 59 | 2537000 | 1142000 | 0.72 | 1827000 | 2001 |
+| AZ | 38000 | 63 | 2394000 | 1197000 | 1.08 | 2586000 | 2002 |
+| AZ | 35000 | 72 | 2520000 | 983000 | 1.34 | 3377000 | 2003 |
+| AZ | 32000 | 55 | 1760000 | 774000 | 1.11 | 1954000 | 2004 |
+| AZ | 36000 | 50 | 1800000 | 720000 | 1.04 | 1872000 | 2005 |
+| AZ | 30000 | 65 | 1950000 | 839000 | 0.91 | 1775000 | 2006 |
+| AZ | 30000 | 64 | 1920000 | 902000 | 1.26 | 2419000 | 2007 |
+| AZ | 25000 | 64 | 1600000 | 336000 | 1.26 | 2016000 | 2008 |
+| AZ | 20000 | 52 | 1040000 | 562000 | 1.45 | 1508000 | 2009 |
+| AZ | 24000 | 77 | 1848000 | 665000 | 1.52 | 2809000 | 2010 |
+| AZ | 23000 | 53 | 1219000 | 427000 | 1.55 | 1889000 | 2011 |
+| AZ | 22000 | 46 | 1012000 | 253000 | 1.79 | 1811000 | 2012 |
+
+
+Another way to visualize this progression is to use size, rather than color. For colorblind users, this might be a better option. Edit your visualization so that later years are drawn with larger dots:
+
+```python
+sns.relplot(x="priceperlb", y="state", size="year", data=honey, height=15, aspect=.5);
+```
+You can see the size of the dots gradually increasing.
+
+![scatterplot 3](images/scatter3.png)
+
+Is this a simple case of supply and demand? Is there less honey available for purchase year over year, and thus the price increases?
+
+To discover a correlation between price, number of colonies, and yield per colony, let's explore some line charts.
+
+
+
+
+## Multi-line Plots
+
+
+
 ## 🚀 Challenge
 ## Post-Lecture Quiz
diff --git a/3-Data-Visualization/13-visualization-relationships/images/scatter1.png b/3-Data-Visualization/13-visualization-relationships/images/scatter1.png
new file mode 100644
index 0000000..b558184
Binary files /dev/null and b/3-Data-Visualization/13-visualization-relationships/images/scatter1.png differ
diff --git a/3-Data-Visualization/13-visualization-relationships/images/scatter2.png b/3-Data-Visualization/13-visualization-relationships/images/scatter2.png
new file mode 100644
index 0000000..8a45eac
Binary files /dev/null and b/3-Data-Visualization/13-visualization-relationships/images/scatter2.png differ
diff --git a/3-Data-Visualization/13-visualization-relationships/images/scatter3.png b/3-Data-Visualization/13-visualization-relationships/images/scatter3.png
new file mode 100644
index 0000000..1b186ce
Binary files /dev/null and b/3-Data-Visualization/13-visualization-relationships/images/scatter3.png differ
diff --git a/3-Data-Visualization/13-visualization-relationships/solution/notebook.ipynb b/3-Data-Visualization/13-visualization-relationships/solution/notebook.ipynb
index ca53da0..06221d4 100644
--- a/3-Data-Visualization/13-visualization-relationships/solution/notebook.ipynb
+++ b/3-Data-Visualization/13-visualization-relationships/solution/notebook.ipynb
@@ -6,12 +6,237 @@
 "# Visualizing Honey Production 🍯 🐝"
 ],
 "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 29,
+ "source": [
+ "import pandas as pd\n",
+ "import matplotlib.pyplot as plt\n",
+ "import seaborn as sns\n",
+ "honey = pd.read_csv('../../../data/honey.csv')\n",
+ "honey.head()"
+ ],
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " state numcol yieldpercol totalprod stocks priceperlb \\\n",
+ "0 AL 16000.0 71 1136000.0 159000.0 0.72 \n",
+ "1 AZ 55000.0 60 3300000.0 1485000.0 0.64 \n",
+ "2 AR 53000.0 65 3445000.0 1688000.0 0.59 \n",
+ "3 CA 450000.0 83 37350000.0 12326000.0 0.62 \n",
+ "4 CO 27000.0 72 1944000.0 1594000.0 0.70 \n",
+ "\n",
+ " prodvalue year \n",
+ "0 818000.0 1998 \n",
+ "1 2112000.0 1998 \n",
+ "2 2033000.0 1998 \n",
+ "3 23157000.0 1998 \n",
+ "4 1361000.0 1998 "
+ ],
+ "text/html": [
+ "
\n", + " | state | \n", + "numcol | \n", + "yieldpercol | \n", + "totalprod | \n", + "stocks | \n", + "priceperlb | \n", + "prodvalue | \n", + "year | \n", + "
---|---|---|---|---|---|---|---|---|
0 | \n", + "AL | \n", + "16000.0 | \n", + "71 | \n", + "1136000.0 | \n", + "159000.0 | \n", + "0.72 | \n", + "818000.0 | \n", + "1998 | \n", + "
1 | \n", + "AZ | \n", + "55000.0 | \n", + "60 | \n", + "3300000.0 | \n", + "1485000.0 | \n", + "0.64 | \n", + "2112000.0 | \n", + "1998 | \n", + "
2 | \n", + "AR | \n", + "53000.0 | \n", + "65 | \n", + "3445000.0 | \n", + "1688000.0 | \n", + "0.59 | \n", + "2033000.0 | \n", + "1998 | \n", + "
3 | \n", + "CA | \n", + "450000.0 | \n", + "83 | \n", + "37350000.0 | \n", + "12326000.0 | \n", + "0.62 | \n", + "23157000.0 | \n", + "1998 | \n", + "
4 | \n", + "CO | \n", + "27000.0 | \n", + "72 | \n", + "1944000.0 | \n", + "1594000.0 | \n", + "0.70 | \n", + "1361000.0 | \n", + "1998 | \n", + "