@ -0,0 +1,204 @@
# Visualizing Quantities

In this lesson, you will use three different libraries to create interesting visualizations centered on the concept of quantity. Using a cleaned dataset about the birds of Minnesota, you can learn many interesting facts about local wildlife.

## Pre-Lecture Quiz

[Pre-lecture quiz]()

## Observe wingspan with Matplotlib

An excellent library for creating both simple and sophisticated plots and charts of various kinds is [Matplotlib](https://matplotlib.org/stable/index.html). In general terms, the process of plotting data using these libraries includes identifying the parts of your dataframe that you want to target, performing any necessary transforms on that data, assigning its x and y axis values, deciding what kind of plot to show, and then showing the plot. Matplotlib offers a large variety of visualizations, but for this lesson, let's focus on the ones most appropriate for visualizing quantity: line charts, scatterplots, and bar plots.

> ✅ Use the best chart to suit your data's structure and the story you want to tell.
>
> - To analyze trends over time: line
> - To compare values: bar, column, pie, scatterplot
> - To show how parts relate to a whole: pie
> - To show distribution of data: scatterplot, bar
> - To show trends: line, column
> - To show relationships between values: line, scatterplot, bubble

If you have a dataset and need to discover how much of a given item is included, one of the first tasks at hand will be to inspect its values.

✅ There are very good 'cheat sheets' available for Matplotlib [here](https://github.com/matplotlib/cheatsheets/blob/master/cheatsheets-1.png) and [here](https://github.com/matplotlib/cheatsheets/blob/master/cheatsheets-2.png).

## Build a line plot about bird wingspan values

Open the `notebook.ipynb` file at the root of this lesson folder and add a cell.

> Note: the data is stored in the root of this repo in the `/data` folder.

```python
import pandas as pd
import matplotlib.pyplot as plt
birds = pd.read_csv('../../data/birds.csv')
birds.head()
```

This data is a mix of text and numbers:

| | Name | ScientificName | Category | Order | Family | Genus | ConservationStatus | MinLength | MaxLength | MinBodyMass | MaxBodyMass | MinWingspan | MaxWingspan |
| ---: | :--------------------------- | :--------------------- | :-------------------- | :----------- | :------- | :---------- | :----------------- | --------: | --------: | ----------: | ----------: | ----------: | ----------: |
| 0 | Black-bellied whistling-duck | Dendrocygna autumnalis | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Dendrocygna | LC | 47 | 56 | 652 | 1020 | 76 | 94 |
| 1 | Fulvous whistling-duck | Dendrocygna bicolor | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Dendrocygna | LC | 45 | 53 | 712 | 1050 | 85 | 93 |
| 2 | Snow goose | Anser caerulescens | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Anser | LC | 64 | 79 | 2050 | 4050 | 135 | 165 |
| 3 | Ross's goose | Anser rossii | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Anser | LC | 57.3 | 64 | 1066 | 1567 | 113 | 116 |
| 4 | Greater white-fronted goose | Anser albifrons | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Anser | LC | 64 | 81 | 1930 | 3310 | 130 | 165 |
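
To check which columns pandas parsed as numbers and which it kept as text, `select_dtypes` can split them. This is a minimal sketch on a hypothetical stand-in frame that holds a few columns of the first two rows above; with the real data you would call it on `birds` directly.

```python
import pandas as pd

# Hypothetical stand-in for `birds`: a few columns of the first two rows above
birds = pd.DataFrame({
    'Name': ['Black-bellied whistling-duck', 'Fulvous whistling-duck'],
    'Category': ['Ducks/Geese/Waterfowl', 'Ducks/Geese/Waterfowl'],
    'ConservationStatus': ['LC', 'LC'],
    'MinLength': [47, 45],
    'MaxLength': [56, 53],
    'MaxWingspan': [94, 93],
})

# Numeric columns can be plotted directly; everything else is text
numeric_cols = birds.select_dtypes(include='number').columns.tolist()
text_cols = birds.select_dtypes(exclude='number').columns.tolist()
print(numeric_cols)  # ['MinLength', 'MaxLength', 'MaxWingspan']
print(text_cols)     # ['Name', 'Category', 'ConservationStatus']
```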

Let's start by plotting some of the numeric data using a basic line plot. Suppose you wanted a view of the maximum wingspan for these interesting birds.

```python
wingspan = birds['MaxWingspan']
wingspan.plot()
```

![Max Wingspan](images/max-wingspan.png)

What do you notice immediately? There seems to be at least one outlier, and that's quite a wingspan! A 2300 centimeter wingspan equals 23 meters. Are there pterodactyls roaming Minnesota? Let's investigate.

While you could do a quick sort in Excel to find those outliers, which are probably typos, continue the visualization process by working from within the plot.
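
If you do want a quick sort, pandas can do it without leaving the notebook. This sketch uses `nlargest` and a boolean filter on a small hypothetical frame: the first two rows come from the table above, while the third bird and its 2300 cm wingspan are made up to stand in for an outlier. The 500 cm threshold is the same cutoff the labeling code later in this lesson uses.

```python
import pandas as pd

# Hypothetical stand-in for `birds`; the last row is a made-up outlier
birds = pd.DataFrame({
    'Name': ['Snow goose', "Ross's goose", 'Hypothetical outlier'],
    'MaxWingspan': [165, 116, 2300],
})

# The two rows with the largest wingspans, biggest first
top = birds.nlargest(2, 'MaxWingspan')
print(top)

# Every bird above a plausibility threshold
suspicious = birds[birds['MaxWingspan'] > 500]
print(suspicious['Name'].tolist())  # ['Hypothetical outlier']
```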

Add a title and axis labels, and use the bird names as x values, to show which birds are in question:

```python
plt.title('Max Wingspan in Centimeters')
plt.ylabel('Wingspan (CM)')
plt.xlabel('Birds')
plt.xticks(rotation=45)
x = birds['Name']
y = birds['MaxWingspan']

plt.plot(x, y)

plt.show()
```

![wingspan with labels](images/max-wingspan-labels.png)

Even with the rotation of the labels set to 45 degrees, there are too many to read. Let's try a different strategy: label only those outliers and set the labels within the chart. You can use a scatter chart to make more room for the labeling:

```python
plt.title('Max Wingspan in Centimeters')
plt.ylabel('Wingspan (CM)')
plt.tick_params(axis='both',which='both',labelbottom=False,bottom=False)

for i in range(len(birds)):
    x = birds['Name'][i]
    y = birds['MaxWingspan'][i]
    plt.plot(x, y, 'bo')
    if birds['MaxWingspan'][i] > 500:
        plt.text(x, y * (1 - 0.05), birds['Name'][i], fontsize=12)

plt.show()
```

What's going on here? You used `tick_params` to hide the bottom labels and then created a loop over your birds dataset. Plotting each bird as a small round blue dot with the `bo` format string, you checked for any bird with a maximum wingspan over 500 and, if so, displayed its name next to the dot. You offset the labels a little on the y axis (`y * (1 - 0.05)`) and used the bird name as a label.

What did you discover?

![outliers](images/labeled-wingspan.png)
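
The loop above calls `plt.plot` once per bird. A vectorized alternative, sketched below, selects the outliers with a boolean mask, draws every point in a single `scatter` call, and labels only the masked rows with `annotate`. It plots against the row index instead of the bird name, and the small frame (including its last row) is hypothetical; with the real data you would reuse `birds`.

```python
import matplotlib
matplotlib.use('Agg')  # off-screen backend for scripts; leave this out in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for `birds`; the last row is a made-up outlier
birds = pd.DataFrame({
    'Name': ['Snow goose', "Ross's goose", 'Hypothetical outlier'],
    'MaxWingspan': [165, 116, 2300],
})

outliers = birds['MaxWingspan'] > 500  # boolean mask, one entry per bird

fig, ax = plt.subplots()
ax.set_title('Max Wingspan in Centimeters')
ax.set_ylabel('Wingspan (CM)')
ax.tick_params(axis='x', labelbottom=False, bottom=False)

# One call draws every bird; the row index serves as the x position
ax.scatter(birds.index, birds['MaxWingspan'], color='blue')

# Label only the outliers, slightly below their dots
for idx in birds[outliers].index:
    y = birds.at[idx, 'MaxWingspan']
    ax.annotate(birds.at[idx, 'Name'], (idx, y * (1 - 0.05)), fontsize=12)
```

In a notebook, finish with `plt.show()` as in the lesson's own cells.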

## Filter your data

Both the Bald Eagle and the Prairie Falcon, while probably very large birds, appear to be mislabeled, with an extra `0` added to their maximum wingspan. It's unlikely that you'll meet a Bald Eagle with a 25 meter wingspan, but if so, please let us know! Let's plot the data again, leaving out those two outliers:

```python
plt.title('Max Wingspan in Centimeters')
plt.ylabel('Wingspan (CM)')
plt.xlabel('Birds')
plt.tick_params(axis='both',which='both',labelbottom=False,bottom=False)
for i in range(len(birds)):
    x = birds['Name'][i]
    y = birds['MaxWingspan'][i]
    if birds['Name'][i] not in ['Bald eagle', 'Prairie falcon']:
        plt.plot(x, y, 'bo')
plt.show()
```

By filtering out outliers, your data is now more cohesive and understandable.

![scatterplot of wingspans](images/scatterplot-wingspan.png)
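
The loop above skips the two birds only while plotting. To keep a reusable dataframe without them, you could filter once with `isin` and `~` (logical NOT). A sketch on a hypothetical stand-in frame; the wingspans given to the two outliers are placeholders, not the dataset's values.

```python
import pandas as pd

# Hypothetical stand-in for `birds`; the two outlier values are placeholders
birds = pd.DataFrame({
    'Name': ['Snow goose', 'Bald eagle', 'Prairie falcon', "Ross's goose"],
    'MaxWingspan': [165, 2500, 1130, 116],
})

outlier_names = ['Bald eagle', 'Prairie falcon']

# Keep every row whose Name is NOT in the outlier list, then renumber rows
birds_filtered = birds[~birds['Name'].isin(outlier_names)].reset_index(drop=True)
print(birds_filtered)
```

Resetting the index keeps positional loops such as `for i in range(len(...))` working on the filtered frame.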

Now that we have a cleaner dataset, at least in terms of wingspan, let's discover more about these birds.

While line and scatter plots can display information about data values and their distributions, we want to think about the values inherent in this dataset. You could create visualizations to answer the following questions about quantity:

> How many categories of birds are there, and how many birds are in each?
>
> How many birds are extinct, endangered, rare, or common?
>
> How many birds are there in each genus and order in Linnaeus's terminology?

## Explore bar charts

Bar charts are very useful when you need to show groupings of data. Let's explore the categories of birds in this dataset to see which is the most common.

In the notebook file, create a basic bar chart.

✅ Note: you can filter out the two outlier birds we identified in the previous section, edit the typo in their wingspan, or leave them in for these exercises, which do not depend on wingspan values.
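
If you choose to edit the typo, one option, assuming the error really is a single extra trailing `0`, is to divide those two wingspans by 10 with `.loc`. This sketch uses a hypothetical stand-in frame whose outlier values are placeholders; the column is kept as floats so the divided values fit without a dtype change.

```python
import pandas as pd

# Hypothetical stand-in for `birds`; the two outlier values are placeholders
birds = pd.DataFrame({
    'Name': ['Snow goose', 'Bald eagle', 'Prairie falcon'],
    'MaxWingspan': [165.0, 2500.0, 1130.0],
})

typo_rows = birds['Name'].isin(['Bald eagle', 'Prairie falcon'])

# Remove the extra trailing zero from just those rows
birds.loc[typo_rows, 'MaxWingspan'] = birds.loc[typo_rows, 'MaxWingspan'] / 10
print(birds)
```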

If you want to create a bar chart, you can select the data you want to focus on. Bar charts can be created from raw data:

```python
birds.plot(x='Category',
        kind='bar',
        stacked=True,
        title='Birds of Minnesota')
```

![full data as a bar chart](images/full-data-bar.png)

This bar chart, however, is unreadable because there is too much non-grouped data. You need to select only the data that you want to chart, so let's count the birds in each category.

Filter your data to include only the bird's category.

✅ Notice that you use Pandas to manage the data, and then let Matplotlib do the charting.

Since there are many categories, you can list them along the vertical axis with a horizontal bar chart and tweak the figure height to account for all the data:

```python
# Count the birds in each category, most common first
category_count = birds['Category'].value_counts(sort=True)
plt.rcParams['figure.figsize'] = [6, 12]
category_count.plot.barh()
```

![category and length](images/category-counts.png)

This bar chart gives a good view of the number of birds in each category. At a glance, you see that the largest number of birds in this region are in the Ducks/Geese/Waterfowl category. Minnesota is the 'land of 10,000 lakes', so this isn't surprising!

✅ Try some other counts on this dataset. Does anything surprise you?
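
For example, the same `value_counts` pattern works on any text column. This sketch counts conservation statuses in a small hypothetical series standing in for `birds['ConservationStatus']`, and also shows `normalize=True`, which returns shares instead of raw counts.

```python
import pandas as pd

# Hypothetical stand-in for birds['ConservationStatus']
status = pd.Series(['LC', 'LC', 'NT', 'LC', 'VU', 'NT', 'LC', 'LC'],
                   name='ConservationStatus')

counts = status.value_counts()                # most common first
shares = status.value_counts(normalize=True)  # fractions that sum to 1
print(counts)
print(shares)
```

Either result can be passed to `.plot.barh()` exactly like `category_count`.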

## Comparing data

You can try different comparisons of grouped data by creating new axes. Try a comparison of the MaxLength of a bird, based on its category:

```python
maxlength = birds['MaxLength']
# Set the figure size before plotting so it applies to this chart
plt.rcParams['figure.figsize'] = [6, 12]
plt.barh(y=birds['Category'], width=maxlength)
plt.show()
```

![comparing data](images/category-length.png)

Nothing is surprising here: hummingbirds have the smallest MaxLength compared to pelicans or geese. It's good when data makes logical sense!

You can create more interesting bar charts by superimposing data. Let's superimpose minimum and maximum length on each bird category:

```python
minLength = birds['MinLength']
maxLength = birds['MaxLength']
category = birds['Category']

plt.barh(category, maxLength)
plt.barh(category, minLength)

plt.show()
```

In this plot you can see the range, per category, of the minimum and maximum length of a given bird category. Given this data, it looks like the bigger the bird, the larger its length range. Fascinating!

![superimposed values](images/superimposed.png)
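
Because many birds share a category, the bars drawn for one category sit on top of each other, so each visible bar reaches the largest value drawn in that category. To read the per-category extremes as numbers, you could aggregate with `groupby` and named aggregation. This sketch uses the five sample rows from the table at the start of the lesson (all in one category); a real run would use the full `birds` frame.

```python
import pandas as pd

# The five sample rows from the table above, reduced to the needed columns
birds = pd.DataFrame({
    'Category': ['Ducks/Geese/Waterfowl'] * 5,
    'MinLength': [47, 45, 64, 57.3, 64],
    'MaxLength': [56, 53, 79, 64, 81],
})

# Shortest minimum and longest maximum per category, plus their difference
length_range = birds.groupby('Category').agg(
    shortest=('MinLength', 'min'),
    longest=('MaxLength', 'max'),
)
length_range['spread'] = length_range['longest'] - length_range['shortest']
print(length_range)
```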

## 🚀 Challenge

This bird dataset offers a wealth of information about different types of birds within a particular ecosystem. Search the internet to see if you can find other bird-oriented datasets. Practice building charts and graphs around these birds to discover facts you didn't realize.

## Post-Lecture Quiz

[Post-lecture quiz]()

## Review & Self Study

This first lesson has given you some information about how to use Matplotlib to visualize quantities. Research other ways to work with datasets for visualization. [Plotly](https://github.com/plotly/plotly.py) is one that we won't cover in these lessons, so take a look at what it can offer.

## Assignment

[Lines, Scatters, and Bars](assignment.md)
@ -0,0 +1,11 @@
# Lines, Scatters and Bars

## Instructions

In this lesson, you worked with line charts, scatterplots, and bar charts to show interesting facts about this dataset. In this assignment, dig deeper into the dataset to discover a fact about a given type of bird. For example, create a notebook visualizing all the interesting data you can uncover about Snow Geese. Use the three plots mentioned above to tell a story in your notebook.

## Rubric

| Exemplary | Adequate | Needs Improvement |
| --- | --- | --- |
| A notebook is presented with good annotations, solid storytelling, and attractive graphs | The notebook is missing one of these elements | The notebook is missing two of these elements |
@ -0,0 +1,35 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# Let's learn about birds\n"
   ],
   "metadata": {}
  }
 ],
 "metadata": {
  "orig_nbformat": 4,
  "language_info": {
   "name": "python",
   "version": "3.7.0",
   "mimetype": "text/x-python",
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "pygments_lexer": "ipython3",
   "nbconvert_exporter": "python",
   "file_extension": ".py"
  },
  "kernelspec": {
   "name": "python3",
   "display_name": "Python 3.7.0 64-bit"
  },
  "interpreter": {
   "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
@ -0,0 +1,191 @@
# Visualizing Distributions

In the previous lesson, you learned some interesting facts about a dataset of the birds of Minnesota. You found some erroneous data by visualizing outliers and looked at the differences between bird categories by their maximum length.

## Pre-Lecture Quiz

[Pre-lecture quiz]()

## Explore the birds dataset

Another way to dig into data is by looking at its distribution, or how the data is organized along an axis. For example, you might want to learn how maximum wingspan or maximum body mass is distributed across the birds of Minnesota in this dataset.

Let's discover some facts about the distributions of data in this dataset. In the _notebook.ipynb_ file at the root of this lesson folder, import Pandas, Matplotlib, and your data:

```python
import pandas as pd
import matplotlib.pyplot as plt
birds = pd.read_csv('../../data/birds.csv')
birds.head()
```

In general, you can quickly look at the way data is distributed by using a scatter plot, as we did in the previous lesson:

```python
birds.plot(kind='scatter',x='MaxLength',y='Order',figsize=(12,8))

plt.title('Max Length per Order')
plt.ylabel('Order')
plt.xlabel('Max Length')

plt.show()
```

This gives an overview of the general distribution of body length per bird order, but it is not the optimal way to display true distributions. That task is usually handled by creating a histogram.

## Working with histograms

Matplotlib offers very good ways to visualize data distribution using histograms. This type of chart is like a bar chart where the distribution can be seen via the rise and fall of the bars. To build a histogram, you need numeric data. You can then plot a chart with the kind set to 'hist'. This chart shows the distribution of MaxBodyMass over the dataset's entire range of values. By dividing the array of data it is given into smaller bins, it can display the distribution of the data's values:

```python
birds['MaxBodyMass'].plot(kind = 'hist', bins = 10, figsize = (12,12))
plt.show()
```

![distribution over the entire dataset](images/dist1.png)
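
The binning step can be reproduced with NumPy's `np.histogram`, the counting routine Matplotlib's `hist` relies on: it splits the range from the minimum to the maximum into `bins` equal-width intervals and counts the values in each. This sketch uses the `MaxBodyMass` values of the five sample rows from the previous lesson's table.

```python
import numpy as np

# MaxBodyMass values of the five sample rows from the previous lesson's table
body_mass = np.array([1020, 1050, 4050, 1567, 3310])

# Three equal-width bins spanning min..max; the last bin includes the max
counts, edges = np.histogram(body_mass, bins=3)
print(edges)   # [1020. 2030. 3040. 4050.]
print(counts)  # [3 0 2]
```

Raising `bins` narrows each interval, which is why the next chart looks more granular.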

As you can see, most of the 400+ birds in this dataset have a Max Body Mass under 2000. Gain more insight into the data by raising the `bins` parameter to a higher number, such as 30:

```python
birds['MaxBodyMass'].plot(kind = 'hist', bins = 30, figsize = (12,12))
plt.show()
```

![distribution over the entire dataset with larger bins param](images/dist2.png)

This chart shows the distribution in a somewhat more granular fashion. A less skewed chart could be created by selecting only data within a given range.

Filter your data to get only those birds whose body mass is between 1 and 60, and show 40 `bins`:

```python
filteredBirds = birds[(birds['MaxBodyMass'] > 1) & (birds['MaxBodyMass'] < 60)]
filteredBirds['MaxBodyMass'].plot(kind = 'hist',bins = 40,figsize = (12,12))
plt.show()
```

![filtered histogram](images/dist3.png)

✅ Try some other filters and data points. To see the distributions of all the numeric columns at once, remove the `['MaxBodyMass']` selection; pandas then draws one histogram per column, labeled in a legend.

The histogram offers some nice color and labeling enhancements to try as well.

Create a 2D histogram to compare the relationship between two distributions. Let's compare `MaxBodyMass` vs. `MaxLength`. Matplotlib offers a built-in way to show convergence using brighter colors:

```python
x = filteredBirds['MaxBodyMass']
y = filteredBirds['MaxLength']

fig, ax = plt.subplots(tight_layout=True)
hist = ax.hist2d(x, y)
```

As expected, these two variables appear to be correlated, with one particularly strong point of convergence:

![2D plot](images/2D.png)
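
`hist2d` colors a grid of counts, which it computes with NumPy's `histogram2d`. This sketch reproduces that counting on the `MaxBodyMass` and `MaxLength` values of the five sample rows from the previous lesson's table, using a coarse 2 x 2 grid so the result is easy to read. The cell with the largest count is the one the plot draws in its brightest color.

```python
import numpy as np

# MaxBodyMass and MaxLength of the five sample rows from the previous lesson
mass = np.array([1020, 1050, 4050, 1567, 3310])
length = np.array([56, 53, 79, 64, 81])

# 2 x 2 grid: rows follow mass bins, columns follow length bins
counts, mass_edges, length_edges = np.histogram2d(mass, length, bins=2)
print(counts)

# Grid position of the busiest cell (the "point of convergence")
brightest = np.unravel_index(counts.argmax(), counts.shape)
print(tuple(int(i) for i in brightest))  # (0, 0)
```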

Histograms work well by default for numeric data. What if you need to see distributions according to text data?

## Explore the dataset for distributions using text data

This dataset also includes good information about each bird's category, genus, species, and family, as well as its conservation status. Let's dig into this conservation information. What is the distribution of the birds according to their conservation status?

> ✅ In the dataset, several acronyms are used to describe conservation status. These acronyms come from the [IUCN Red List Categories](https://www.iucnredlist.org/), maintained by the IUCN, an organization that catalogs species' status.
>
> - CR: Critically Endangered
> - EN: Endangered
> - EX: Extinct
> - LC: Least Concern
> - NT: Near Threatened
> - VU: Vulnerable

These are text-based values, so you will need to do a transform to create a histogram. Using the filteredBirds dataframe, display its conservation status alongside its minimum wingspan. What do you see?

```python
x1 = filteredBirds.loc[filteredBirds.ConservationStatus=='EX', 'MinWingspan']
x2 = filteredBirds.loc[filteredBirds.ConservationStatus=='CR', 'MinWingspan']
x3 = filteredBirds.loc[filteredBirds.ConservationStatus=='EN', 'MinWingspan']
x4 = filteredBirds.loc[filteredBirds.ConservationStatus=='NT', 'MinWingspan']
x5 = filteredBirds.loc[filteredBirds.ConservationStatus=='VU', 'MinWingspan']
x6 = filteredBirds.loc[filteredBirds.ConservationStatus=='LC', 'MinWingspan']

kwargs = dict(alpha=0.5, bins=20)

plt.hist(x1, **kwargs, color='red', label='Extinct')
plt.hist(x2, **kwargs, color='orange', label='Critically Endangered')
plt.hist(x3, **kwargs, color='yellow', label='Endangered')
plt.hist(x4, **kwargs, color='green', label='Near Threatened')
plt.hist(x5, **kwargs, color='blue', label='Vulnerable')
plt.hist(x6, **kwargs, color='gray', label='Least Concern')

# The x-axis holds minimum wingspan; the y-axis counts birds per bin
plt.gca().set(title='Conservation Status', xlabel='Min Wingspan', ylabel='Count')
plt.legend();
```

![wingspan and conservation collation](images/histogram-conservation.png)

There doesn't seem to be a good correlation between minimum wingspan and conservation status. Test other elements of the dataset using this method. You can try different filters as well. Do you find any correlation?
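
The six near-identical `loc` lines can be collapsed into a loop over a mapping from status code to label and color. This is a sketch of that refactor on a small hypothetical stand-in for `filteredBirds`; statuses with no matching birds are skipped so they don't clutter the legend.

```python
import matplotlib
matplotlib.use('Agg')  # off-screen backend for scripts; leave this out in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for filteredBirds
filteredBirds = pd.DataFrame({
    'ConservationStatus': ['LC', 'LC', 'NT', 'LC', 'VU'],
    'MinWingspan': [20, 25, 31, 18, 40],
})

# Status code -> (legend label, color), in the same order as the lesson
statuses = {
    'EX': ('Extinct', 'red'),
    'CR': ('Critically Endangered', 'orange'),
    'EN': ('Endangered', 'yellow'),
    'NT': ('Near Threatened', 'green'),
    'VU': ('Vulnerable', 'blue'),
    'LC': ('Least Concern', 'gray'),
}

fig, ax = plt.subplots()
for code, (label, color) in statuses.items():
    values = filteredBirds.loc[filteredBirds['ConservationStatus'] == code, 'MinWingspan']
    if values.empty:
        continue  # no birds with this status in the data
    ax.hist(values, alpha=0.5, bins=20, color=color, label=label)

ax.set(title='Conservation Status', xlabel='Min Wingspan', ylabel='Count')
ax.legend()
```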

## Density plots

You may have noticed that the histograms we have looked at so far are 'stepped' and do not flow smoothly in an arc. To show a smoother density chart, you can try a density plot.

To work with density plots, familiarize yourself with a new plotting library, [Seaborn](https://seaborn.pydata.org/generated/seaborn.kdeplot.html).

After importing Seaborn, try a basic density plot:

```python
import seaborn as sns
import matplotlib.pyplot as plt
sns.kdeplot(filteredBirds['MinWingspan'])
plt.show()
```

![Density plot](images/density1.png)

You can see how the plot echoes the previous one for Minimum Wingspan data; it's just a bit smoother. According to Seaborn's documentation, "Relative to a histogram, KDE can produce a plot that is less cluttered and more interpretable, especially when drawing multiple distributions. But it has the potential to introduce distortions if the underlying distribution is bounded or not smooth. Like a histogram, the quality of the representation also depends on the selection of good smoothing parameters." [source](https://seaborn.pydata.org/generated/seaborn.kdeplot.html) In other words, outliers, as always, will make your charts behave badly.
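
To make the idea behind a KDE concrete: a kernel density estimate places a small bell curve (a Gaussian kernel) on every data point and averages them. The sketch below implements that definition directly with NumPy; it is an illustration, not seaborn's code. The bandwidth `h` is picked by hand here, whereas seaborn chooses one automatically, and the data are the `MinWingspan` values of the five sample rows from the previous lesson's table.

```python
import numpy as np

def kde_by_hand(samples, grid, h):
    """Average of Gaussian bumps with standard deviation h, one per sample."""
    samples = np.asarray(samples, dtype=float)
    # Standardized distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / h
    bumps = np.exp(-0.5 * z**2) / (h * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)

# MinWingspan values of the five sample rows from the previous lesson
wingspans = [76, 85, 135, 113, 130]
grid = np.linspace(0, 220, 2201)                # points 0.1 cm apart
density = kde_by_hand(wingspans, grid, h=10.0)  # bandwidth chosen by hand

# Like any probability density, the curve encloses an area of about 1
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))  # 1.0
```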

If you wanted to revisit that jagged MaxBodyMass line in the second chart you built, you could smooth it out very well by recreating it using this method:

```python
sns.kdeplot(filteredBirds['MaxBodyMass'])
plt.show()
```

![smooth bodymass line](images/density2.png)

If you want a smooth, but not too smooth, line, edit the `bw_adjust` parameter:

```python
sns.kdeplot(filteredBirds['MaxBodyMass'], bw_adjust=.2)
plt.show()
```

![less smooth bodymass line](images/density3.png)
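
Seaborn sets the width of its kernels automatically, and `bw_adjust` scales that width. The sketch below assumes seaborn's default Scott's rule (kernel standard deviation = sample standard deviation × n^(-1/5)) and that `bw_adjust` simply multiplies the result; check the kdeplot documentation for the exact behavior of your version. A smaller bandwidth follows individual points more closely, which is why `bw_adjust=.2` gives a more jagged curve.

```python
import numpy as np

def scott_bandwidth(samples, bw_adjust=1.0):
    """Kernel std. dev. under Scott's rule, scaled by bw_adjust.

    Assumption: this mirrors how seaborn scales its default bandwidth.
    """
    x = np.asarray(samples, dtype=float)
    scott_factor = x.size ** (-1 / 5)  # Scott's rule in one dimension
    return x.std(ddof=1) * scott_factor * bw_adjust

# MaxBodyMass values of the five sample rows from the previous lesson
body_mass = [1020, 1050, 4050, 1567, 3310]
default_bw = scott_bandwidth(body_mass)
narrow_bw = scott_bandwidth(body_mass, bw_adjust=.2)
print(round(default_bw, 1), round(narrow_bw, 1))
```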

✅ Read about the parameters available for this type of plot and experiment!

This type of chart offers beautifully explanatory visualizations. With a few lines of code, for example, you can show the max body mass density per bird order:

```python
sns.kdeplot(
   data=filteredBirds, x="MaxBodyMass", hue="Order",
   fill=True, common_norm=False, palette="crest",
   alpha=.5, linewidth=0,
)
```

![bodymass per order](images/density4.png)

You can also map the density of several variables in one chart. Test the MaxLength and MinLength of a bird compared to their conservation status:

```python
sns.kdeplot(data=filteredBirds, x="MinLength", y="MaxLength", hue="ConservationStatus")
```

![multiple densities, superimposed](images/multi.png)

Perhaps it's worth researching whether the cluster of 'Vulnerable' birds according to their lengths is meaningful or not.

## 🚀 Challenge

Histograms are a more sophisticated type of chart than basic scatterplots, bar charts, or line charts. Search the internet to find good examples of the use of histograms. How are they used, what do they demonstrate, and in what fields or areas of inquiry do they tend to be used?

## Post-Lecture Quiz

[Post-lecture quiz]()

## Review & Self Study

In this lesson, you used Matplotlib and started working with Seaborn to show more sophisticated charts. Do some research on `kdeplot` in Seaborn, a "continuous probability density curve in one or more dimensions". Read through [the documentation](https://seaborn.pydata.org/generated/seaborn.kdeplot.html) to understand how it works.

## Assignment

[Apply your skills](assignment.md)
@ -0,0 +1,10 @@
# Apply your skills

## Instructions

So far, you have worked with the Minnesota birds dataset to discover information about bird quantities and population density. Practice your application of these techniques by trying a different dataset, perhaps sourced from Kaggle. Build a notebook to tell a story about this dataset, and make sure to use histograms when discussing it.

## Rubric

| Exemplary | Adequate | Needs Improvement |
| --- | --- | --- |
| A notebook is presented with annotations about this dataset, including its source, and uses at least 5 histograms to discover facts about the data. | A notebook is presented with incomplete annotations or bugs | A notebook is presented without annotations and includes bugs |
@ -0,0 +1,19 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# Bird distributions"
   ],
   "metadata": {}
  }
 ],
 "metadata": {
  "orig_nbformat": 4,
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
@ -0,0 +1,184 @@
# Visualizing Proportions

In this lesson, you will use a different nature-focused dataset to visualize proportions, such as how many different types of fungi populate a given dataset about mushrooms. Let's explore these fascinating fungi using a dataset sourced from Audubon listing details about 23 species of gilled mushrooms in the Agaricus and Lepiota families. You will experiment with tasty visualizations such as:

- Pie charts 🥧
- Donut charts 🍩
- Waffle charts 🧇

> 💡 A very interesting project called [Charticulator](https://charticulator.com) by Microsoft Research offers a free drag-and-drop interface for data visualizations. One of their tutorials also uses this mushroom dataset, so you can explore the data and learn the tool at the same time: https://charticulator.com/tutorials/tutorial4.html

## Pre-Lecture Quiz

[Pre-lecture quiz]()

## Get to know your mushrooms 🍄

Mushrooms are very interesting. Let's import a dataset to study them:

```python
import pandas as pd
import matplotlib.pyplot as plt
mushrooms = pd.read_csv('../../data/mushrooms.csv')
mushrooms.head()
```

A table is printed out with some great data for analysis:

| class | cap-shape | cap-surface | cap-color | bruises | odor | gill-attachment | gill-spacing | gill-size | gill-color | stalk-shape | stalk-root | stalk-surface-above-ring | stalk-surface-below-ring | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat |
| --------- | --------- | ----------- | --------- | ------- | ------- | --------------- | ------------ | --------- | ---------- | ----------- | ---------- | ------------------------ | ------------------------ | ---------------------- | ---------------------- | --------- | ---------- | ----------- | --------- | ----------------- | ---------- | ------- |
| Poisonous | Convex | Smooth | Brown | Bruises | Pungent | Free | Close | Narrow | Black | Enlarging | Equal | Smooth | Smooth | White | White | Partial | White | One | Pendant | Black | Scattered | Urban |
| Edible | Convex | Smooth | Yellow | Bruises | Almond | Free | Close | Broad | Black | Enlarging | Club | Smooth | Smooth | White | White | Partial | White | One | Pendant | Brown | Numerous | Grasses |
| Edible | Bell | Smooth | White | Bruises | Anise | Free | Close | Broad | Brown | Enlarging | Club | Smooth | Smooth | White | White | Partial | White | One | Pendant | Brown | Numerous | Meadows |
| Poisonous | Convex | Scaly | White | Bruises | Pungent | Free | Close | Narrow | Brown | Enlarging | Equal | Smooth | Smooth | White | White | Partial | White | One | Pendant | Black | Scattered | Urban |

Right away, you notice that all the data is textual. You will have to convert this data to be able to use it in a chart. In fact, pandas stores every column with the `object` dtype:

```python
print(mushrooms.select_dtypes(["object"]).columns)
```

The output is:

```output
Index(['class', 'cap-shape', 'cap-surface', 'cap-color', 'bruises', 'odor',
       'gill-attachment', 'gill-spacing', 'gill-size', 'gill-color',
       'stalk-shape', 'stalk-root', 'stalk-surface-above-ring',
       'stalk-surface-below-ring', 'stalk-color-above-ring',
       'stalk-color-below-ring', 'veil-type', 'veil-color', 'ring-number',
       'ring-type', 'spore-print-color', 'population', 'habitat'],
      dtype='object')
```

Take this data and convert its text columns, including 'class', to the `category` dtype:

```python
# Convert every object (text) column to the pandas 'category' dtype
cols = mushrooms.select_dtypes(["object"]).columns
mushrooms[cols] = mushrooms[cols].astype('category')
```

Now, if you group the mushrooms data by the poisonous/edible class and count the rows in each group with `mushrooms.groupby(['class']).count()`, you get this table:

| | cap-shape | cap-surface | cap-color | bruises | odor | gill-attachment | gill-spacing | gill-size | gill-color | stalk-shape | ... | stalk-surface-below-ring | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat |
| --------- | --------- | ----------- | --------- | ------- | ---- | --------------- | ------------ | --------- | ---------- | ----------- | --- | ------------------------ | ---------------------- | ---------------------- | --------- | ---------- | ----------- | --------- | ----------------- | ---------- | ------- |
| class | | | | | | | | | | | | | | | | | | | | | |
| Edible | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | ... | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 |
| Poisonous | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | ... | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 |
|
||||||
|
|
||||||
|
If you follow the order presented in this table to create your class category labels, you can build a pie chart:
## Pie!

```python
import matplotlib.pyplot as plt

# Count the rows per class; groupby returns them in sorted order (Edible, Poisonous)
edibleclass = mushrooms.groupby(['class']).count()
labels = ['Edible', 'Poisonous']
plt.pie(edibleclass['population'], labels=labels, autopct='%.1f %%')
plt.title('Edible?')
plt.show()
```
Voilà, a pie chart showing the proportions of this data according to these two classes of mushroom. It's quite important to get the order of the labels right, especially here, so be sure to verify the order in which the label array is built!

![pie chart](images/pie1.png)
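Because the `labels` list is typed by hand, it only matches the slices if it follows the order in which `groupby` returns the groups, which is sorted by default. The sketch below rebuilds a `class` column from the counts in the table above (the `population` values are placeholders; only the row counts matter), takes the labels straight from the grouped index, and computes the shares that `autopct='%.1f %%'` prints on the slices.

```python
import pandas as pd

# Rows in the same proportions as the table above; Poisonous rows deliberately come first
demo = pd.DataFrame({
    "class": ["Poisonous"] * 3916 + ["Edible"] * 4208,
    "population": ["placeholder"] * (3916 + 4208),
})

edibleclass = demo.groupby(["class"]).count()

# groupby sorts its keys, so the index reads Edible, Poisonous regardless of row order
labels = edibleclass.index.tolist()

# Each slice's share of the total, rounded to one decimal like autopct='%.1f %%'
counts = edibleclass["population"]
shares = (counts / counts.sum() * 100).round(1)
for label, share in zip(labels, shares):
    print(f"{label}: {share} %")
# Edible: 51.8 %
# Poisonous: 48.2 %
```

Passing `labels=edibleclass.index` to `plt.pie` would keep the labels and slices aligned without typing them by hand.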
## Donuts!

A somewhat more visually interesting pie chart is a donut chart, which is a pie chart with a hole in the middle. Let's look at our data using this method.

Take a look at the various habitats where mushrooms grow:

```python
habitat = mushrooms.groupby(['habitat']).count()
habitat
```
Here, you are grouping your data by habitat. There are 7 habitats listed, so use those as labels for your donut chart:

```python
# Labels in the same (sorted) order as the grouped habitat index
labels = ['Grasses', 'Leaves', 'Meadows', 'Paths', 'Urban', 'Waste', 'Wood']

# Draw the pie; pctdistance moves the percentage labels out toward the ring
plt.pie(habitat['class'], labels=labels,
        autopct='%1.1f%%', pctdistance=0.85)

# Cover the middle with a white circle of radius 0.40 to turn the pie into a donut
center_circle = plt.Circle((0, 0), 0.40, fc='white')
fig = plt.gcf()
fig.gca().add_artist(center_circle)

plt.title('Mushroom Habitats')
plt.show()
```
![donut chart](images/donut.png)

This code draws a pie chart, then adds a white circle on top of its center to create the hole. Change the size of the hole by changing the circle's radius, `0.40`, to another value.

Donut charts can be tweaked in several ways. The labels in particular can be highlighted for readability. Learn more in the [docs](https://matplotlib.org/stable/gallery/pie_and_polar_charts/pie_and_donut_labels.html?highlight=donut).

Now that you know how to group your data and then display it as a pie or donut, you can explore other types of charts. Try a waffle chart, which is just a different way of exploring quantity.
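Before moving on to waffles, note that the Matplotlib gallery page linked above builds its donut differently: instead of drawing a white circle over the pie, it passes `wedgeprops` with a `width` to `pie`, so each wedge is drawn as a ring segment. A minimal sketch of that approach, with made-up category names and counts (swap in `habitat['class']` and the habitat labels for the real chart):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt

# Placeholder data, for illustration only
counts = [50, 30, 20]
labels = ["A", "B", "C"]

fig, ax = plt.subplots()
# The pie has radius 1; width=0.4 keeps only the outer ring (radius 0.6 to 1.0)
wedges, texts, autotexts = ax.pie(
    counts,
    labels=labels,
    autopct="%1.1f%%",
    pctdistance=0.8,  # put the percentages in the middle of the ring
    wedgeprops=dict(width=0.4, edgecolor="white"),
)
ax.set_title("Donut via wedgeprops")
```

With a ring width of 0.4, the hole has a radius of 0.6, so this corresponds to a center circle of radius 0.60 rather than 0.40.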
## Waffles!

A 'waffle' chart is a different way to visualize quantities, as a 2D array of squares. Try visualizing the different quantities of mushroom cap colors in this dataset. To do this, you need to install a helper library called [PyWaffle](https://pypi.org/project/pywaffle/) and use it with Matplotlib:

```bash
pip install pywaffle
```
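Under the hood, a waffle chart has to turn each count into a whole number of squares. The sketch below shows one way to do that for a 10 × 10 grid, using the largest-remainder method so the squares always add up to exactly 100. It illustrates the idea only; PyWaffle has its own rounding options and may not allocate squares the same way. The counts are the edible and poisonous totals from earlier in this lesson.

```python
def squares_per_category(counts, total_squares=100):
    """Split total_squares among categories in proportion to counts (largest-remainder method)."""
    total = sum(counts)
    exact = [c * total_squares / total for c in counts]
    squares = [int(x) for x in exact]  # round every share down first
    # Give the squares lost to rounding to the categories with the largest fractional parts
    leftover = total_squares - sum(squares)
    order = sorted(range(len(counts)), key=lambda i: exact[i] - squares[i], reverse=True)
    for i in order[:leftover]:
        squares[i] += 1
    return squares


def waffle_grid(squares, rows=10, cols=10):
    """Fill a rows x cols grid, row by row, with the category index of each square."""
    flat = [i for i, n in enumerate(squares) for _ in range(n)]
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


counts = {"Edible": 4208, "Poisonous": 3916}
squares = squares_per_category(list(counts.values()))
print(dict(zip(counts, squares)))  # {'Edible': 52, 'Poisonous': 48}

# E = edible square, P = poisonous square
for row in waffle_grid(squares):
    print("".join("EP"[i] for i in row))
```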
Select a segment of your data to group:

```python
capcolor = mushrooms.groupby(['cap-color']).count()
capcolor
```
Create a waffle chart by building a small DataFrame of labels and counts from the grouped data:

```python
import pandas as pd
import matplotlib.pyplot as plt
from pywaffle import Waffle

# These labels must follow the same (sorted) order as the rows of capcolor
data = {'color': ['brown', 'buff', 'cinnamon', 'green', 'pink', 'purple', 'red', 'white', 'yellow'],
        'amount': capcolor['class']
       }

df = pd.DataFrame(data)

fig = plt.figure(
    FigureClass = Waffle,
    rows = 100,
    values = df.amount,
    labels = list(df.color),
    figsize = (30,30),
    colors=["brown", "tan", "maroon", "green", "pink", "purple", "red", "whitesmoke", "yellow"],
)
plt.show()
```
Using a waffle chart, you can plainly see the proportions of cap colors in this mushroom dataset. Interestingly, there are many green-capped mushrooms!

![waffle chart](images/waffle.png)

✅ PyWaffle supports icons within the charts, using any icon available in [Font Awesome](https://fontawesome.com/). Experiment with creating an even more interesting waffle chart using icons instead of squares.

In this lesson, you learned three ways to visualize proportions. First, you need to group your data into categories and then decide which is the best way to display the data: pie, donut, or waffle. All are delicious and gratify the user with an instant snapshot of a dataset.
## 🚀 Challenge

Try recreating these tasty charts in [Charticulator](https://charticulator.com).

## Post-Lecture Quiz

[Post-lecture quiz]()

## Review & Self Study

Sometimes it's not obvious when to use a pie, donut, or waffle chart. Here are some articles to read on this topic:

https://www.beautiful.ai/blog/battle-of-the-charts-pie-chart-vs-donut-chart

https://medium.com/@hypsypops/pie-chart-vs-donut-chart-showdown-in-the-ring-5d24fd86a9ce

https://www.mit.edu/~mbarker/formula1/f1help/11-ch-c6.htm

https://medium.datadriveninvestor.com/data-visualization-done-the-right-way-with-tableau-waffle-chart-fdf2a19be402

Do some research to find more information on this sticky decision.

## Assignment

[Try it in Excel](assignment.md)
@ -0,0 +1,11 @@
# Try it in Excel

## Instructions

Did you know you can create donut, pie, and waffle charts in Excel? Using a dataset of your choice, create these three charts right in an Excel spreadsheet.

## Rubric

| Exemplary | Adequate | Needs Improvement |
| ------------------------------------------------------- | ------------------------------------------------- | ------------------------------------------------------ |
| An Excel spreadsheet is presented with all three charts | An Excel spreadsheet is presented with two charts | An Excel spreadsheet is presented with only one chart |
@ -0,0 +1,19 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# 🍄 Mushroom Proportions"
   ],
   "metadata": {}
  }
 ],
 "metadata": {
  "orig_nbformat": 4,
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
@ -0,0 +1,174 @@
# Visualizing Relationships: All About Honey 🍯

Continuing with the nature focus of our research, let's discover interesting visualizations to show the relationships between various types of honey, according to a dataset derived from the [United States Department of Agriculture](https://www.nass.usda.gov/About_NASS/index.php).

This dataset of about 600 items displays honey production in many U.S. states. For example, you can look at the number of colonies, yield per colony, total production, stocks, price per pound, and value of the honey produced in a given state from 1998 to 2012, with one row per year for each state.

It will be interesting to visualize the relationship between a given state's production per year and, for example, the price of honey in that state. Alternatively, you could visualize the relationship between states' honey yield per colony. This year span covers the devastating 'CCD' or 'Colony Collapse Disorder' first seen in 2006 (http://npic.orst.edu/envir/ccd.html), so it is a poignant dataset to study. 🐝

## Pre-Lecture Quiz

[Pre-lecture quiz]()

In this lesson, you can use Seaborn, which you have used before, as a good library to visualize relationships between variables. Particularly interesting is Seaborn's `relplot` function, which draws scatter plots and line plots to quickly visualize '[statistical relationships](https://seaborn.pydata.org/tutorial/relational.html?highlight=relationships)', helping the data scientist better understand how variables relate to each other.
## Scatterplots

Use a scatterplot to show how the price of honey has evolved, year over year, per state. Seaborn, using `relplot`, conveniently groups the state data and displays data points for both categorical and numeric data.

Let's start by importing the data and Seaborn:

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
honey = pd.read_csv('../../data/honey.csv')
honey.head()
```
Notice that the honey data has several interesting columns, including year and price per pound. Let's explore this data, grouped by U.S. state:

| state | numcol | yieldpercol | totalprod | stocks | priceperlb | prodvalue | year |
| ----- | ------ | ----------- | --------- | -------- | ---------- | --------- | ---- |
| AL | 16000 | 71 | 1136000 | 159000 | 0.72 | 818000 | 1998 |
| AZ | 55000 | 60 | 3300000 | 1485000 | 0.64 | 2112000 | 1998 |
| AR | 53000 | 65 | 3445000 | 1688000 | 0.59 | 2033000 | 1998 |
| CA | 450000 | 83 | 37350000 | 12326000 | 0.62 | 23157000 | 1998 |
| CO | 27000 | 72 | 1944000 | 1594000 | 0.7 | 1361000 | 1998 |

Create a basic scatterplot to show the relationship between the price per pound of honey and its U.S. state of origin. Make the `y` axis tall enough to display all the states:
```python
sns.relplot(x="priceperlb", y="state", data=honey, height=15, aspect=.5);
```

![scatterplot 1](images/scatter1.png)

Now, show the same data with a honey color scheme to show how the price evolves over the years. You can do this by adding a `hue` parameter that colors each point by year:

> ✅ Learn more about the [color palettes you can use in Seaborn](https://seaborn.pydata.org/tutorial/color_palettes.html) - try a beautiful rainbow color scheme!
```python
sns.relplot(x="priceperlb", y="state", hue="year", palette="YlOrBr", data=honey, height=15, aspect=.5);
```

![scatterplot 2](images/scatter2.png)

With this color scheme change, you can see a clear progression over the years in honey price per pound. Indeed, if you look at a sample set in the data to verify (pick a given state, Arizona for example), you can see an overall pattern of price increases over the years, though not every single year:

| state | numcol | yieldpercol | totalprod | stocks | priceperlb | prodvalue | year |
| ----- | ------ | ----------- | --------- | ------- | ---------- | --------- | ---- |
| AZ | 55000 | 60 | 3300000 | 1485000 | 0.64 | 2112000 | 1998 |
| AZ | 52000 | 62 | 3224000 | 1548000 | 0.62 | 1999000 | 1999 |
| AZ | 40000 | 59 | 2360000 | 1322000 | 0.73 | 1723000 | 2000 |
| AZ | 43000 | 59 | 2537000 | 1142000 | 0.72 | 1827000 | 2001 |
| AZ | 38000 | 63 | 2394000 | 1197000 | 1.08 | 2586000 | 2002 |
| AZ | 35000 | 72 | 2520000 | 983000 | 1.34 | 3377000 | 2003 |
| AZ | 32000 | 55 | 1760000 | 774000 | 1.11 | 1954000 | 2004 |
| AZ | 36000 | 50 | 1800000 | 720000 | 1.04 | 1872000 | 2005 |
| AZ | 30000 | 65 | 1950000 | 839000 | 0.91 | 1775000 | 2006 |
| AZ | 30000 | 64 | 1920000 | 902000 | 1.26 | 2419000 | 2007 |
| AZ | 25000 | 64 | 1600000 | 336000 | 1.26 | 2016000 | 2008 |
| AZ | 20000 | 52 | 1040000 | 562000 | 1.45 | 1508000 | 2009 |
| AZ | 24000 | 77 | 1848000 | 665000 | 1.52 | 2809000 | 2010 |
| AZ | 23000 | 53 | 1219000 | 427000 | 1.55 | 1889000 | 2011 |
| AZ | 22000 | 46 | 1012000 | 253000 | 1.79 | 1811000 | 2012 |

Another way to visualize this progression is to use size rather than color. For colorblind users, this might be a better option. Edit your visualization so that the dots grow larger as the years go by:
```python
sns.relplot(x="priceperlb", y="state", size="year", data=honey, height=15, aspect=.5);
```

You can see the size of the dots gradually increasing.

![scatterplot 3](images/scatter3.png)

Is this a simple case of supply and demand? Due to factors such as climate change and colony collapse, is there less honey available for purchase year over year, and thus the price increases?

To discover a correlation between some of the variables in this dataset, let's explore some line charts.
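Before turning to line charts, you can put rough numbers on the questions above using only the Arizona rows shown earlier. This minimal pandas sketch counts the years in which the price rose and computes the Pearson correlation between total production and price per pound. Keep in mind that a correlation within a single state says nothing on its own about cause, so treat it as a prompt for the charts that follow, not an answer.

```python
import pandas as pd

# Arizona rows from the table above: year, total production, price per pound
az = pd.DataFrame({
    "year": list(range(1998, 2013)),
    "totalprod": [3300000, 3224000, 2360000, 2537000, 2394000, 2520000, 1760000,
                  1800000, 1950000, 1920000, 1600000, 1040000, 1848000, 1219000, 1012000],
    "priceperlb": [0.64, 0.62, 0.73, 0.72, 1.08, 1.34, 1.11, 1.04, 0.91, 1.26,
                   1.26, 1.45, 1.52, 1.55, 1.79],
})

# Year-over-year change in price: positive means the price went up that year
change = az["priceperlb"].diff().dropna()
print("years with a price increase:", int((change > 0).sum()), "of", len(change))

# Pearson correlation between supply and price for this one state
r = az["totalprod"].corr(az["priceperlb"])
print(f"correlation(totalprod, priceperlb) = {r:.2f}")
```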
## Line charts

Question: Is there a clear rise in the price of honey per pound year over year? You can most easily discover that by creating a single line chart:

```python
sns.relplot(x="year", y="priceperlb", kind="line", data=honey);
```
Answer: Yes, with some exceptions around the year 2003:

![line chart 1](images/line1.png)

✅ Because Seaborn is aggregating data around one line, it displays "the multiple measurements at each x value by plotting the mean and the 95% confidence interval around the mean". [source](https://seaborn.pydata.org/tutorial/relational.html). Computing that interval (by bootstrapping) can be time-consuming; you can disable it by adding `errorbar=None` (seaborn 0.12 and later) or `ci=None` (older versions).

Question: Well, in 2003, can we also see a spike in the honey supply? What if you look at total production year over year?

```python
sns.relplot(x="year", y="totalprod", kind="line", data=honey);
```
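As the note about confidence intervals explains, a Seaborn line chart collapses all the rows that share an x value into one point, and by default that point is the mean. So the `totalprod` line shows the average production per state in each year, not the total across states. The minimal pandas sketch below, which uses only the five 1998 rows from the first honey table, makes the difference concrete. If you want totals instead, `lineplot`'s `estimator` parameter (for example `estimator="sum"`) is worth exploring.

```python
import pandas as pd

# The five 1998 rows from the first honey table in this lesson
honey_1998 = pd.DataFrame({
    "state": ["AL", "AZ", "AR", "CA", "CO"],
    "totalprod": [1136000, 3300000, 3445000, 37350000, 1944000],
    "priceperlb": [0.72, 0.64, 0.59, 0.62, 0.70],
    "year": [1998] * 5,
})

# What the line chart plots at x=1998: the mean over all rows for that year
per_year_mean = honey_1998.groupby("year")[["totalprod", "priceperlb"]].mean()
# The total across the same rows, for comparison
per_year_sum = honey_1998.groupby("year")["totalprod"].sum()

print(per_year_mean)
print(per_year_sum)
```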
![line chart 2](images/line2.png)

Answer: Not really. If you look at total production, it actually seems to have increased in that particular year, even though, generally speaking, the amount of honey being produced declines during these years.

Question: In that case, what could have caused that spike in the price of honey around 2003?

To discover this, you can explore a facet grid.
## Facet grids

A facet grid splits your dataset by one variable, its 'facet' (in our case, you can choose 'year' to avoid producing too many facets). Seaborn then draws a plot of your chosen x and y coordinates for each value of that facet, for easier visual comparison. Does 2003 stand out in this type of comparison?

Create a facet grid by continuing to use `relplot` as recommended by [Seaborn's documentation](https://seaborn.pydata.org/generated/seaborn.FacetGrid.html?highlight=facetgrid#seaborn.FacetGrid).

```python
sns.relplot(
    data=honey,
    x="yieldpercol", y="numcol",
    col="year",
    col_wrap=3,
    kind="line"
);
```
In this visualization, you can compare the yield per colony and number of colonies year over year, side by side, with the wrap set at 3 columns:

![facet grid](images/facet.png)

For this dataset, nothing particularly stands out with regard to the number of colonies and their yield, year over year and state over state. Is there a different way to look for a correlation between these two variables?
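To see what `col` and `col_wrap` do to the layout, here is a rough Matplotlib-only equivalent: one small panel per distinct year, wrapped after three panels per row, with shared axes so the panels are directly comparable. It is an illustration of the idea, not how Seaborn implements `relplot`, and the numbers are made up.

```python
import math

import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def facet_lines(df, col, x, y, col_wrap=3):
    """Draw y against x in one panel per value of `col`, wrapping after col_wrap panels."""
    values = sorted(df[col].unique())
    nrows = math.ceil(len(values) / col_wrap)
    fig, axes = plt.subplots(nrows, col_wrap, figsize=(4 * col_wrap, 3 * nrows),
                             sharex=True, sharey=True, squeeze=False)
    panels = axes.ravel()
    for ax, value in zip(panels, values):
        subset = df[df[col] == value].sort_values(x)  # line data sorted by x, as lineplot does by default
        ax.plot(subset[x], subset[y], marker="o")
        ax.set_title(f"{col} = {value}")
    for ax in panels[len(values):]:
        ax.set_visible(False)  # hide the unused slots in the last row
    return fig, panels[:len(values)]


# Made-up values: three "states" per year, four years
demo = pd.DataFrame({
    "year": [1998] * 3 + [1999] * 3 + [2000] * 3 + [2001] * 3,
    "yieldpercol": [60, 71, 83, 62, 70, 80, 59, 65, 78, 59, 66, 81],
    "numcol": [55000, 16000, 450000, 52000, 15000, 440000,
               40000, 14000, 430000, 43000, 13000, 420000],
})
fig, panels = facet_lines(demo, col="year", x="yieldpercol", y="numcol")
```

With the 15 years (1998 to 2012) in the honey data and `col_wrap=3`, the same arithmetic gives ceil(15 / 3) = 5 rows of three panels.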
## Dual-line Plots

Try a multiline plot by superimposing two line plots on top of each other, using Seaborn's `despine` to remove their top and right spines, and using `ax.twinx` [derived from Matplotlib](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.twinx.html). `twinx` lets a chart share the x axis and display two y axes. So, display the yield per colony and number of colonies, superimposed:

```python
fig, ax = plt.subplots(figsize=(12,6))
lineplot = sns.lineplot(x='year', y='numcol', data=honey, ax=ax,
                        label='Number of bee colonies', legend=False)
sns.despine()
plt.ylabel('# colonies')
plt.title('Honey Production Year over Year');

# twinx() creates a second y axis that shares the same x axis
ax2 = ax.twinx()
lineplot2 = sns.lineplot(x='year', y='yieldpercol', data=honey, ax=ax2, color="r",
                         label='Yield per colony', legend=False)
sns.despine(right=False)
plt.ylabel('colony yield')
# Collect the labeled lines from both axes into one figure-level legend
ax.figure.legend();
```
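The same twin-axis idea in plain Matplotlib, as a self-contained sketch with made-up values: `twinx()` returns a second Axes that shares the x axis, and because each Axes only tracks its own lines, the handles from both are gathered into one legend by hand (the lesson's code gets a similar effect with `ax.figure.legend()`).

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt

# Made-up yearly values, for illustration only
years = [2008, 2009, 2010, 2011, 2012]
colonies = [2300, 2250, 2200, 2150, 2100]  # thousands of colonies (hypothetical)
yield_per_colony = [60, 58, 65, 59, 56]    # lbs per colony (hypothetical)

fig, ax = plt.subplots(figsize=(8, 4))
(line1,) = ax.plot(years, colonies, label="Number of bee colonies")
ax.set_ylabel("# colonies")

# twinx() adds a second y axis on the right that shares this x axis
ax2 = ax.twinx()
(line2,) = ax2.plot(years, yield_per_colony, color="r", label="Yield per colony")
ax2.set_ylabel("colony yield")

# Each Axes only knows its own lines, so pass both handles to a single legend
ax.legend(handles=[line1, line2], loc="upper right")
ax.set_title("Two y axes, one x axis")
```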
![superimposed plots](images/dual-line.png)

While nothing jumps out to the eye around the year 2003, this chart does allow us to end this lesson on a slightly happier note: although the number of colonies is declining overall, it might seem to be stabilizing, and the yield per colony is actually increasing, even with fewer bees.

Go, bees, go!

🐝❤️
## 🚀 Challenge

In this lesson, you learned a bit more about other uses of scatterplots and line plots, including facet grids. Challenge yourself to create a facet grid using a different dataset, maybe one you used prior to these lessons. Note how long they take to create and how careful you need to be about how many grids you draw using these techniques.

## Post-Lecture Quiz

[Post-lecture quiz]()

## Review & Self Study

Line plots can be simple or quite complex. Do a bit of reading in the [Seaborn documentation](https://seaborn.pydata.org/generated/seaborn.lineplot.html) on the various ways you can build them. Try to enhance the line charts you built in this lesson with other methods listed in the docs.

## Assignment

[Dive into the beehive](assignment.md)
@ -0,0 +1,11 @@
# Dive into the beehive

## Instructions

In this lesson you started looking at a dataset around bees and their honey production over a period of time that saw losses in the bee colony population overall. Dig deeper into this dataset and build a notebook that can tell the story of the health of the bee population, state by state and year by year. Do you discover anything interesting about this dataset?

## Rubric

| Exemplary | Adequate | Needs Improvement |
| ------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------- | ---------------------------------------- |
| A notebook is presented with a story annotated with at least three different charts showing aspects of the dataset, state over state and year over year | The notebook lacks one of these elements | The notebook lacks two of these elements |
@ -0,0 +1,19 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# Visualizing Honey Production 🍯 🐝"
   ],
   "metadata": {}
  }
 ],
 "metadata": {
  "orig_nbformat": 4,
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
@ -0,0 +1,158 @@
# Making Meaningful Visualizations

> "If you torture the data long enough, it will confess to anything" -- [Ronald Coase](https://en.wikiquote.org/wiki/Ronald_Coase)

One of the basic skills of a data scientist is the ability to create a meaningful data visualization that helps answer questions you might have. Prior to visualizing your data, you need to ensure that it has been cleaned and prepared, as you did in prior lessons. After that, you can start deciding how best to present the data.

In this lesson, you will review:

1. How to choose the right chart type
2. How to avoid deceptive charting
3. How to work with color
4. How to style your charts for readability
5. How to build animated or 3D charting solutions
6. How to build a creative visualization

## Pre-Lecture Quiz

[Pre-lecture quiz]()
## Choose the right chart type

In previous lessons, you experimented with building all kinds of interesting data visualizations using Matplotlib and Seaborn for charting. In general, you can select the [right kind of chart](https://chartio.com/learn/charts/how-to-select-a-data-vizualization/) for the question you are asking using this table:

| You need to: | You should use: |
| -------------------------- | ------------------------------- |
| Show data trends over time | Line |
| Compare categories | Bar, Pie |
| Compare totals | Pie, Stacked Bar |
| Show relationships | Scatter, Line, Facet, Dual Line |
| Show distributions | Scatter, Histogram, Box |
| Show proportions | Pie, Donut, Waffle |

> ✅ Depending on the makeup of your data, you might need to convert it from text to numeric to get a given chart to support it.
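For example, numbers that arrived as text can be parsed with `pd.to_numeric`, while text labels can be turned into integer codes or counted, which is usually what a bar or pie chart needs. A minimal pandas sketch with made-up values:

```python
import pandas as pd

# Numbers stored as text: parse them; anything unparseable becomes NaN
prices = pd.Series(["0.72", "0.64", "n/a", "1.08"])
as_numbers = pd.to_numeric(prices, errors="coerce")
print(as_numbers.tolist())  # [0.72, 0.64, nan, 1.08]

# Labels stored as text: map each distinct label to an integer code
habitat = pd.Series(["Wood", "Urban", "Wood", "Grasses"]).astype("category")
print(habitat.cat.categories.tolist())  # ['Grasses', 'Urban', 'Wood']
print(habitat.cat.codes.tolist())       # [2, 1, 2, 0]

# ...or count them, which gives the numbers a bar or pie chart plots
counts = habitat.value_counts()
print(counts.to_dict())
```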
## Avoid deception

Even if a data scientist is careful to choose the right chart for the right data, there are plenty of ways that data can be displayed to prove a point, often at the cost of undermining the data itself. There are many examples of deceptive charts and infographics!

[![Deceptive Charts by Alberto Cairo](./images/tornado.png)](https://www.youtube.com/watch?v=Low28hx4wyk "Deceptive charts")

> 🎥 Click the image above for a conference talk about deceptive charts

This chart reverses the X axis to show the opposite of the truth, based on date:

![bad chart 1](images/bad-chart-1.png)

[This chart](https://media.firstcoastnews.com/assets/WTLV/images/170ae16f-4643-438f-b689-50d66ca6a8d8/170ae16f-4643-438f-b689-50d66ca6a8d8_1140x641.jpg) is even more deceptive, as the eye is drawn to the right to conclude that, over time, COVID cases have declined in the various counties. In fact, if you look closely at the dates, you find that they have been rearranged to give that deceptive downward trend.

![bad chart 2](images/bad-chart-2.jpg)

This notorious example uses color AND a flipped Y axis to deceive: instead of concluding that gun deaths spiked after the passage of gun-friendly legislation, the eye is fooled into thinking that the opposite is true:

![bad chart 3](images/bad-chart-3.jpg)

This strange chart shows how proportion can be manipulated, to hilarious effect:

![bad chart 4](images/bad-chart-4.jpg)

Comparing the incomparable is yet another shady trick. There is a [wonderful web site](https://tylervigen.com/spurious-correlations) all about 'spurious correlations' displaying 'facts' correlating things like the divorce rate in Maine and the consumption of margarine. A Reddit group also collects the [ugly uses](https://www.reddit.com/r/dataisugly/top/?t=all) of data.

It's important to understand how easily the eye can be fooled by deceptive charts. Even if the data scientist's intention is good, the choice of a bad type of chart, such as a pie chart showing too many categories, can be deceptive.
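A flipped axis, as in the reversed-date and gun-deaths charts above, takes a single line of code. The minimal Matplotlib sketch below plots the same made-up, steadily rising series twice; in the second panel the y axis is inverted, so the line appears to fall. When reviewing Matplotlib code, `xaxis_inverted()` and `yaxis_inverted()` offer a quick check.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt

# Made-up yearly counts that clearly rise over time
years = [2016, 2017, 2018, 2019, 2020]
counts = [10, 14, 19, 25, 32]

fig, (honest, flipped) = plt.subplots(1, 2, figsize=(8, 3))
honest.plot(years, counts, marker="o")
honest.set_title("Honest: y grows upward")

flipped.plot(years, counts, marker="o")
flipped.invert_yaxis()  # same data, but the line now appears to fall
flipped.set_title("Deceptive: inverted y axis")

# Report which axes are flipped on each panel
for ax in (honest, flipped):
    print(ax.get_title(), "| y inverted:", ax.yaxis_inverted(), "| x inverted:", ax.xaxis_inverted())
```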
## Color

You saw in the 'Florida gun violence' chart above how color can provide an additional layer of meaning to charts, especially ones not designed using libraries such as Matplotlib and Seaborn, which come with various vetted color libraries and palettes. If you are making a chart by hand, do a little study of [color theory](https://colormatters.com/color-and-design/basic-color-theory).

> ✅ Be aware, when designing charts, that accessibility is an important aspect of visualization. Some of your users might be color blind - does your chart display well for users with visual impairments?

Be careful when choosing colors for your chart, as color can convey meaning you might not intend. The 'pink ladies' in the 'height' chart above convey a distinctly 'feminine' ascribed meaning that adds to the bizarreness of the chart itself.

[Color meaning](https://colormatters.com/color-symbolism/the-meanings-of-colors) might be different in different parts of the world and tends to change according to shade, but generally speaking, color meanings include:

| Color | Meaning |
| ------ | ------------------- |
| red | power |
| blue | trust, loyalty |
| yellow | happiness, caution |
| green | ecology, luck, envy |
| purple | happiness |
| orange | vibrance |

If you are tasked with building a chart with custom colors, ensure that your charts are accessible and that the colors you choose coincide with the meaning you are trying to convey.
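One low-effort way to act on the accessibility note above is to start from a palette built for color-blind readers and to avoid relying on color alone. A minimal sketch using Matplotlib's built-in `tableau-colorblind10` style (Seaborn's `color_palette("colorblind")` is a similar option); the series names and values are placeholders:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt

series = {"Edible": [1, 2, 3], "Poisonous": [2, 3, 4], "Unknown": [3, 4, 5]}  # placeholder data
styles = ["-", "--", ":"]

# tableau-colorblind10 is one of Matplotlib's built-in styles; its color cycle is
# chosen to stay distinguishable for common kinds of color blindness
with plt.style.context("tableau-colorblind10"):
    fig, ax = plt.subplots()
    for (name, values), style in zip(series.items(), styles):
        # Vary the line style too, so the series differ by more than color alone
        ax.plot(values, linestyle=style, label=name)
    ax.legend()

print([line.get_color() for line in ax.get_lines()])
```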
## Styling your charts for readability

Charts are not meaningful if they are not readable! Take a moment to consider styling the width and height of your chart to scale well with your data. If one variable (such as all 50 states) needs to be displayed, show it vertically on the Y axis if possible so as to avoid a horizontally-scrolling chart.

Label your axes, provide a legend if necessary, and offer tooltips for better comprehension of data.

If your data is textual and verbose on the X axis, you can angle the text for better readability. [Matplotlib](https://matplotlib.org/stable/tutorials/toolkits/mplot3d.html) offers 3D plotting, if your data supports it. Sophisticated data visualizations can be produced using `mpl_toolkits.mplot3d`.

![3d plots](images/3d.png)
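A minimal sketch of these readability tips and the 3D option together, with made-up values: the left panel labels its axes, adds a legend, and angles long category names; the right panel is a 3D surface created with `projection="3d"`, which relies on the `mpl_toolkits.mplot3d` toolkit mentioned above.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

habitats = ["Grasses", "Leaves", "Meadows", "Paths", "Urban", "Waste", "Wood"]
counts = [7, 3, 2, 5, 4, 1, 9]  # made-up values

fig = plt.figure(figsize=(11, 4))

# Left: labeled axes, a legend, and angled tick labels so long names don't overlap
ax = fig.add_subplot(1, 2, 1)
ax.bar(range(len(habitats)), counts, label="mushrooms")
ax.set_xticks(range(len(habitats)))
ax.set_xticklabels(habitats, rotation=45, ha="right")
ax.set_xlabel("Habitat")
ax.set_ylabel("Count")
ax.legend()

# Right: a 3D surface; projection="3d" is provided by mpl_toolkits.mplot3d
ax3d = fig.add_subplot(1, 2, 2, projection="3d")
x, y = np.meshgrid(np.linspace(-2, 2, 40), np.linspace(-2, 2, 40))
ax3d.plot_surface(x, y, np.exp(-(x**2 + y**2)), cmap="viridis")
ax3d.set_xlabel("x")
ax3d.set_ylabel("y")
ax3d.set_zlabel("z")
```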
## Animation and 3D chart display

Some of the best data visualizations today are animated. Shirley Wu has amazing ones done with D3, such as '[film flowers](http://bl.ocks.org/sxywu/raw/d612c6c653fb8b4d7ff3d422be164a5d/)', where each flower is a visualization of a movie. Another example, made for the Guardian, is 'bussed out', an interactive experience combining visualizations with Greensock and D3 plus a scrollytelling article format to show how NYC handles its homeless problem by busing people out of the city.

![busing](images/busing.png)

While this lesson can't go into depth on these powerful visualization libraries, you can try your hand at D3 in a Vue.js app, using a library to display a visualization of the book "Dangerous Liaisons" as an animated social network.

> "Les Liaisons Dangereuses" is an epistolary novel, or a novel presented as a series of letters. Written in 1782 by Choderlos de Laclos, it tells the story of the vicious, morally-bankrupt social maneuvers of two dueling protagonists of the French aristocracy in the late 18th century, the Vicomte de Valmont and the Marquise de Merteuil. Both meet their demise in the end but not without inflicting a great deal of social damage. The novel unfolds as a series of letters written to various people in their circles, plotting for revenge or simply to make trouble. Create a visualization of these letters to discover the major kingpins of the narrative, visually.

You will complete a web app that will display an animated view of this social network. It uses a library that was built to create a [visual of a network](https://github.com/emiliorizzo/vue-d3-network) using Vue.js and D3. When the app is running, you can pull the nodes around on the screen to shuffle the data around.

![liaisons](images/liaisons.png)
## Project: Build a chart to show a network using D3.js

> This lesson folder includes a `solution` folder where you can find the completed project, for your reference.

1. Follow the instructions in the README.md file in the starter folder's root. Make sure you have NPM and Node.js running on your machine before installing your project's dependencies.

2. Open the `starter/src` folder. You'll discover an `assets` folder where you can find a .json file with all the letters from the novel, numbered, with a 'to' and 'from' annotation.

3. Complete the code in `components/Nodes.vue` to enable the visualization. Look for the method called `createLinks()` and add the following nested loop.

Loop through the .json object to capture the 'to' and 'from' data for the letters and build up the `links` object so that the visualization library can consume it:

```javascript
// Loop through the letters
for (let i = 0; i < letters.length; i++) {
  // Reset for each letter so an unmatched name doesn't reuse the previous letter's index
  let f = -1;
  let t = -1;
  // Find the positions of the sender ('from') and recipient ('to') in the characters list
  for (let j = 0; j < characters.length; j++) {
    if (characters[j] === letters[i].from) {
      f = j;
    }
    if (characters[j] === letters[i].to) {
      t = j;
    }
  }
  // Link sender to recipient, skipping letters that mention an unknown character
  if (f !== -1 && t !== -1) {
    this.links.push({ sid: f, tid: t });
  }
}
```
Run your app from the terminal (`npm run serve`) and enjoy the visualization!
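To check the letters data, or to see who the narrative's 'kingpins' are before you open the browser, the same to/from mapping can be prototyped in a few lines of Python. This sketch uses a hypothetical five-letter excerpt in the shape the starter's JSON is described as having (a 'to' and a 'from' per letter); it is not the real file. It builds the same kind of `{sid, tid}` links as `createLinks()`, using a name-to-index dictionary instead of a nested loop, and counts how many letters each character sent or received.

```python
from collections import Counter

# Hypothetical excerpt, not the real letters file: each letter has a 'from' and a 'to'
letters = [
    {"from": "Valmont", "to": "Merteuil"},
    {"from": "Merteuil", "to": "Valmont"},
    {"from": "Cécile", "to": "Sophie"},
    {"from": "Valmont", "to": "Tourvel"},
    {"from": "Merteuil", "to": "Cécile"},
]
characters = sorted({name for letter in letters for name in (letter["from"], letter["to"])})

# Same idea as createLinks(): turn each letter into a link between character indices
index_of = {name: i for i, name in enumerate(characters)}
links = [{"sid": index_of[l["from"]], "tid": index_of[l["to"]]} for l in letters]

# Letters sent plus letters received per character hints at who drives the story
degree = Counter()
for letter in letters:
    degree[letter["from"]] += 1
    degree[letter["to"]] += 1
print(degree.most_common(2))  # [('Valmont', 3), ('Merteuil', 3)]
```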
|
||||||
|
## 🚀 Challenge
|
||||||
|
|
||||||
|
Take a tour of the internet to discover deceptive visualizations. How does the author fool the user, and is it intentional? Try correcting the visualizations to show how they should look.
|
||||||
|
## Post-Lecture Quiz
|
||||||
|
|
||||||
|
[Post-lecture quiz]()
|
||||||
|
|
## Review & Self Study

Here are some articles to read about deceptive data visualization:

https://gizmodo.com/how-to-lie-with-data-visualization-1563576606

http://ixd.prattsi.org/2017/12/visual-lies-usability-in-deceptive-data-visualizations/

Take a look at these interesting visualizations of historical assets and artifacts:

https://handbook.pubpub.org/

Look through this article on how animation can enhance your visualizations:

https://medium.com/@EvanSinar/use-animation-to-supercharge-data-visualization-cd905a882ad4

## Assignment

[Build your own custom vis](assignment.md)
@ -0,0 +1,10 @@
# Build your own custom vis

## Instructions

Using the code in this project as a starting point, mock up data about your own social interactions and visualize it as a social network. You could map your usage of social media or make a diagram of your family members. Create an interesting web app that shows a unique visualization of a social network.

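One way to get started is to keep your mock data in the same `{ from, to }` shape that `Nodes.vue` reads from `letters.json`. You can then derive the list of people from the data instead of hard-coding a `characters` array. The sketch below is hypothetical: the `interactions` data and the `toNetwork` name are made up for illustration. It produces `nodes` and `links` in the shape the project passes to `<d3-network>`. It sizes each node the way `getCount()` does in the solution: messages received plus 20.

```javascript
// Hypothetical mock data: each record is one message from one person to another,
// in the same { from, to } shape that Nodes.vue reads from letters.json.
const interactions = [
  { from: "Me", to: "Sam" },
  { from: "Sam", to: "Me" },
  { from: "Me", to: "Alex" },
  { from: "Alex", to: "Sam" },
];

// Builds nodes and links in the shape passed to <d3-network> in Nodes.vue.
function toNetwork(records) {
  // Collect every distinct person, in order of first appearance.
  const names = [];
  for (const { from, to } of records) {
    if (!names.includes(from)) names.push(from);
    if (!names.includes(to)) names.push(to);
  }
  const idByName = new Map(names.map((name, id) => [name, id]));

  // Size each node like getCount() in Nodes.vue: messages received + 20.
  const nodes = names.map((name, id) => ({
    id,
    name,
    _size: records.filter((r) => r.to === name).length + 20,
  }));
  const links = records.map((r) => ({ sid: idByName.get(r.from), tid: idByName.get(r.to) }));
  return { nodes, links };
}

const { nodes, links } = toNetwork(interactions);
console.log(nodes.length, links.length); // 3 4
```

From there you could load your own JSON file in place of `letters.json`. You could also color nodes by group using the `_color` field the project already sets.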
## Rubric

Exemplary | Adequate | Needs Improvement
--- | --- | ---
A GitHub repo is presented with code that runs properly (try deploying it as a static web app) and has an annotated README explaining the project | The repo does not run properly or is not documented well | The repo does not run properly and is not documented well
@ -0,0 +1,23 @@
.DS_Store
node_modules
/dist


# local env files
.env.local
.env.*.local

# Log files
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*

# Editor directories and files
.idea
.vscode
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?
@ -0,0 +1,26 @@
# Dangerous Liaisons data visualization project

To get started, make sure Node.js and npm are installed on your machine. Install the dependencies (`npm install`) and then run the project locally (`npm run serve`):

## Project setup
```
npm install
```

### Compiles and hot-reloads for development
```
npm run serve
```

### Compiles and minifies for production
```
npm run build
```

### Lints and fixes files
```
npm run lint
```

### Customize configuration
See [Configuration Reference](https://cli.vuejs.org/config/).
@ -0,0 +1,5 @@
module.exports = {
  presets: [
    '@vue/cli-plugin-babel/preset'
  ]
}
@ -0,0 +1,43 @@
{
  "name": "liaisons",
  "version": "0.1.0",
  "private": true,
  "scripts": {
    "serve": "vue-cli-service serve",
    "build": "vue-cli-service build",
    "lint": "vue-cli-service lint"
  },
  "dependencies": {
    "core-js": "^3.6.5",
    "vue": "^2.6.11",
    "vue-d3-network": "0.1.28"
  },
  "devDependencies": {
    "@vue/cli-plugin-babel": "~4.5.0",
    "@vue/cli-plugin-eslint": "~4.5.0",
    "@vue/cli-service": "~4.5.0",
    "babel-eslint": "^10.1.0",
    "eslint": "^6.7.2",
    "eslint-plugin-vue": "^6.2.2",
    "vue-template-compiler": "^2.6.11"
  },
  "eslintConfig": {
    "root": true,
    "env": {
      "node": true
    },
    "extends": [
      "plugin:vue/essential",
      "eslint:recommended"
    ],
    "parserOptions": {
      "parser": "babel-eslint"
    },
    "rules": {}
  },
  "browserslist": [
    "> 1%",
    "last 2 versions",
    "not dead"
  ]
}
@ -0,0 +1,20 @@
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <meta name="viewport" content="width=device-width,initial-scale=1.0" />
    <link rel="icon" href="<%= BASE_URL %>favicon.ico" />
    <title>Les Liaisons Dangereuses: Visualization</title>
  </head>
  <body>
    <noscript>
      <strong
        >We're sorry but this site doesn't work properly without JavaScript
        enabled. Please enable it to continue.</strong
      >
    </noscript>
    <div id="app"></div>
    <!-- built files will be auto injected -->
  </body>
</html>
@ -0,0 +1,17 @@
<template>
  <div id="app">
    <Nodes />
  </div>
</template>

<script>
import Nodes from "./components/Nodes";

export default {
  name: "App",
  components: {
    Nodes,
  },
};
</script>

@ -0,0 +1,193 @@
<template>
  <div class="hello">
    <d3-network :net-nodes="nodes" :net-links="links" :options="options" />
  </div>
</template>

<script>
import D3Network from "vue-d3-network";
import letters from "../assets/letters.json";

export default {
  name: "Nodes",
  components: {
    D3Network,
  },
  data() {
    return {
      nodes: [],
      links: [
        /*{ sid: 1, tid: 2 },
        */
      ],
      nodeSize: 10,
      canvas: false,
    };
  },
  methods: {
    createLinks(characters) {
      // loop through the letters
      for (let i = 0; i < letters.length; i++) {
        // reset the sender (f) and recipient (t) indexes for every letter
        let f = -1;
        let t = -1;

        // find the sender and the recipient in the characters array
        for (let j = 0; j < characters.length; j++) {
          if (characters[j] === letters[i].from) {
            f = j;
          }
          if (characters[j] === letters[i].to) {
            t = j;
          }
        }

        // only add a link when both characters were found
        if (f !== -1 && t !== -1) {
          this.links.push({ sid: f, tid: t });
        }
      }
    },
    getCount(name) {
      var count = 0;
      for (var i = 0; i < letters.length; i++) {
        if (letters[i].to === name) {
          count++;
        }
      }
      return count;
    },
  },
  computed: {
    options() {
      return {
        force: 3000,
        size: { w: 600, h: 600 },
        offset: {
          x: 0,
          y: 0,
        },
        nodeLabels: true,
        linkLabels: true,
        canvas: this.canvas,
      };
    },
  },

  created() {
    let characters = [
      "Chevalier Danceny",
      "Marquise de Merteuil",
      "Cécile Volanges",
      "Présidente de Tourvel",
      "Azolan, chasseur",
      "Madame de Rosemonde",
      "Madame de Volanges",
      "Vicomte de Valmont",
      "Père Anselme",
      "...",
      "M. Bertrand",
      "Anonyme",
      "Sophie Carnay",
      "Maréchale de ***",
      "Le Comte de Gercourt",
    ];

    this.createLinks(characters);

    for (var j = 0; j < characters.length; j++) {
      this.nodes.push({
        id: j,
        name: characters[j],
        // node size grows with the number of letters the character received
        _size: this.getCount(characters[j]) + 20,
        // random hex color, zero-padded to six digits so it is always a valid CSS color
        _color: "#" + Math.floor(Math.random() * 16777215).toString(16).padStart(6, "0"),
      });
    }
  },
};
</script>

<style>
@import url("https://fonts.googleapis.com/css?family=PT+Sans");
canvas {
  left: 0;
  position: absolute;
  top: 0;
}
.net {
  height: 100%;
  margin: 0;
}
.node {
  -webkit-transition: fill 0.5s ease;
  fill: #dcfaf3;
  transition: fill 0.5s ease;
}
.node.selected {
  stroke: #caa455;
}
.node.pinned {
  stroke: rgba(106, 37, 185, 0.6);
}
.link {
  stroke: rgba(18, 120, 98, 0.3);
}
.link,
.node {
  stroke-linecap: round;
}
.link:hover,
.node:hover {
  stroke: rgba(250, 197, 65, 0.6);
  stroke-width: 5px;
}
.link.selected {
  stroke: rgba(34, 30, 20, 0.6);
}
.curve {
  fill: none;
}
.link-label,
.node-label {
  fill: black;
}
.link-label {
  -webkit-transform: translateY(-0.5em);
  text-anchor: middle;
  transform: translateY(-0.5em);
}

body {
  overflow-x: hidden;
}

body,
html {
  margin: 0;
  padding: 0;
}
body {
  background-color: #fff;
  font-family: "PT Sans";
}

#app {
  bottom: 0;
  left: 0;
  max-width: 100%;
  position: absolute;
  top: 0;
  width: 100%;
}

.links {
  list-style: none;
  margin: 1em 5em 0 0;
  position: absolute;
  right: 0;
  top: 0;
}

#app {
  -moz-user-select: none;
  -ms-user-select: none;
  -webkit-user-select: none;
  text-align: center;
  user-select: none;
}
</style>
@ -0,0 +1,8 @@
import Vue from "vue";
import App from "./App.vue";

Vue.config.productionTip = false;

new Vue({
  render: h => h(App)
}).$mount("#app");
@ -0,0 +1,23 @@
.DS_Store
node_modules
/dist


# local env files
.env.local
.env.*.local

# Log files
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*

# Editor directories and files
.idea
.vscode
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?
@ -0,0 +1,26 @@
# Dangerous Liaisons data visualization project

To get started, make sure Node.js and npm are installed on your machine. Install the dependencies (`npm install`) and then run the project locally (`npm run serve`):

## Project setup
```
npm install
```

### Compiles and hot-reloads for development
```
npm run serve
```

### Compiles and minifies for production
```
npm run build
```

### Lints and fixes files
```
npm run lint
```

### Customize configuration
See [Configuration Reference](https://cli.vuejs.org/config/).
@ -0,0 +1,5 @@
module.exports = {
  presets: [
    '@vue/cli-plugin-babel/preset'
  ]
}
@ -0,0 +1,43 @@
{
  "name": "liaisons",
  "version": "0.1.0",
  "private": true,
  "scripts": {
    "serve": "vue-cli-service serve",
    "build": "vue-cli-service build",
    "lint": "vue-cli-service lint"
  },
  "dependencies": {
    "core-js": "^3.6.5",
    "vue": "^2.6.11",
    "vue-d3-network": "0.1.28"
  },
  "devDependencies": {
    "@vue/cli-plugin-babel": "~4.5.0",
    "@vue/cli-plugin-eslint": "~4.5.0",
    "@vue/cli-service": "~4.5.0",
    "babel-eslint": "^10.1.0",
    "eslint": "^6.7.2",
    "eslint-plugin-vue": "^6.2.2",
    "vue-template-compiler": "^2.6.11"
  },
  "eslintConfig": {
    "root": true,
    "env": {
      "node": true
    },
    "extends": [
      "plugin:vue/essential",
      "eslint:recommended"
    ],
    "parserOptions": {
      "parser": "babel-eslint"
    },
    "rules": {}
  },
  "browserslist": [
    "> 1%",
    "last 2 versions",
    "not dead"
  ]
}
@ -0,0 +1,20 @@
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <meta name="viewport" content="width=device-width,initial-scale=1.0" />
    <link rel="icon" href="<%= BASE_URL %>favicon.ico" />
    <title>Les Liaisons Dangereuses: Visualization</title>
  </head>
  <body>
    <noscript>
      <strong
        >We're sorry but this site doesn't work properly without JavaScript
        enabled. Please enable it to continue.</strong
      >
    </noscript>
    <div id="app"></div>
    <!-- built files will be auto injected -->
  </body>
</html>
@ -0,0 +1,17 @@
<template>
  <div id="app">
    <Nodes />
  </div>
</template>

<script>
import Nodes from "./components/Nodes";

export default {
  name: "App",
  components: {
    Nodes,
  },
};
</script>

@ -0,0 +1,178 @@
<template>
  <div class="hello">
    <d3-network :net-nodes="nodes" :net-links="links" :options="options" />
  </div>
</template>

<script>
import D3Network from "vue-d3-network";
import letters from "../assets/letters.json";

export default {
  name: "Nodes",
  components: {
    D3Network,
  },
  data() {
    return {
      nodes: [],
      links: [
        /*{ sid: 1, tid: 2 },
        */
      ],
      nodeSize: 10,
      canvas: false,
    };
  },
  methods: {
    createLinks(characters) {
      //complete this code
    },
    getCount(name) {
      var count = 0;
      for (var i = 0; i < letters.length; i++) {
        if (letters[i].to === name) {
          count++;
        }
      }
      return count;
    },
  },
  computed: {
    options() {
      return {
        force: 3000,
        size: { w: 600, h: 600 },
        offset: {
          x: 0,
          y: 0,
        },
        nodeLabels: true,
        linkLabels: true,
        canvas: this.canvas,
      };
    },
  },

  created() {
    let characters = [
      "Chevalier Danceny",
      "Marquise de Merteuil",
      "Cécile Volanges",
      "Présidente de Tourvel",
      "Azolan, chasseur",
      "Madame de Rosemonde",
      "Madame de Volanges",
      "Vicomte de Valmont",
      "Père Anselme",
      "...",
      "M. Bertrand",
      "Anonyme",
      "Sophie Carnay",
      "Maréchale de ***",
      "Le Comte de Gercourt",
    ];

    this.createLinks(characters);

    for (var j = 0; j < characters.length; j++) {
      this.nodes.push({
        id: j,
        name: characters[j],
        // node size grows with the number of letters the character received
        _size: this.getCount(characters[j]) + 20,
        // random hex color, zero-padded to six digits so it is always a valid CSS color
        _color: "#" + Math.floor(Math.random() * 16777215).toString(16).padStart(6, "0"),
      });
    }
  },
};
</script>

<style>
@import url("https://fonts.googleapis.com/css?family=PT+Sans");
canvas {
  left: 0;
  position: absolute;
  top: 0;
}
.net {
  height: 100%;
  margin: 0;
}
.node {
  -webkit-transition: fill 0.5s ease;
  fill: #dcfaf3;
  transition: fill 0.5s ease;
}
.node.selected {
  stroke: #caa455;
}
.node.pinned {
  stroke: rgba(106, 37, 185, 0.6);
}
.link {
  stroke: rgba(18, 120, 98, 0.3);
}
.link,
.node {
  stroke-linecap: round;
}
.link:hover,
.node:hover {
  stroke: rgba(250, 197, 65, 0.6);
  stroke-width: 5px;
}
.link.selected {
  stroke: rgba(34, 30, 20, 0.6);
}
.curve {
  fill: none;
}
.link-label,
.node-label {
  fill: black;
}
.link-label {
  -webkit-transform: translateY(-0.5em);
  text-anchor: middle;
  transform: translateY(-0.5em);
}

body {
  overflow-x: hidden;
}

body,
html {
  margin: 0;
  padding: 0;
}
body {
  background-color: #fff;
  font-family: "PT Sans";
}

#app {
  bottom: 0;
  left: 0;
  max-width: 100%;
  position: absolute;
  top: 0;
  width: 100%;
}

.links {
  list-style: none;
  margin: 1em 5em 0 0;
  position: absolute;
  right: 0;
  top: 0;
}

#app {
  -moz-user-select: none;
  -ms-user-select: none;
  -webkit-user-select: none;
  text-align: center;
  user-select: none;
}
</style>
@ -0,0 +1,8 @@
import Vue from "vue";
import App from "./App.vue";

Vue.config.productionTip = false;

new Vue({
  render: h => h(App)
}).$mount("#app");