11 KiB
Visualizing Proportions
![]() |
---|
Visualizing Proportions - Sketchnote by @nitya |
In this lesson, you'll work with a nature-focused dataset to visualize proportions, such as the distribution of different types of fungi in a dataset about mushrooms. We'll dive into these fascinating fungi using a dataset from Audubon that provides details about 23 species of gilled mushrooms in the Agaricus and Lepiota families. You'll experiment with fun visualizations like:
- Pie charts 🥧
- Donut charts 🍩
- Waffle charts 🧇
💡 Microsoft Research has an interesting project called Charticulator, which offers a free drag-and-drop interface for creating data visualizations. One of their tutorials uses this mushroom dataset! You can explore the data and learn the library simultaneously: Charticulator tutorial.
Pre-lecture quiz
Get to know your mushrooms 🍄
Mushrooms are fascinating organisms. Let's import a dataset to study them:
import pandas as pd
import matplotlib.pyplot as plt
mushrooms = pd.read_csv('../../data/mushrooms.csv')
mushrooms.head()
A table is displayed with some great data for analysis:
class | cap-shape | cap-surface | cap-color | bruises | odor | gill-attachment | gill-spacing | gill-size | gill-color | stalk-shape | stalk-root | stalk-surface-above-ring | stalk-surface-below-ring | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Poisonous | Convex | Smooth | Brown | Bruises | Pungent | Free | Close | Narrow | Black | Enlarging | Equal | Smooth | Smooth | White | White | Partial | White | One | Pendant | Black | Scattered | Urban |
Edible | Convex | Smooth | Yellow | Bruises | Almond | Free | Close | Broad | Black | Enlarging | Club | Smooth | Smooth | White | White | Partial | White | One | Pendant | Brown | Numerous | Grasses |
Edible | Bell | Smooth | White | Bruises | Anise | Free | Close | Broad | Brown | Enlarging | Club | Smooth | Smooth | White | White | Partial | White | One | Pendant | Brown | Numerous | Meadows |
Poisonous | Convex | Scaly | White | Bruises | Pungent | Free | Close | Narrow | Brown | Enlarging | Equal | Smooth | Smooth | White | White | Partial | White | One | Pendant | Black | Scattered | Urban |
You'll notice that all the data is textual. To use it in a chart, you'll need to convert it. Most of the data is represented as an object:
print(mushrooms.select_dtypes(["object"]).columns)
The output is:
Index(['class', 'cap-shape', 'cap-surface', 'cap-color', 'bruises', 'odor',
'gill-attachment', 'gill-spacing', 'gill-size', 'gill-color',
'stalk-shape', 'stalk-root', 'stalk-surface-above-ring',
'stalk-surface-below-ring', 'stalk-color-above-ring',
'stalk-color-below-ring', 'veil-type', 'veil-color', 'ring-number',
'ring-type', 'spore-print-color', 'population', 'habitat'],
dtype='object')
Convert the 'class' column into a category:
cols = mushrooms.select_dtypes(["object"]).columns
mushrooms[cols] = mushrooms[cols].astype('category')
edibleclass=mushrooms.groupby(['class']).count()
edibleclass
Now, if you print the mushrooms data, you'll see it grouped into categories based on the poisonous/edible class:
cap-shape | cap-surface | cap-color | bruises | odor | gill-attachment | gill-spacing | gill-size | gill-color | stalk-shape | ... | stalk-surface-below-ring | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
class | |||||||||||||||||||||
Edible | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | ... | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 |
Poisonous | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | ... | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 |
Using the order in this table to create your class category labels, you can build a pie chart:
Pie!
labels=['Edible','Poisonous']
plt.pie(edibleclass['population'],labels=labels,autopct='%.1f %%')
plt.title('Edible?')
plt.show()
And voilà, a pie chart showing the proportions of the two mushroom classes. It's crucial to get the label order correct, so double-check the array when building the labels!
Donuts!
A donut chart is a visually appealing variation of a pie chart, with a hole in the center. Let's use this method to explore the habitats where mushrooms grow:
habitat=mushrooms.groupby(['habitat']).count()
habitat
Group the data by habitat. There are seven listed habitats, so use them as labels for your donut chart:
labels=['Grasses','Leaves','Meadows','Paths','Urban','Waste','Wood']
plt.pie(habitat['class'], labels=labels,
autopct='%1.1f%%', pctdistance=0.85)
center_circle = plt.Circle((0, 0), 0.40, fc='white')
fig = plt.gcf()
fig.gca().add_artist(center_circle)
plt.title('Mushroom Habitats')
plt.show()
This code draws the chart and a center circle, then adds the circle to the chart. You can adjust the width of the center circle by changing 0.40
to another value.
Donut charts can be customized in various ways, especially the labels for better readability. Learn more in the docs.
Now that you know how to group data and display it as a pie or donut chart, let's explore another type of chart: the waffle chart.
Waffles!
A waffle chart visualizes quantities as a 2D array of squares. Let's use it to examine the proportions of mushroom cap colors in the dataset. First, install the helper library PyWaffle and use Matplotlib:
pip install pywaffle
Select a segment of your data to group:
capcolor=mushrooms.groupby(['cap-color']).count()
capcolor
Create a waffle chart by defining labels and grouping your data:
import pandas as pd
import matplotlib.pyplot as plt
from pywaffle import Waffle
data ={'color': ['brown', 'buff', 'cinnamon', 'green', 'pink', 'purple', 'red', 'white', 'yellow'],
'amount': capcolor['class']
}
df = pd.DataFrame(data)
fig = plt.figure(
FigureClass = Waffle,
rows = 100,
values = df.amount,
labels = list(df.color),
figsize = (30,30),
colors=["brown", "tan", "maroon", "green", "pink", "purple", "red", "whitesmoke", "yellow"],
)
The waffle chart clearly shows the proportions of cap colors in the mushroom dataset. Interestingly, there are many green-capped mushrooms!
✅ PyWaffle supports icons within the charts, using any icon available in Font Awesome. Experiment with icons to create even more engaging waffle charts.
In this lesson, you learned three ways to visualize proportions. First, group your data into categories, then choose the best visualization method—pie, donut, or waffle. Each offers a quick and intuitive snapshot of the dataset.
🚀 Challenge
Try recreating these charts in Charticulator.
Post-lecture quiz
Review & Self Study
Choosing between pie, donut, or waffle charts isn't always straightforward. Here are some articles to help you decide:
https://www.beautiful.ai/blog/battle-of-the-charts-pie-chart-vs-donut-chart
https://medium.com/@hypsypops/pie-chart-vs-donut-chart-showdown-in-the-ring-5d24fd86a9ce
https://www.mit.edu/~mbarker/formula1/f1help/11-ch-c6.htm
https://medium.datadriveninvestor.com/data-visualization-done-the-right-way-with-tableau-waffle-chart-fdf2a19be402
Do some research to learn more about this decision-making process.
Assignment
Disclaimer:
This document has been translated using the AI translation service Co-op Translator. While we aim for accuracy, please note that automated translations may include errors or inaccuracies. The original document in its native language should be regarded as the authoritative source. For critical information, professional human translation is advised. We are not responsible for any misunderstandings or misinterpretations resulting from the use of this translation.