You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

200 lines
11 KiB

<!--
CO_OP_TRANSLATOR_METADATA:
{
"original_hash": "af6a12015c6e250e500b570a9fa42593",
"translation_date": "2025-08-31T11:05:02+00:00",
"source_file": "3-Data-Visualization/11-visualization-proportions/README.md",
"language_code": "en"
}
-->
# Visualizing Proportions
|![ Sketchnote by [(@sketchthedocs)](https://sketchthedocs.dev) ](../../sketchnotes/11-Visualizing-Proportions.png)|
|:---:|
|Visualizing Proportions - _Sketchnote by [@nitya](https://twitter.com/nitya)_ |
In this lesson, you'll work with a nature-focused dataset to visualize proportions, such as the distribution of different types of fungi in a dataset about mushrooms. We'll dive into these fascinating fungi using a dataset from Audubon that provides details about 23 species of gilled mushrooms in the Agaricus and Lepiota families. You'll experiment with fun visualizations like:
- Pie charts 🥧
- Donut charts 🍩
- Waffle charts 🧇
> 💡 Microsoft Research has an interesting project called [Charticulator](https://charticulator.com), which offers a free drag-and-drop interface for creating data visualizations. One of their tutorials uses this mushroom dataset! You can explore the data and learn the library simultaneously: [Charticulator tutorial](https://charticulator.com/tutorials/tutorial4.html).
## [Pre-lecture quiz](https://purple-hill-04aebfb03.1.azurestaticapps.net/quiz/20)
## Get to know your mushrooms 🍄
Mushrooms are fascinating organisms. Let's import a dataset to study them:
```python
import pandas as pd
import matplotlib.pyplot as plt
mushrooms = pd.read_csv('../../data/mushrooms.csv')
mushrooms.head()
```
A table is displayed with some great data for analysis:
| class | cap-shape | cap-surface | cap-color | bruises | odor | gill-attachment | gill-spacing | gill-size | gill-color | stalk-shape | stalk-root | stalk-surface-above-ring | stalk-surface-below-ring | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat |
| --------- | --------- | ----------- | --------- | ------- | ------- | --------------- | ------------ | --------- | ---------- | ----------- | ---------- | ------------------------ | ------------------------ | ---------------------- | ---------------------- | --------- | ---------- | ----------- | --------- | ----------------- | ---------- | ------- |
| Poisonous | Convex | Smooth | Brown | Bruises | Pungent | Free | Close | Narrow | Black | Enlarging | Equal | Smooth | Smooth | White | White | Partial | White | One | Pendant | Black | Scattered | Urban |
| Edible | Convex | Smooth | Yellow | Bruises | Almond | Free | Close | Broad | Black | Enlarging | Club | Smooth | Smooth | White | White | Partial | White | One | Pendant | Brown | Numerous | Grasses |
| Edible | Bell | Smooth | White | Bruises | Anise | Free | Close | Broad | Brown | Enlarging | Club | Smooth | Smooth | White | White | Partial | White | One | Pendant | Brown | Numerous | Meadows |
| Poisonous | Convex | Scaly | White | Bruises | Pungent | Free | Close | Narrow | Brown | Enlarging | Equal | Smooth | Smooth | White | White | Partial | White | One | Pendant | Black | Scattered | Urban |
You'll notice that all the data is textual. To use it in a chart, you'll need to convert it. Most of the data is represented as an object:
```python
print(mushrooms.select_dtypes(["object"]).columns)
```
The output is:
```output
Index(['class', 'cap-shape', 'cap-surface', 'cap-color', 'bruises', 'odor',
'gill-attachment', 'gill-spacing', 'gill-size', 'gill-color',
'stalk-shape', 'stalk-root', 'stalk-surface-above-ring',
'stalk-surface-below-ring', 'stalk-color-above-ring',
'stalk-color-below-ring', 'veil-type', 'veil-color', 'ring-number',
'ring-type', 'spore-print-color', 'population', 'habitat'],
dtype='object')
```
Convert the 'class' column into a category:
```python
cols = mushrooms.select_dtypes(["object"]).columns
mushrooms[cols] = mushrooms[cols].astype('category')
```
```python
edibleclass=mushrooms.groupby(['class']).count()
edibleclass
```
Now, if you print the mushrooms data, you'll see it grouped into categories based on the poisonous/edible class:
| | cap-shape | cap-surface | cap-color | bruises | odor | gill-attachment | gill-spacing | gill-size | gill-color | stalk-shape | ... | stalk-surface-below-ring | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat |
| --------- | --------- | ----------- | --------- | ------- | ---- | --------------- | ------------ | --------- | ---------- | ----------- | --- | ------------------------ | ---------------------- | ---------------------- | --------- | ---------- | ----------- | --------- | ----------------- | ---------- | ------- |
| class | | | | | | | | | | | | | | | | | | | | | |
| Edible | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | ... | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 |
| Poisonous | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | ... | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 |
Using the order in this table to create your class category labels, you can build a pie chart:
## Pie!
```python
labels=['Edible','Poisonous']
plt.pie(edibleclass['population'],labels=labels,autopct='%.1f %%')
plt.title('Edible?')
plt.show()
```
And voilà, a pie chart showing the proportions of the two mushroom classes. It's crucial to get the label order correct, so double-check the array when building the labels!
![pie chart](../../../../3-Data-Visualization/11-visualization-proportions/images/pie1-wb.png)
## Donuts!
A donut chart is a visually appealing variation of a pie chart, with a hole in the center. Let's use this method to explore the habitats where mushrooms grow:
```python
habitat=mushrooms.groupby(['habitat']).count()
habitat
```
Group the data by habitat. There are seven listed habitats, so use them as labels for your donut chart:
```python
labels=['Grasses','Leaves','Meadows','Paths','Urban','Waste','Wood']
plt.pie(habitat['class'], labels=labels,
autopct='%1.1f%%', pctdistance=0.85)
center_circle = plt.Circle((0, 0), 0.40, fc='white')
fig = plt.gcf()
fig.gca().add_artist(center_circle)
plt.title('Mushroom Habitats')
plt.show()
```
![donut chart](../../../../3-Data-Visualization/11-visualization-proportions/images/donut-wb.png)
This code draws the chart and a center circle, then adds the circle to the chart. You can adjust the width of the center circle by changing `0.40` to another value.
Donut charts can be customized in various ways, especially the labels for better readability. Learn more in the [docs](https://matplotlib.org/stable/gallery/pie_and_polar_charts/pie_and_donut_labels.html?highlight=donut).
Now that you know how to group data and display it as a pie or donut chart, let's explore another type of chart: the waffle chart.
## Waffles!
A waffle chart visualizes quantities as a 2D array of squares. Let's use it to examine the proportions of mushroom cap colors in the dataset. First, install the helper library [PyWaffle](https://pypi.org/project/pywaffle/) and use Matplotlib:
```python
pip install pywaffle
```
Select a segment of your data to group:
```python
capcolor=mushrooms.groupby(['cap-color']).count()
capcolor
```
Create a waffle chart by defining labels and grouping your data:
```python
import pandas as pd
import matplotlib.pyplot as plt
from pywaffle import Waffle
data ={'color': ['brown', 'buff', 'cinnamon', 'green', 'pink', 'purple', 'red', 'white', 'yellow'],
'amount': capcolor['class']
}
df = pd.DataFrame(data)
fig = plt.figure(
FigureClass = Waffle,
rows = 100,
values = df.amount,
labels = list(df.color),
figsize = (30,30),
colors=["brown", "tan", "maroon", "green", "pink", "purple", "red", "whitesmoke", "yellow"],
)
```
The waffle chart clearly shows the proportions of cap colors in the mushroom dataset. Interestingly, there are many green-capped mushrooms!
![waffle chart](../../../../3-Data-Visualization/11-visualization-proportions/images/waffle.png)
✅ PyWaffle supports icons within the charts, using any icon available in [Font Awesome](https://fontawesome.com/). Experiment with icons to create even more engaging waffle charts.
In this lesson, you learned three ways to visualize proportions. First, group your data into categories, then choose the best visualization method—pie, donut, or waffle. Each offers a quick and intuitive snapshot of the dataset.
## 🚀 Challenge
Try recreating these charts in [Charticulator](https://charticulator.com).
## [Post-lecture quiz](https://purple-hill-04aebfb03.1.azurestaticapps.net/quiz/21)
## Review & Self Study
Choosing between pie, donut, or waffle charts isn't always straightforward. Here are some articles to help you decide:
https://www.beautiful.ai/blog/battle-of-the-charts-pie-chart-vs-donut-chart
https://medium.com/@hypsypops/pie-chart-vs-donut-chart-showdown-in-the-ring-5d24fd86a9ce
https://www.mit.edu/~mbarker/formula1/f1help/11-ch-c6.htm
https://medium.datadriveninvestor.com/data-visualization-done-the-right-way-with-tableau-waffle-chart-fdf2a19be402
Do some research to learn more about this decision-making process.
## Assignment
[Try it in Excel](assignment.md)
---
**Disclaimer**:
This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we aim for accuracy, please note that automated translations may include errors or inaccuracies. The original document in its native language should be regarded as the authoritative source. For critical information, professional human translation is advised. We are not responsible for any misunderstandings or misinterpretations resulting from the use of this translation.