You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

11 KiB

Visualizing Proportions

 Sketchnote by (@sketchthedocs)
Visualizing Proportions - Sketchnote by @nitya

In this lesson, you'll work with a nature-focused dataset to visualize proportions, such as the distribution of different types of fungi in a dataset about mushrooms. We'll dive into these fascinating fungi using a dataset from Audubon that provides details about 23 species of gilled mushrooms in the Agaricus and Lepiota families. You'll experiment with fun visualizations like:

  • Pie charts 🥧
  • Donut charts 🍩
  • Waffle charts 🧇

💡 Microsoft Research has an interesting project called Charticulator, which offers a free drag-and-drop interface for creating data visualizations. One of their tutorials uses this mushroom dataset! You can explore the data and learn the library simultaneously: Charticulator tutorial.

Pre-lecture quiz

Get to know your mushrooms 🍄

Mushrooms are fascinating organisms. Let's import a dataset to study them:

import pandas as pd
import matplotlib.pyplot as plt
mushrooms = pd.read_csv('../../data/mushrooms.csv')
mushrooms.head()

A table is displayed with some great data for analysis:

class cap-shape cap-surface cap-color bruises odor gill-attachment gill-spacing gill-size gill-color stalk-shape stalk-root stalk-surface-above-ring stalk-surface-below-ring stalk-color-above-ring stalk-color-below-ring veil-type veil-color ring-number ring-type spore-print-color population habitat
Poisonous Convex Smooth Brown Bruises Pungent Free Close Narrow Black Enlarging Equal Smooth Smooth White White Partial White One Pendant Black Scattered Urban
Edible Convex Smooth Yellow Bruises Almond Free Close Broad Black Enlarging Club Smooth Smooth White White Partial White One Pendant Brown Numerous Grasses
Edible Bell Smooth White Bruises Anise Free Close Broad Brown Enlarging Club Smooth Smooth White White Partial White One Pendant Brown Numerous Meadows
Poisonous Convex Scaly White Bruises Pungent Free Close Narrow Brown Enlarging Equal Smooth Smooth White White Partial White One Pendant Black Scattered Urban

You'll notice that all the data is textual. To use it in a chart, you'll need to convert it. Most of the data is represented as an object:

print(mushrooms.select_dtypes(["object"]).columns)

The output is:

Index(['class', 'cap-shape', 'cap-surface', 'cap-color', 'bruises', 'odor',
       'gill-attachment', 'gill-spacing', 'gill-size', 'gill-color',
       'stalk-shape', 'stalk-root', 'stalk-surface-above-ring',
       'stalk-surface-below-ring', 'stalk-color-above-ring',
       'stalk-color-below-ring', 'veil-type', 'veil-color', 'ring-number',
       'ring-type', 'spore-print-color', 'population', 'habitat'],
      dtype='object')

Convert the 'class' column into a category:

cols = mushrooms.select_dtypes(["object"]).columns
mushrooms[cols] = mushrooms[cols].astype('category')
edibleclass=mushrooms.groupby(['class']).count()
edibleclass

Now, if you print the mushrooms data, you'll see it grouped into categories based on the poisonous/edible class:

cap-shape cap-surface cap-color bruises odor gill-attachment gill-spacing gill-size gill-color stalk-shape ... stalk-surface-below-ring stalk-color-above-ring stalk-color-below-ring veil-type veil-color ring-number ring-type spore-print-color population habitat
class
Edible 4208 4208 4208 4208 4208 4208 4208 4208 4208 4208 ... 4208 4208 4208 4208 4208 4208 4208 4208 4208 4208
Poisonous 3916 3916 3916 3916 3916 3916 3916 3916 3916 3916 ... 3916 3916 3916 3916 3916 3916 3916 3916 3916 3916

Using the order in this table to create your class category labels, you can build a pie chart:

Pie!

labels=['Edible','Poisonous']
plt.pie(edibleclass['population'],labels=labels,autopct='%.1f %%')
plt.title('Edible?')
plt.show()

And voilà, a pie chart showing the proportions of the two mushroom classes. It's crucial to get the label order correct, so double-check the array when building the labels!

pie chart

Donuts!

A donut chart is a visually appealing variation of a pie chart, with a hole in the center. Let's use this method to explore the habitats where mushrooms grow:

habitat=mushrooms.groupby(['habitat']).count()
habitat

Group the data by habitat. There are seven listed habitats, so use them as labels for your donut chart:

labels=['Grasses','Leaves','Meadows','Paths','Urban','Waste','Wood']

plt.pie(habitat['class'], labels=labels,
        autopct='%1.1f%%', pctdistance=0.85)
  
center_circle = plt.Circle((0, 0), 0.40, fc='white')
fig = plt.gcf()

fig.gca().add_artist(center_circle)
  
plt.title('Mushroom Habitats')
  
plt.show()

donut chart

This code draws the chart and a center circle, then adds the circle to the chart. You can adjust the width of the center circle by changing 0.40 to another value.

Donut charts can be customized in various ways, especially the labels for better readability. Learn more in the docs.

Now that you know how to group data and display it as a pie or donut chart, let's explore another type of chart: the waffle chart.

Waffles!

A waffle chart visualizes quantities as a 2D array of squares. Let's use it to examine the proportions of mushroom cap colors in the dataset. First, install the helper library PyWaffle and use Matplotlib:

pip install pywaffle

Select a segment of your data to group:

capcolor=mushrooms.groupby(['cap-color']).count()
capcolor

Create a waffle chart by defining labels and grouping your data:

import pandas as pd
import matplotlib.pyplot as plt
from pywaffle import Waffle
  
data ={'color': ['brown', 'buff', 'cinnamon', 'green', 'pink', 'purple', 'red', 'white', 'yellow'],
    'amount': capcolor['class']
     }
  
df = pd.DataFrame(data)
  
fig = plt.figure(
    FigureClass = Waffle,
    rows = 100,
    values = df.amount,
    labels = list(df.color),
    figsize = (30,30),
    colors=["brown", "tan", "maroon", "green", "pink", "purple", "red", "whitesmoke", "yellow"],
)

The waffle chart clearly shows the proportions of cap colors in the mushroom dataset. Interestingly, there are many green-capped mushrooms!

waffle chart

PyWaffle supports icons within the charts, using any icon available in Font Awesome. Experiment with icons to create even more engaging waffle charts.

In this lesson, you learned three ways to visualize proportions. First, group your data into categories, then choose the best visualization method—pie, donut, or waffle. Each offers a quick and intuitive snapshot of the dataset.

🚀 Challenge

Try recreating these charts in Charticulator.

Post-lecture quiz

Review & Self Study

Choosing between pie, donut, or waffle charts isn't always straightforward. Here are some articles to help you decide:

https://www.beautiful.ai/blog/battle-of-the-charts-pie-chart-vs-donut-chart
https://medium.com/@hypsypops/pie-chart-vs-donut-chart-showdown-in-the-ring-5d24fd86a9ce
https://www.mit.edu/~mbarker/formula1/f1help/11-ch-c6.htm
https://medium.datadriveninvestor.com/data-visualization-done-the-right-way-with-tableau-waffle-chart-fdf2a19be402

Do some research to learn more about this decision-making process.

Assignment

Try it in Excel


Disclaimer:
This document has been translated using the AI translation service Co-op Translator. While we aim for accuracy, please note that automated translations may include errors or inaccuracies. The original document in its native language should be regarded as the authoritative source. For critical information, professional human translation is advised. We are not responsible for any misunderstandings or misinterpretations resulting from the use of this translation.