# Visualizing Proportions |![ Sketchnote by [(@sketchthedocs)](https://sketchthedocs.dev) ](../../sketchnotes/11-Visualizing-Proportions.png)| |:---:| |Visualizing Proportions - _Sketchnote by [@nitya](https://twitter.com/nitya)_ | In this lesson, you'll work with a nature-focused dataset to visualize proportions, such as the distribution of different types of fungi in a dataset about mushrooms. We'll dive into these fascinating fungi using a dataset from Audubon that provides details about 23 species of gilled mushrooms in the Agaricus and Lepiota families. You'll experiment with fun visualizations like: - Pie charts 🥧 - Donut charts 🍩 - Waffle charts 🧇 > 💡 Microsoft Research has an interesting project called [Charticulator](https://charticulator.com), which offers a free drag-and-drop interface for creating data visualizations. One of their tutorials uses this mushroom dataset! You can explore the data and learn the library simultaneously: [Charticulator tutorial](https://charticulator.com/tutorials/tutorial4.html). ## [Pre-lecture quiz](https://purple-hill-04aebfb03.1.azurestaticapps.net/quiz/20) ## Get to know your mushrooms 🍄 Mushrooms are fascinating organisms. Let's import a dataset to study them: ```python import pandas as pd import matplotlib.pyplot as plt mushrooms = pd.read_csv('../../data/mushrooms.csv') mushrooms.head() ``` A table is displayed with some great data for analysis: | class | cap-shape | cap-surface | cap-color | bruises | odor | gill-attachment | gill-spacing | gill-size | gill-color | stalk-shape | stalk-root | stalk-surface-above-ring | stalk-surface-below-ring | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat | | --------- | --------- | ----------- | --------- | ------- | ------- | --------------- | ------------ | --------- | ---------- | ----------- | ---------- | ------------------------ | ------------------------ | ---------------------- | ---------------------- | --------- | ---------- | ----------- | --------- | ----------------- | ---------- | ------- | | Poisonous | Convex | Smooth | Brown | Bruises | Pungent | Free | Close | Narrow | Black | Enlarging | Equal | Smooth | Smooth | White | White | Partial | White | One | Pendant | Black | Scattered | Urban | | Edible | Convex | Smooth | Yellow | Bruises | Almond | Free | Close | Broad | Black | Enlarging | Club | Smooth | Smooth | White | White | Partial | White | One | Pendant | Brown | Numerous | Grasses | | Edible | Bell | Smooth | White | Bruises | Anise | Free | Close | Broad | Brown | Enlarging | Club | Smooth | Smooth | White | White | Partial | White | One | Pendant | Brown | Numerous | Meadows | | Poisonous | Convex | Scaly | White | Bruises | Pungent | Free | Close | Narrow | Brown | Enlarging | Equal | Smooth | Smooth | White | White | Partial | White | One | Pendant | Black | Scattered | Urban | You'll notice that all the data is textual. To use it in a chart, you'll need to convert it. Most of the data is represented as an object: ```python print(mushrooms.select_dtypes(["object"]).columns) ``` The output is: ```output Index(['class', 'cap-shape', 'cap-surface', 'cap-color', 'bruises', 'odor', 'gill-attachment', 'gill-spacing', 'gill-size', 'gill-color', 'stalk-shape', 'stalk-root', 'stalk-surface-above-ring', 'stalk-surface-below-ring', 'stalk-color-above-ring', 'stalk-color-below-ring', 'veil-type', 'veil-color', 'ring-number', 'ring-type', 'spore-print-color', 'population', 'habitat'], dtype='object') ``` Convert the 'class' column into a category: ```python cols = mushrooms.select_dtypes(["object"]).columns mushrooms[cols] = mushrooms[cols].astype('category') ``` ```python edibleclass=mushrooms.groupby(['class']).count() edibleclass ``` Now, if you print the mushrooms data, you'll see it grouped into categories based on the poisonous/edible class: | | cap-shape | cap-surface | cap-color | bruises | odor | gill-attachment | gill-spacing | gill-size | gill-color | stalk-shape | ... | stalk-surface-below-ring | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat | | --------- | --------- | ----------- | --------- | ------- | ---- | --------------- | ------------ | --------- | ---------- | ----------- | --- | ------------------------ | ---------------------- | ---------------------- | --------- | ---------- | ----------- | --------- | ----------------- | ---------- | ------- | | class | | | | | | | | | | | | | | | | | | | | | | | Edible | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | ... | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | 4208 | | Poisonous | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | ... | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | 3916 | Using the order in this table to create your class category labels, you can build a pie chart: ## Pie! ```python labels=['Edible','Poisonous'] plt.pie(edibleclass['population'],labels=labels,autopct='%.1f %%') plt.title('Edible?') plt.show() ``` And voilà, a pie chart showing the proportions of the two mushroom classes. It's crucial to get the label order correct, so double-check the array when building the labels! ![pie chart](../../../../3-Data-Visualization/11-visualization-proportions/images/pie1-wb.png) ## Donuts! A donut chart is a visually appealing variation of a pie chart, with a hole in the center. Let's use this method to explore the habitats where mushrooms grow: ```python habitat=mushrooms.groupby(['habitat']).count() habitat ``` Group the data by habitat. There are seven listed habitats, so use them as labels for your donut chart: ```python labels=['Grasses','Leaves','Meadows','Paths','Urban','Waste','Wood'] plt.pie(habitat['class'], labels=labels, autopct='%1.1f%%', pctdistance=0.85) center_circle = plt.Circle((0, 0), 0.40, fc='white') fig = plt.gcf() fig.gca().add_artist(center_circle) plt.title('Mushroom Habitats') plt.show() ``` ![donut chart](../../../../3-Data-Visualization/11-visualization-proportions/images/donut-wb.png) This code draws the chart and a center circle, then adds the circle to the chart. You can adjust the width of the center circle by changing `0.40` to another value. Donut charts can be customized in various ways, especially the labels for better readability. Learn more in the [docs](https://matplotlib.org/stable/gallery/pie_and_polar_charts/pie_and_donut_labels.html?highlight=donut). Now that you know how to group data and display it as a pie or donut chart, let's explore another type of chart: the waffle chart. ## Waffles! A waffle chart visualizes quantities as a 2D array of squares. Let's use it to examine the proportions of mushroom cap colors in the dataset. First, install the helper library [PyWaffle](https://pypi.org/project/pywaffle/) and use Matplotlib: ```python pip install pywaffle ``` Select a segment of your data to group: ```python capcolor=mushrooms.groupby(['cap-color']).count() capcolor ``` Create a waffle chart by defining labels and grouping your data: ```python import pandas as pd import matplotlib.pyplot as plt from pywaffle import Waffle data ={'color': ['brown', 'buff', 'cinnamon', 'green', 'pink', 'purple', 'red', 'white', 'yellow'], 'amount': capcolor['class'] } df = pd.DataFrame(data) fig = plt.figure( FigureClass = Waffle, rows = 100, values = df.amount, labels = list(df.color), figsize = (30,30), colors=["brown", "tan", "maroon", "green", "pink", "purple", "red", "whitesmoke", "yellow"], ) ``` The waffle chart clearly shows the proportions of cap colors in the mushroom dataset. Interestingly, there are many green-capped mushrooms! ![waffle chart](../../../../3-Data-Visualization/11-visualization-proportions/images/waffle.png) ✅ PyWaffle supports icons within the charts, using any icon available in [Font Awesome](https://fontawesome.com/). Experiment with icons to create even more engaging waffle charts. In this lesson, you learned three ways to visualize proportions. First, group your data into categories, then choose the best visualization method—pie, donut, or waffle. Each offers a quick and intuitive snapshot of the dataset. ## 🚀 Challenge Try recreating these charts in [Charticulator](https://charticulator.com). ## [Post-lecture quiz](https://purple-hill-04aebfb03.1.azurestaticapps.net/quiz/21) ## Review & Self Study Choosing between pie, donut, or waffle charts isn't always straightforward. Here are some articles to help you decide: https://www.beautiful.ai/blog/battle-of-the-charts-pie-chart-vs-donut-chart https://medium.com/@hypsypops/pie-chart-vs-donut-chart-showdown-in-the-ring-5d24fd86a9ce https://www.mit.edu/~mbarker/formula1/f1help/11-ch-c6.htm https://medium.datadriveninvestor.com/data-visualization-done-the-right-way-with-tableau-waffle-chart-fdf2a19be402 Do some research to learn more about this decision-making process. ## Assignment [Try it in Excel](assignment.md) --- **Disclaimer**: This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we aim for accuracy, please note that automated translations may include errors or inaccuracies. The original document in its native language should be regarded as the authoritative source. For critical information, professional human translation is advised. We are not responsible for any misunderstandings or misinterpretations resulting from the use of this translation.