diff --git a/3-Data-Visualization/10-visualization-distributions/README.md b/3-Data-Visualization/10-visualization-distributions/README.md
index 9f1366c7..22d045c8 100644
--- a/3-Data-Visualization/10-visualization-distributions/README.md
+++ b/3-Data-Visualization/10-visualization-distributions/README.md
@@ -20,6 +20,15 @@ birds = pd.read_csv('../../data/birds.csv')
 birds.head()
 ```
+| | Name | ScientificName | Category | Order | Family | Genus | ConservationStatus | MinLength | MaxLength | MinBodyMass | MaxBodyMass | MinWingspan | MaxWingspan |
+| ---: | :--------------------------- | :--------------------- | :-------------------- | :----------- | :------- | :---------- | :----------------- | --------: | --------: | ----------: | ----------: | ----------: | ----------: |
+| 0 | Black-bellied whistling-duck | Dendrocygna autumnalis | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Dendrocygna | LC | 47 | 56 | 652 | 1020 | 76 | 94 |
+| 1 | Fulvous whistling-duck | Dendrocygna bicolor | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Dendrocygna | LC | 45 | 53 | 712 | 1050 | 85 | 93 |
+| 2 | Snow goose | Anser caerulescens | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Anser | LC | 64 | 79 | 2050 | 4050 | 135 | 165 |
+| 3 | Ross's goose | Anser rossii | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Anser | LC | 57.3 | 64 | 1066 | 1567 | 113 | 116 |
+| 4 | Greater white-fronted goose | Anser albifrons | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Anser | LC | 64 | 81 | 1930 | 3310 | 130 | 165 |
+
+
 In general, you can quickly look at the way data is distributed by using a scatter plot as we did in the previous lesson:
 ```python
@@ -31,6 +40,8 @@ plt.xlabel('Max Length')
 plt.show()
 ```
+![max length per order](images/scatter-wb.png)
+
 This gives an overview of the general distribution of body length per bird Order, but it is not the optimal way to display true distributions. That task is usually handled by creating a Histogram.
 ## Working with histograms
@@ -40,7 +51,7 @@ Matplotlib offers very good ways to visualize data distribution using Histograms
 birds['MaxBodyMass'].plot(kind = 'hist', bins = 10, figsize = (12,12))
 plt.show()
 ```
-![distribution over the entire dataset](images/dist1.png)
+![distribution over the entire dataset](images/dist1-wb.png)
 As you can see, most of the 400+ birds in this dataset fall in the range of under 2000 for their Max Body Mass. Gain more insight into the data by changing the `bins` parameter to a higher number, something like 30:
@@ -48,7 +59,7 @@ As you can see, most of the 400+ birds in this dataset fall in the range of unde
 birds['MaxBodyMass'].plot(kind = 'hist', bins = 30, figsize = (12,12))
 plt.show()
 ```
-![distribution over the entire dataset with larger bins param](images/dist2.png)
+![distribution over the entire dataset with larger bins param](images/dist2-wb.png)
 This chart shows the distribution in a bit more granular fashion. A chart less skewed to the left could be created by ensuring that you only select data within a given range:
@@ -59,7 +70,7 @@ filteredBirds = birds[(birds['MaxBodyMass'] > 1) & (birds['MaxBodyMass'] < 60)]
 filteredBirds['MaxBodyMass'].plot(kind = 'hist',bins = 40,figsize = (12,12))
 plt.show()
 ```
-![filtered histogram](images/dist3.png)
+![filtered histogram](images/dist3-wb.png)
 ✅ Try some other filters and data points. To see the full distribution of the data, remove the `['MaxBodyMass']` filter to show labeled distributions.
@@ -76,7 +87,7 @@ hist = ax.hist2d(x, y)
 ```
 There appears to be an expected correlation between these two elements along an expected axis, with one particularly strong point of convergence:
-![2D plot](images/2D.png)
+![2D plot](images/2D-wb.png)
 Histograms work well by default for numeric data. What if you need to see distributions according to text data?
 ## Explore the dataset for distributions using text data
@@ -115,7 +126,7 @@ plt.gca().set(title='Conservation Status', ylabel='Max Body Mass')
 plt.legend();
 ```
-![wingspan and conservation collation](images/histogram-conservation.png)
+![wingspan and conservation collation](images/histogram-conservation-wb.png)
 There doesn't seem to be a good correlation between minimum wingspan and conservation status. Test other elements of the dataset using this method. You can try different filters as well. Do you find any correlation?
diff --git a/3-Data-Visualization/10-visualization-distributions/images/2D-wb.png b/3-Data-Visualization/10-visualization-distributions/images/2D-wb.png
new file mode 100644
index 00000000..3b74ec1b
Binary files /dev/null and b/3-Data-Visualization/10-visualization-distributions/images/2D-wb.png differ
diff --git a/3-Data-Visualization/10-visualization-distributions/images/dist1-wb.png b/3-Data-Visualization/10-visualization-distributions/images/dist1-wb.png
new file mode 100644
index 00000000..28c834f5
Binary files /dev/null and b/3-Data-Visualization/10-visualization-distributions/images/dist1-wb.png differ
diff --git a/3-Data-Visualization/10-visualization-distributions/images/dist2-wb.png b/3-Data-Visualization/10-visualization-distributions/images/dist2-wb.png
new file mode 100644
index 00000000..467d6c34
Binary files /dev/null and b/3-Data-Visualization/10-visualization-distributions/images/dist2-wb.png differ
diff --git a/3-Data-Visualization/10-visualization-distributions/images/dist3-wb.png b/3-Data-Visualization/10-visualization-distributions/images/dist3-wb.png
new file mode 100644
index 00000000..0ec8b5ba
Binary files /dev/null and b/3-Data-Visualization/10-visualization-distributions/images/dist3-wb.png differ
diff --git a/3-Data-Visualization/10-visualization-distributions/images/histogram-conservation-wb.png b/3-Data-Visualization/10-visualization-distributions/images/histogram-conservation-wb.png
new file mode 100644
index 00000000..394d4f38
Binary files /dev/null and b/3-Data-Visualization/10-visualization-distributions/images/histogram-conservation-wb.png differ
diff --git a/3-Data-Visualization/10-visualization-distributions/images/scatter-wb.png b/3-Data-Visualization/10-visualization-distributions/images/scatter-wb.png
new file mode 100644
index 00000000..4da67df9
Binary files /dev/null and b/3-Data-Visualization/10-visualization-distributions/images/scatter-wb.png differ
diff --git a/3-Data-Visualization/11-visualization-proportions/README.md b/3-Data-Visualization/11-visualization-proportions/README.md
index bce02c2c..2f265e72 100644
--- a/3-Data-Visualization/11-visualization-proportions/README.md
+++ b/3-Data-Visualization/11-visualization-proportions/README.md
@@ -57,6 +57,12 @@ Take this data and convert the 'class' column to a category:
 cols = mushrooms.select_dtypes(["object"]).columns
 mushrooms[cols] = mushrooms[cols].astype('category')
 ```
+
+```python
+edibleclass=mushrooms.groupby(['class']).count()
+edibleclass
+```
+
 Now, if you print out the mushrooms data, you can see that it has been grouped into categories according to the poisonous/edible class:
@@ -78,7 +84,7 @@ plt.show()
 ```
 Voila, a pie chart showing the proportions of this data according to these two classes of mushrooms. It's quite important to get the order of the labels correct, especially here, so be sure to verify the order with which the label array is built!
-![pie chart](images/pie1.png)
+![pie chart](images/pie1-wb.png)
 ## Donuts!
@@ -108,7 +114,7 @@ plt.title('Mushroom Habitats')
 plt.show()
 ```
-![donut chart](images/donut.png)
+![donut chart](images/donut-wb.png)
 This code draws a chart and a center circle, then adds that center circle in the chart. Edit the width of the center circle by changing `0.40` to another value.
diff --git a/3-Data-Visualization/11-visualization-proportions/images/donut-wb.png b/3-Data-Visualization/11-visualization-proportions/images/donut-wb.png
new file mode 100644
index 00000000..f876e95c
Binary files /dev/null and b/3-Data-Visualization/11-visualization-proportions/images/donut-wb.png differ
diff --git a/3-Data-Visualization/11-visualization-proportions/images/pie1-wb.png b/3-Data-Visualization/11-visualization-proportions/images/pie1-wb.png
new file mode 100644
index 00000000..faf89f3a
Binary files /dev/null and b/3-Data-Visualization/11-visualization-proportions/images/pie1-wb.png differ