Merge pull request #307 from flegaspi700/main

The x and y labels on some of the PNG images are not readable when using a dark theme setting
pull/308/head
Jasmine Greenaway 4 years ago committed by GitHub
commit de894a6489
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -20,6 +20,15 @@ birds = pd.read_csv('../../data/birds.csv')
birds.head()
```
| | Name | ScientificName | Category | Order | Family | Genus | ConservationStatus | MinLength | MaxLength | MinBodyMass | MaxBodyMass | MinWingspan | MaxWingspan |
| ---: | :--------------------------- | :--------------------- | :-------------------- | :----------- | :------- | :---------- | :----------------- | --------: | --------: | ----------: | ----------: | ----------: | ----------: |
| 0 | Black-bellied whistling-duck | Dendrocygna autumnalis | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Dendrocygna | LC | 47 | 56 | 652 | 1020 | 76 | 94 |
| 1 | Fulvous whistling-duck | Dendrocygna bicolor | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Dendrocygna | LC | 45 | 53 | 712 | 1050 | 85 | 93 |
| 2 | Snow goose | Anser caerulescens | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Anser | LC | 64 | 79 | 2050 | 4050 | 135 | 165 |
| 3 | Ross's goose | Anser rossii | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Anser | LC | 57.3 | 64 | 1066 | 1567 | 113 | 116 |
| 4 | Greater white-fronted goose | Anser albifrons | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Anser | LC | 64 | 81 | 1930 | 3310 | 130 | 165 |
In general, you can quickly look at the way data is distributed by using a scatter plot as we did in the previous lesson:
```python
@ -31,6 +40,8 @@ plt.xlabel('Max Length')
plt.show()
```
![max length per order](images/scatter-wb.png)
This gives an overview of the general distribution of body length per bird Order, but it is not the optimal way to display true distributions. That task is usually handled by creating a Histogram.
## Working with histograms
@ -40,7 +51,7 @@ Matplotlib offers very good ways to visualize data distribution using Histograms
birds['MaxBodyMass'].plot(kind = 'hist', bins = 10, figsize = (12,12))
plt.show()
```
![distribution over the entire dataset](images/dist1.png)
![distribution over the entire dataset](images/dist1-wb.png)
As you can see, most of the 400+ birds in this dataset fall in the range of under 2000 for their Max Body Mass. Gain more insight into the data by changing the `bins` parameter to a higher number, something like 30:
@ -48,7 +59,7 @@ As you can see, most of the 400+ birds in this dataset fall in the range of unde
birds['MaxBodyMass'].plot(kind = 'hist', bins = 30, figsize = (12,12))
plt.show()
```
![distribution over the entire dataset with larger bins param](images/dist2.png)
![distribution over the entire dataset with larger bins param](images/dist2-wb.png)
This chart shows the distribution in a bit more granular fashion. A chart less skewed to the left could be created by ensuring that you only select data within a given range:
@ -59,7 +70,7 @@ filteredBirds = birds[(birds['MaxBodyMass'] > 1) & (birds['MaxBodyMass'] < 60)]
filteredBirds['MaxBodyMass'].plot(kind = 'hist',bins = 40,figsize = (12,12))
plt.show()
```
![filtered histogram](images/dist3.png)
![filtered histogram](images/dist3-wb.png)
✅ Try some other filters and data points. To see the full distribution of the data, remove the `['MaxBodyMass']` filter to show labeled distributions.
@ -76,7 +87,7 @@ hist = ax.hist2d(x, y)
```
There appears to be an expected correlation between these two elements along an expected axis, with one particularly strong point of convergence:
![2D plot](images/2D.png)
![2D plot](images/2D-wb.png)
Histograms work well by default for numeric data. What if you need to see distributions according to text data?
## Explore the dataset for distributions using text data
@ -115,7 +126,7 @@ plt.gca().set(title='Conservation Status', ylabel='Max Body Mass')
plt.legend();
```
![wingspan and conservation collation](images/histogram-conservation.png)
![wingspan and conservation collation](images/histogram-conservation-wb.png)
There doesn't seem to be a good correlation between minimum wingspan and conservation status. Test other elements of the dataset using this method. You can try different filters as well. Do you find any correlation?

Binary file not shown.

After

Width:  |  Height:  |  Size: 5.0 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 11 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 10 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 8.6 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 13 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 39 KiB

@ -57,6 +57,12 @@ Take this data and convert the 'class' column to a category:
cols = mushrooms.select_dtypes(["object"]).columns
mushrooms[cols] = mushrooms[cols].astype('category')
```
```python
edibleclass=mushrooms.groupby(['class']).count()
edibleclass
```
Now, if you print out the mushrooms data, you can see that it has been grouped into categories according to the poisonous/edible class:
@ -78,7 +84,7 @@ plt.show()
```
Voila, a pie chart showing the proportions of this data according to these two classes of mushrooms. It's quite important to get the order of the labels correct, especially here, so be sure to verify the order with which the label array is built!
![pie chart](images/pie1.png)
![pie chart](images/pie1-wb.png)
## Donuts!
@ -108,7 +114,7 @@ plt.title('Mushroom Habitats')
plt.show()
```
![donut chart](images/donut.png)
![donut chart](images/donut-wb.png)
This code draws a chart and a center circle, then adds that center circle in the chart. Edit the width of the center circle by changing `0.40` to another value.

Binary file not shown.

After

Width:  |  Height:  |  Size: 18 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 7.8 KiB

Loading…
Cancel
Save