Merge pull request #34 from Rishit-dagli/Rishit-dagli-patch-1

Add Confusion Matrix explanations
pull/34/head
Jen Looper 4 years ago committed by GitHub
commit d53e92bc17

@ -169,22 +169,44 @@ Predicted labels: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 1 0 1 0 0 1 0 0 0 1 0]
```
Let's unpack some of those [terms](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html?highlight=classification_report#sklearn.metrics.classification_report) with a confusion matrix to help us measure the performance of our model:
```python
from sklearn.metrics import confusion_matrix

# Compare the true test labels with the model's predictions
confusion_matrix(y_test, predictions)
```
Take a look at our confusion matrix:
```
array([[162,   4],
       [ 33,   0]])
```
Let's understand what these numbers mean with an example. Say our model classifies between two categories, category 0 and category 1, and we treat category 1 as the positive class. If your model predicts something as category 1 and it belongs to category 1 in reality, we call it a true positive, shown by the bottom right number. If your model predicts something as category 1 but it belongs to category 0 in reality, we call it a false positive, shown by the top right number. If your model predicts something as category 0 but it belongs to category 1 in reality, we call it a false negative, shown by the bottom left number. If your model predicts something as category 0 and it belongs to category 0 in reality, we call it a true negative, shown by the top left number.
![Confusion Matrix](images/confusion-matrix.png)
As you might have guessed, we'd like to have a larger number of true positives and true negatives and a lower number of false negatives and false positives, which implies that the model performs better.
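If you'd like to pull those four counts out of the matrix programmatically, scikit-learn lets you flatten it. Here is a minimal sketch, reusing the `y_test` and `predictions` variables from the code above:

```python
from sklearn.metrics import confusion_matrix

# ravel() flattens the 2x2 matrix row by row: [[tn, fp], [fn, tp]]
tn, fp, fn, tp = confusion_matrix(y_test, predictions).ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
```

For the matrix above this prints TN=162, FP=4, FN=33, TP=0, which tells you right away that the model never correctly predicted category 1.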
Let's now understand more about the terms we saw earlier with the help of the confusion matrix; a short code sketch after these definitions recomputes several of them from the matrix:
🎓 Precision: TP/(TP + FP) The fraction of relevant instances among the retrieved instances (e.g. which labels were well-labeled)
🎓 Recall: TP/(TP + FN) The fraction of relevant instances that were retrieved, whether well-labeled or not
🎓 f1-score: (2 * precision * recall)/(precision + recall) A weighted average of the precision and recall, with best being 1 and worst being 0
🎓 Support: The number of occurrences of each label retrieved
🎓 Accuracy: (TP + TN)/(TP + TN + FP + FN) The percentage of labels predicted accurately for a sample.
🎓 Macro Avg: The calculation of the unweighted mean metrics for each label, not taking label imbalance into account.
🎓 Weighted Avg: The calculation of the mean metrics for each label, taking label imbalance into account by weighting them by their support (the number of true instances for each label).
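To check these definitions against scikit-learn's own implementations, you can recompute precision, recall, f1-score and accuracy by hand from the four confusion-matrix counts. This is only a sketch, again assuming the `y_test` and `predictions` variables from earlier:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

tn, fp, fn, tp = confusion_matrix(y_test, predictions).ravel()

# Recompute the metrics by hand from the confusion-matrix counts
precision = tp / (tp + fp)
recall = tp / (tp + fn)
# If precision and recall are both 0, the f1-score is defined as 0
f1 = (2 * precision * recall) / (precision + recall) if (precision + recall) else 0.0
accuracy = (tp + tn) / (tp + tn + fp + fn)

# scikit-learn should report the same values for the positive class (category 1)
print(precision, precision_score(y_test, predictions))
print(recall, recall_score(y_test, predictions))
print(f1, f1_score(y_test, predictions))
print(accuracy, accuracy_score(y_test, predictions))
```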
> Can you think of which metric you should use if you want your model to reduce the number of false negatives?
## Visualize the ROC Curve of this Model
This is not a bad model; its accuracy is in the 80% range, so ideally you could use it to predict the color of a pumpkin given a set of variables.
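As a preview, here is one possible way to plot the ROC curve with scikit-learn and matplotlib. This is only a sketch: it assumes the fitted classifier is stored in a variable called `model` and that `X_test` and `y_test` come from the earlier train/test split.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

# Assumption: `model` is the fitted classifier from the earlier training step
y_scores = model.predict_proba(X_test)[:, 1]  # probability of the positive class (category 1)

fpr, tpr, thresholds = roc_curve(y_test, y_scores)

plt.plot(fpr, tpr, label=f"AUC = {roc_auc_score(y_test, y_scores):.2f}")
plt.plot([0, 1], [0, 1], linestyle="--", label="Chance")  # diagonal reference line
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend()
plt.show()
```

The closer the curve hugs the top-left corner (and the higher the AUC), the better the classifier separates the two categories.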
