Merge pull request #34 from Rishit-dagli/Rishit-dagli-patch-1

Add Confusion Matrix explanations
pull/34/head
Jen Looper 4 years ago committed by GitHub
commit d53e92bc17

@ -169,22 +169,44 @@ Predicted labels: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 1 0 1 0 0 1 0 0 0 1 0]
```
Let's unpack some of those [terms](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html?highlight=classification_report#sklearn.metrics.classification_report) with a confusion matrix to help us measure the performance of our model:
```python
from sklearn.metrics import confusion_matrix

# Compare the true test labels with the model's predictions
confusion_matrix(y_test, predictions)
```
Take a look at our confusion matrix:
```
array([[162,   4],
       [ 33,   0]])
```
Let's understand what these numbers mean with an example. Say our model classifies between two categories, category 0 and category 1, and we treat category 1 as the positive class. If your model predicts something as category 1 and it belongs to category 1 in reality, we call it a true positive, shown by the bottom right number. If your model predicts something as category 1 but it belongs to category 0 in reality, we call it a false positive, shown by the top right number. If your model predicts something as category 0 but it belongs to category 1 in reality, we call it a false negative, shown by the bottom left number. If your model predicts something as category 0 and it belongs to category 0 in reality, we call it a true negative, shown by the top left number.
![Confusion Matrix](images/confusion-matrix.png)
As you might have guessed, we'd like to have a larger number of true positives and true negatives and a lower number of false negatives and false positives, which implies that the model performs better.
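If you'd like to pull those four counts out of the matrix programmatically, scikit-learn lets you flatten it. Here is a minimal sketch, reusing the `y_test` and `predictions` variables from the code above:

```python
from sklearn.metrics import confusion_matrix

# ravel() flattens the 2x2 matrix row by row: [[tn, fp], [fn, tp]]
tn, fp, fn, tp = confusion_matrix(y_test, predictions).ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
```

For the matrix above this prints TN=162, FP=4, FN=33, TP=0, which tells you right away that the model never correctly predicted category 1.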
Let's now understand more about the terms we saw earlier with the help of the confusion matrix; a short code sketch after these definitions recomputes several of them from the matrix:
🎓 Precision: TP/(TP + FP) The fraction of relevant instances among the retrieved instances (e.g. which labels were well-labeled)
🎓 Recall: TP/(TP + FN) The fraction of relevant instances that were retrieved, whether well-labeled or not
🎓 f1-score: (2 * precision * recall)/(precision + recall) A weighted average of the precision and recall, with best being 1 and worst being 0
🎓 Support: The number of occurrences of each label retrieved
🎓 Accuracy: (TP + TN)/(TP + TN + FP + FN) The percentage of labels predicted accurately for a sample.
🎓 Macro Avg: The calculation of the unweighted mean metrics for each label, not taking label imbalance into account.
🎓 Weighted Avg: The calculation of the mean metrics for each label, taking label imbalance into account by weighting them by their support (the number of true instances for each label).
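To check these definitions against scikit-learn's own implementations, you can recompute precision, recall, f1-score and accuracy by hand from the four confusion-matrix counts. This is only a sketch, again assuming the `y_test` and `predictions` variables from earlier:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

tn, fp, fn, tp = confusion_matrix(y_test, predictions).ravel()

# Recompute the metrics by hand from the confusion-matrix counts
precision = tp / (tp + fp)
recall = tp / (tp + fn)
# If precision and recall are both 0, the f1-score is defined as 0
f1 = (2 * precision * recall) / (precision + recall) if (precision + recall) else 0.0
accuracy = (tp + tn) / (tp + tn + fp + fn)

# scikit-learn should report the same values for the positive class (category 1)
print(precision, precision_score(y_test, predictions))
print(recall, recall_score(y_test, predictions))
print(f1, f1_score(y_test, predictions))
print(accuracy, accuracy_score(y_test, predictions))
```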
> Can you think of which metric you should use if you want your model to reduce the number of false negatives?
## Visualize the ROC Curve of this Model
This is not a bad model; its accuracy is in the 80% range, so ideally you could use it to predict the color of a pumpkin given a set of variables.
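As a preview, here is one possible way to plot the ROC curve with scikit-learn and matplotlib. This is only a sketch: it assumes the fitted classifier is stored in a variable called `model` and that `X_test` and `y_test` come from the earlier train/test split.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

# Assumption: `model` is the fitted classifier from the earlier training step
y_scores = model.predict_proba(X_test)[:, 1]  # probability of the positive class (category 1)

fpr, tpr, thresholds = roc_curve(y_test, y_scores)

plt.plot(fpr, tpr, label=f"AUC = {roc_auc_score(y_test, y_scores):.2f}")
plt.plot([0, 1], [0, 1], linestyle="--", label="Chance")  # diagonal reference line
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend()
plt.show()
```

The closer the curve hugs the top-left corner (and the higher the AUC), the better the classifier separates the two categories.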
