# Confusion Matrix

A confusion matrix is a table that summarizes the performance of a classification model on a set of test data for which the true values are known. For a binary classifier, it has four cells, one for each combination of predicted and actual class: True Positives (TP), False Positives (FP), True Negatives (TN) and False Negatives (FN).

True Positives (TP) are the cases in which the model correctly predicted the positive class.

False Positives (FP) are the cases in which the model predicted the positive class, but it was actually the negative class.

True Negatives (TN) are the cases in which the model correctly predicted the negative class.

False Negatives (FN) are the cases in which the model predicted the negative class, but it was actually the positive class.
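The four definitions above amount to a simple tally over pairs of actual and predicted labels. The following is a minimal sketch in plain Python; the function name `confusion_counts`, the `positive` parameter, and the sample labels are illustrative assumptions, with 1 treated as the positive class.

```python
def confusion_counts(actual, predicted, positive=1):
    """Tally TP, FP, TN and FN for a binary classifier.

    `positive` names the label treated as the positive class (assumed 1 here).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct only if the actual label is positive too.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: correct only if the actual label is negative too.
            counts["FN" if a == positive else "TN"] += 1
    return counts


# Hypothetical labels, chosen so that each outcome occurs at least once.
actual = [1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
print(confusion_counts(actual, predicted))
# {'TP': 3, 'FP': 1, 'TN': 3, 'FN': 1}
```

The four counts always sum to the number of samples, which is a quick sanity check on any hand-built confusion matrix.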

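These values can be laid out as a two-by-two table, with the predicted class in the rows and the actual class in the columns:

| | Actual: 1 | Actual: 0 |
|---|---|---|
| Predict: 1 | TP | FP |
| Predict: 0 | FN | TN |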

The confusion matrix can be used to calculate various performance metrics, such as accuracy, precision, recall, specificity, and F1-score.

Accuracy is the proportion of correct predictions out of all predictions made.

Precision is the proportion of true positive predictions out of all positive predictions made.

Recall (sensitivity) is the proportion of true positive predictions out of all actual positive cases.

Specificity is the proportion of true negative predictions out of all actual negative cases.

F1-score is the harmonic mean of precision and recall.
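The five definitions above translate directly into ratios of the four counts. The sketch below is one possible way to write them in Python; the names `safe_ratio` and `classification_metrics`, the sample counts, and the choice to return 0.0 when a denominator is zero are all assumptions made for illustration, not a fixed convention.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero.

    Returning 0.0 for an undefined ratio is an assumption of this sketch.
    """
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Compute the metrics defined above from the four confusion-matrix counts."""
    precision = safe_ratio(tp, tp + fp)   # correct positives / predicted positives
    recall = safe_ratio(tp, tp + fn)      # correct positives / actual positives
    return {
        "accuracy": safe_ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_ratio(tn, tn + fp),  # correct negatives / actual negatives
        # Harmonic mean of precision and recall.
        "f1": safe_ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical counts for a test set of 100 samples.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these hypothetical counts, accuracy is 0.850 and precision is 0.800, while recall (0.889) and specificity (0.818) show how the errors are split between the two classes.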

Assume that we have a binary classification model that predicts whether a customer will buy a product or not. The actual class labels for a sample of customers are [1, 0, 1, 0, 1], where 1 indicates that the customer bought the product and 0 indicates that the customer did not buy the product. The model predictions for the same sample of customers are [1, 1, 0, 0, 0]. The confusion matrix for this example would look like this:

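| | Actual: 1 | Actual: 0 |
|---|---|---|
| Predict: 1 | TP=1 | FP=1 |
| Predict: 0 | FN=2 | TN=1 |

Pairing each actual label with its prediction gives one true positive (the first customer), one false positive (the second), two false negatives (the third and fifth), and one true negative (the fourth). The four counts add up to the five customers in the sample.

The accuracy of the model can be calculated as (TP + TN) / (TP + TN + FP + FN), which in this case would be (1 + 1) / (1 + 1 + 1 + 2) = 0.4 or 40%.

The precision of the model can be calculated as TP / (TP + FP), which in this case would be 1 / (1 + 1) = 0.5 or 50%.

The recall of the model can be calculated as TP / (TP + FN), which in this case would be 1 / (1 + 2) = 0.33 or 33%.

The specificity of the model can be calculated as TN / (TN + FP), which in this case would be 1 / (1 + 1) = 0.5 or 50%.

The F1 score is the harmonic mean of precision and recall, and gives a balance between the two metrics. In this case, the F1 score would be 2 x (precision x recall) / (precision + recall) = 2 x (0.5 x 0.333) / (0.5 + 0.333) = 0.4 or 40%.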
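As a cross-check, the counts and metrics for the customer example can be recomputed directly from the two label lists rather than read off the table. This is a small sketch using only the Python standard library; the variable names are illustrative.

```python
from collections import Counter

actual = [1, 0, 1, 0, 1]      # 1 = bought, 0 = did not buy
predicted = [1, 1, 0, 0, 0]   # model output for the same customers

# Count each (actual, predicted) pair; a Counter returns 0 for pairs that never occur.
pairs = Counter(zip(actual, predicted))
tp, fp = pairs[(1, 1)], pairs[(0, 1)]
fn, tn = pairs[(1, 0)], pairs[(0, 0)]

accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

Counting the pairs with `Counter` keeps the mapping between each cell and its (actual, predicted) combination explicit, which makes it harder to swap false positives and false negatives by accident.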

A confusion matrix is a useful tool for evaluating classification models because it separates the two kinds of error, false positives and false negatives, that a single accuracy figure hides. This helps to identify the areas where the model is performing well and the areas where it needs improvement.

Updated on: 30/01/2023
