
Confusion Matrix in Machine Learning


A confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one. It is used in machine learning and, more specifically, in statistical classification problems.

In this article, we'll learn about the confusion matrix from the basics to advanced concepts.

1. Learning Objective of Precision and Recall

Precision and recall are two important metrics used in machine learning, information retrieval, and other fields to evaluate the performance of a binary classification model.

The learning objective of precision and recall is to understand how well a model can correctly classify instances of a particular class (positive class) while avoiding false positives and false negatives. Specifically, precision measures the proportion of true positive predictions out of all positive predictions made by the model, while recall measures the proportion of true positive predictions out of all actual positive instances in the dataset.
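Before diving into the confusion matrix itself, here is a minimal sketch of the two metrics in scikit-learn; the tiny label arrays below are made up purely for illustration:

# Precision and recall on a small, made-up set of binary labels
from sklearn.metrics import precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # actual labels (1 = positive class)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # model predictions

# Precision: of all predicted positives, how many are truly positive
print("Precision:", precision_score(y_true, y_pred))  # 3 TP / (3 TP + 1 FP) = 0.75

# Recall: of all actual positives, how many were found
print("Recall:", recall_score(y_true, y_pred))        # 3 TP / (3 TP + 1 FN) = 0.75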

2. What is Confusion Matrix?

The confusion matrix is a table used to evaluate the performance of a classification model by comparing the predicted labels with the true labels of a set of samples. It shows the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) for each class.

Here is the general layout of a binary confusion matrix:

                     Predicted Positive      Predicted Negative
Actual Positive      True Positive (TP)      False Negative (FN)
Actual Negative      False Positive (FP)     True Negative (TN)

In the confusion matrix above, TP (true positives) is the number of positive samples correctly classified as positive, TN (true negatives) is the number of negative samples correctly classified as negative, FP (false positives) is the number of negative samples incorrectly classified as positive, and FN (false negatives) is the number of positive samples incorrectly classified as negative.

The confusion matrix can be used to calculate various evaluation metrics, such as accuracy, precision, recall, and F1 score, which can provide insight into the performance of a classification model.
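For a binary problem, the four counts can be read straight out of scikit-learn's confusion_matrix(). A minimal sketch with made-up labels; note that scikit-learn orders the matrix as [[TN, FP], [FN, TP]] for labels {0, 1}:

# Unpacking TP, TN, FP, FN from a binary confusion matrix (made-up labels)
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 1, 0, 1, 0]

# Rows are true labels, columns are predicted labels, ordered by label value,
# so ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")  # TN=3, FP=1, FN=1, TP=3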

3. Calculating the Confusion Matrix

As you already know, I strongly believe in learning by doing. So throughout this article, we'll talk in practical terms, using the Iris dataset.

Data Set Link: https://github.com/Narenderbeniwal/Spark-By-Example

Here is an example code snippet for calculating the confusion matrix using the scikit-learn library with the Iris dataset:

# Import modules
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the Iris dataset
iris = load_iris()

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42)

# Train a K-Nearest Neighbors classifier with k=3
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Make predictions on the test set
y_pred = knn.predict(X_test)

# Calculate the confusion matrix
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(cm)

# Calculate accuracy, precision, recall, and F1 score
acc = accuracy_score(y_test, y_pred)
prec = precision_score(y_test, y_pred, average='weighted')
rec = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

print(f"Accuracy: {acc:.3f}")
print(f"Precision: {prec:.3f}")
print(f"Recall: {rec:.3f}")
print(f"F1 Score: {f1:.3f}")

The output of the above code is as below:

# Output:
Confusion Matrix:
[[19  0  0]
 [ 0 12  1]
 [ 0  1 12]]
Accuracy: 0.956
Precision: 0.959
Recall: 0.956
F1 Score: 0.956

In the above code, we first load the iris dataset and split it into training and testing sets using train_test_split(). We then create a KNeighborsClassifier model with n_neighbors=3 and fit it on the training set using knn.fit(). Next, we predict on the testing set using knn.predict() and store the predicted labels in y_pred. Finally, we calculate the confusion matrix using confusion_matrix(y_test, y_pred) and print the result.

The resulting confusion matrix has three rows and three columns, where the rows represent the true labels and the columns represent the predicted labels. The value in row i and column j of the matrix represents the number of samples that belong to class i and were predicted as class j.
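To make the matrix easier to read, you can label its rows and columns with the class names. A small optional sketch that continues from the snippet above (it assumes the cm and iris variables are still in scope and that pandas is installed):

# Label the confusion matrix with the Iris class names
import pandas as pd

cm_df = pd.DataFrame(cm, index=iris.target_names, columns=iris.target_names)
print(cm_df)  # rows: true classes, columns: predicted classes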

4. What is Accuracy?

Accuracy is a common metric used in classification tasks to measure the overall correctness of the model’s predictions. It is defined as the number of correct predictions divided by the total number of predictions made:

# Calculating accuracy
accuracy = (true positives + true negatives) / (true positives + true negatives + false positives + false negatives)

In other words, accuracy measures the proportion of all predictions (both positive and negative) that were correctly classified by the model. It is a useful metric for evaluating the model’s overall performance, especially when the positive and negative classes are balanced.
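A minimal sketch of the formula in code, using made-up counts purely for illustration:

# Accuracy from the four confusion-matrix counts (made-up numbers)
tp, tn, fp, fn = 50, 40, 5, 5

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 90 correct out of 100 predictions = 0.9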

For example, in a spam email classification task where legitimate and spam emails occur in roughly equal numbers, both false positives (i.e., classifying a legitimate email as spam) and false negatives (i.e., failing to classify a spam email as spam) matter. In this case, accuracy is a useful metric for evaluating the overall performance of the model on both types of emails.

However, accuracy may not always be the most informative metric, especially when the positive and negative classes are imbalanced. In these cases, other metrics such as precision, recall, and F1 score may provide a more nuanced evaluation of the model’s performance.

5. What is Precision?

Precision is a metric used in classification tasks to measure the accuracy of positive predictions made by a model. It is defined as the number of true positives divided by the sum of true positives and false positives:

# Precision
precision = true positives / (true positives + false positives)

In other words, precision measures the proportion of positive predictions that are actually correct. It is a useful metric for evaluating the model’s ability to make accurate positive predictions, especially when the cost of false positives is high.
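Continuing with the same made-up counts used in the accuracy sketch above:

# Precision from the confusion-matrix counts (made-up numbers)
tp, fp = 50, 5

precision = tp / (tp + fp)
print(round(precision, 3))  # 50 / 55 = 0.909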

For example, in a medical diagnosis task, false positives (i.e., predicting that a patient has a disease when they do not) can lead to unnecessary treatments and expenses, while false negatives (i.e., predicting that a patient does not have a disease when they do) can have serious consequences. In this case, precision would be a more informative metric than accuracy, as it would reflect the proportion of true disease cases among all predicted disease cases, which is an important factor in deciding the appropriate treatment.

6. What is Recall?

Recall is a metric used in classification tasks to measure the ability of a model to correctly identify all positive instances in the data. It is defined as the number of true positives divided by the sum of true positives and false negatives:

# Recall
recall = true positives / (true positives + false negatives)

In other words, recall measures the proportion of actual positive instances in the data that were correctly identified by the model. It is a useful metric for evaluating the model’s ability to identify all positive instances, especially when the cost of false negatives is high.
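Again with the same made-up counts:

# Recall from the confusion-matrix counts (made-up numbers)
tp, fn = 50, 5

recall = tp / (tp + fn)
print(round(recall, 3))  # 50 / 55 = 0.909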

For example, in a fraud detection task, false negatives (i.e., failing to identify a fraudulent transaction) can result in financial losses for the company, while false positives (i.e., flagging a non-fraudulent transaction as fraudulent) can lead to unnecessary investigations and customer dissatisfaction. In this case, recall would be a more informative metric than accuracy, as it would reflect the proportion of true fraudulent transactions that were correctly identified by the model.

7. What is F1 Score?

F1 score is a metric used in classification tasks that combines precision and recall into a single score. It is the harmonic mean of precision and recall, and is defined as:

# F1 score
F1_score = 2 * (precision * recall) / (precision + recall)

The F1 score provides a balanced measure of precision and recall, and is useful in situations where both false positives and false negatives are equally important. It is often used as a performance metric in binary classification tasks, where one class is considered the “positive” class and the other class is considered the “negative” class.
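A small sketch of the formula, reusing the precision and recall values from the made-up counts above:

# F1 score as the harmonic mean of precision and recall
precision, recall = 50 / 55, 50 / 55

f1 = 2 * (precision * recall) / (precision + recall)
print(round(f1, 3))  # 0.909 (equal precision and recall give the same F1)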

For example, in a medical diagnosis task, the F1 score would be a useful metric if both false positives and false negatives have serious consequences. A high F1 score would indicate that the model is making accurate positive predictions while minimizing false positives and false negatives.

8. False Positive Rate & True Negative Rate in Confusion Matrix

False Positive Rate (FPR) and True Negative Rate (TNR) are two additional metrics commonly used in binary classification tasks, especially when dealing with imbalanced datasets.

False Positive Rate (FPR) is the proportion of negative instances that were incorrectly classified as positive by the model. It is calculated as:

# FPR
FPR = false positives / (true negatives + false positives)

In other words, FPR measures the rate at which the model falsely predicts the positive class when the true class is negative.

True Negative Rate (TNR), also known as specificity, is the proportion of negative instances that were correctly classified as negative by the model. It is calculated as:

# TNR
TNR = true negatives / (true negatives + false positives)

In other words, TNR measures the rate at which the model correctly predicts the negative class when the true class is negative.
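A minimal sketch of both rates, using the same made-up counts as the earlier snippets; note that FPR and TNR always sum to 1:

# FPR and TNR from the confusion-matrix counts (made-up numbers)
tn, fp = 40, 5

fpr = fp / (tn + fp)
tnr = tn / (tn + fp)
print(round(fpr, 3), round(tnr, 3))  # 0.111 and 0.889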

Both FPR and TNR are important metrics to consider when evaluating the performance of a binary classifier, especially when the negative class is the minority class in an imbalanced dataset. A good model should have a low FPR and a high TNR, indicating that it is able to correctly identify negative instances while minimizing the number of false positives.

9. Frequently Asked Interview Questions on the Confusion Matrix

What is a confusion matrix?

A confusion matrix is a table that summarizes the performance of a classification model by comparing the predicted and actual class labels for a set of test data. It shows the number of true positives, true negatives, false positives, and false negatives for each class.

How is a confusion matrix useful in evaluating a classification model?

A confusion matrix provides a comprehensive summary of the performance of a classification model, from which measures such as accuracy, precision, recall, and F1 score can be computed. It also helps in identifying the types of errors made by the model, which can be used to improve the model’s performance.

What is accuracy, precision, recall, and F1 score in the context of a confusion matrix?

Accuracy is the proportion of correct predictions made by the model over the total number of predictions. Precision is the proportion of true positives over the total number of predicted positives. Recall is the proportion of true positives over the total number of actual positives. F1 score is the harmonic mean of precision and recall.
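In scikit-learn, all of these can be printed per class in one call with classification_report(); a minimal sketch that assumes the y_test, y_pred, and iris variables from the Iris example earlier in the article:

# Per-class precision, recall, and F1 for the Iris example
from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred, target_names=iris.target_names))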

How do I interpret a confusion matrix?

The diagonal elements of the confusion matrix represent the number of correctly classified instances for each class, while the off-diagonal elements represent the misclassifications. The interpretation of a confusion matrix depends on the specific problem being solved and the goals of the analysis. By analyzing the confusion matrix, one can calculate various metrics such as accuracy, precision, recall, and F1 score to get a better understanding of how well the classifier is performing.

What are some common errors in a confusion matrix?

Some common errors in a confusion matrix include false positives and false negatives. False positives occur when the model predicts a positive label for an instance that actually belongs to the negative class. False negatives occur when the model predicts a negative label for an instance that actually belongs to the positive class.

How can I improve the performance of my classification model based on the confusion matrix?

Based on the errors identified in the confusion matrix, one can adjust the model parameters or try different algorithms to improve the model’s performance. For example, if the model is making too many false positives, one can try increasing the threshold for predicting a positive label or using a different algorithm that is less prone to false positives.
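As a hedged sketch of the threshold idea, here is a small, self-contained example on a synthetic imbalanced dataset; the dataset, logistic regression model, and 0.7 threshold are illustrative choices, not part of the article's Iris example:

# Raising the decision threshold to trade false positives for false negatives
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

proba = model.predict_proba(X)[:, 1]            # probability of the positive class
for threshold in (0.5, 0.7):
    y_pred = (proba >= threshold).astype(int)   # stricter threshold -> fewer predicted positives
    tn, fp, fn, tp = confusion_matrix(y, y_pred).ravel()
    print(f"threshold={threshold}: FP={fp}, FN={fn}")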

Conclusion

In conclusion, the confusion matrix is a useful tool for evaluating the performance of a classifier. It provides a clear picture of how well the model is able to predict the true classes of a dataset by comparing its predictions to the actual labels. The diagonal elements of the confusion matrix represent the number of correctly classified instances for each class, while the off-diagonal elements represent the misclassifications.

