Confusion Matrix: Machine Learning Interview Prep 13

Shahidullah Kawsar
5 min read · Dec 5, 2023


The confusion matrix and the accuracy metrics of supervised classification: precision, recall, sensitivity, F1 score, ROC-AUC, and Type-I and Type-II errors.

A confusion matrix is like a scoreboard that helps us understand how well a machine learning model is performing. It shows us the counts of correct and incorrect predictions made by the model. The rows represent the actual classes or labels, while the columns represent the predicted classes. By looking at the numbers in the matrix, we can see where the model is getting things right and where it’s making mistakes, helping us to improve its performance. In simpler terms, it’s like a report card that tells us how good our model is at recognizing different things.
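To make the scoreboard idea concrete, here is a minimal sketch using scikit-learn. The labels below are made up for illustration only, and the row/column ordering is fixed explicitly so the four cells map onto TP, FN, FP, and TN.

```python
# A minimal sketch: building a confusion matrix from made-up labels.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual classes
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model's predictions

# Rows are actual classes, columns are predicted classes.
# With labels=[1, 0] the layout is:
# [[TP, FN],
#  [FP, TN]]
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
tp, fn, fp, tn = cm.ravel()
print(cm)
print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")
```

All the quantities in the questions below (accuracy, error, TPR, FPR, precision, recall, F1) can be derived from these four counts.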

Photo: Worth Avenue Clock Tower, Palm Beach, Florida, USA. Credit: Tasnim and Kawsar
Image: Confusion matrix. Source: Wikipedia

Let’s check your basic knowledge of supervised classification and the confusion matrix. Here are 10 multiple-choice questions for you, and there’s no time limit. Have fun!

Question 1: Accuracy is simply a ratio of correctly predicted observations to the total observations. From the above confusion matrix, how would you define Accuracy?
(A) Accuracy = (FP+FN)/(TP+FN+FP+TN)
(B) Accuracy = (TP+TN)/(TP+FN+FP+TN)
(C) Accuracy = (TP+FN)/(TP+FN+FP+TN)
(D) Accuracy = (FP+TN)/(TP+FN+FP+TN)

Question 2: From the above confusion matrix, how would you define Error?
(A) Error = (FP+FN)/(TP+FN+FP+TN)
(B) Error = (TP+TN)/(TP+FN+FP+TN)
(C) Error = (TP+FN)/(TP+FN+FP+TN)
(D) Error = (FP+TN)/(TP+FN+FP+TN)

Question 3: What’s the correct definition for the True Positive Rate (TPR)? Here, P is the number of actual positives, N is the number of actual negatives, and total population = P+N.
(A) TPR = TP/N = TP/(TP+FN)
(B) TPR = TP/N = TP/(TP+FP)
(C) TPR = TP/P = TP/(TP+FN)
(D) TPR = TP/P = TP/(TP+FP)

Question 4: What’s the correct definition for the fall-out or False Positive Rate (FPR)? Here, P is the number of actual positives, N is the number of actual negatives, and total population = P+N.
(A) FPR = FP/P = FP/(FP+TN)
(B) FPR = FP/P = FP/(FP+TP)
(C) FPR = FP/N = FP/(FP+TP)
(D) FPR = FP/N = FP/(FP+TN)

Question 5: Recall is the ratio of correctly predicted positive observations to all observations in the actual positive class: Recall = TP/P = TP/(TP+FN), where P is the number of actual positives. Recall is also known as:
(A) Sensitivity, Hit Rate, True Positive Rate (TPR)
(B) Specificity, Hit Rate, True Positive Rate (TPR)
(C) Sensitivity, Miss Rate, True Positive Rate (TPR)
(D) Sensitivity, Hit Rate, Fall-out

Question 6: What’s the correct definition for Precision? (Select two)
(A) Precision = TP/(TP+FP)
(B) Precision is the ratio of correctly predicted positive observations to the total predicted positive observations.
(C) Precision = FP/(TP+FP)
(D) Precision is the ratio of incorrectly predicted positive observations to the total predicted positive observations.

Question 7: Which statements about the Type-I and Type-II errors are correct? (Select two)
(A) Type-I error = FPR
(B) Type-II error = FNR
(C) Type-I error = FNR
(D) Type-II error = FPR

Question 8: The F1 score is the harmonic mean of Precision and Recall. What’s the correct formula for the F1 score?
(A) F1 score = (2*Precision*Recall)/(Precision + Recall)
(B) F1 score = (2*Precision*TPR)/(Precision + TPR)
(C) F1 score = (2*Precision*Sensitivity)/(Precision + Sensitivity)
(D) All of the above
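As a quick sanity check of the harmonic-mean formula in the question stem, here is a small sketch using scikit-learn with made-up labels (the data are an assumption for illustration only):

```python
# Verify that scikit-learn's F1 matches the harmonic mean of precision and recall.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)

harmonic_mean = 2 * precision * recall / (precision + recall)
print(harmonic_mean)               # computed from the stem's formula
print(f1_score(y_true, y_pred))    # scikit-learn's F1 -- should match
```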

Image: ROC-AUC curve, Source: Wikipedia

Question 9: Which statement about the Receiver Operating Characteristic (ROC) curve and its Area Under the Curve (AUC) is correct?
(A) ROC is a probability curve that plots the true positive rate (sensitivity or recall) against the false positive rate (1 - specificity) at various thresholds.
(B) AUC is the area under the ROC curve. If the AUC is high (close to 1), the model is better at distinguishing between positive and negative classes.
(C) If AUC = 0.5, it represents a model that is no better than random.
(D) All of the above.

Question 10: Which statement about the ROC-AUC Curve is correct?
(A) Thresholding is used to create binary classification outcomes. For example, threshold = 0.5, if probability ≥ 0.5, then predicted class = 1, and if probability < 0.5, then predicted class = 0. By changing this threshold, you can increase or decrease the recall or precision of a regressor, and this trade-off is visualized with a ROC curve.
(B) Thresholding is used to create binary classification outcomes. For example, threshold = 0.5, if probability < 0.5, then predicted class = 1, and if probability > 0.5, then predicted class = 0. By changing this threshold, you can increase or decrease the recall or precision of a classifier, and this trade-off is visualized with a ROC curve.
(C) Thresholding is used to create binary classification outcomes. For example, threshold = 0.5, if probability ≥ 0.5, then predicted class = 1, and if probability < 0.5, then predicted class = 0. By changing this threshold, you can increase or decrease the recall or precision of a classifier, and this trade-off is visualized with a ROC curve.
(D) Thresholding is used to create binary classification outcomes. For example, threshold = 0.5, if probability ≥ 0.5, then predicted class = 1, and if probability < 0.5, then predicted class = 0. Without changing this threshold, you can increase or decrease the recall or precision of a classifier, and this trade-off is visualized with a ROC curve.
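Since Questions 9 and 10 revolve around thresholding and the ROC curve, here is a minimal sketch, using made-up predicted probabilities, of how a threshold turns scores into class labels and how scikit-learn's roc_curve and roc_auc_score trace the resulting TPR/FPR trade-off:

```python
# A minimal sketch of thresholding and the ROC curve, using made-up
# predicted probabilities (values are for illustration only).
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.65, 0.55, 0.9])  # model scores

# Turning probabilities into class labels at a single threshold:
threshold = 0.5
y_pred = (y_prob >= threshold).astype(int)
print(y_pred)

# roc_curve sweeps many thresholds and returns the FPR/TPR trade-off;
# roc_auc_score summarizes it as a single area-under-the-curve number.
fpr, tpr, thresholds = roc_curve(y_true, y_prob)
print(list(zip(thresholds, fpr, tpr)))
print("AUC:", roc_auc_score(y_true, y_prob))
```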

The solutions will be published in the next article, Principal Component Analysis (PCA) Part 1: Machine Learning Interview Prep 14.

Happy learning. If you like the questions and enjoy taking the test, please subscribe to my email list for the latest ML questions, follow my Medium profile, and leave a clap for me. Feel free to discuss your thoughts on these questions in the comment section. Don’t forget to share the quiz link with your friends or LinkedIn connections. If you want to connect with me on LinkedIn: my LinkedIn profile.

The solutions to K-Means Clustering: Machine Learning Interview Prep 12 are: 1(A, B), 2(D), 3(A, B), 4(D), 5(D), 6(A, C), 7(A, C), 8(A), 9(D), 10(D)

