K-Nearest Neighbors (KNN): Machine Learning Interview Prep 11

Shahidullah Kawsar
7 min read · Nov 9, 2023

KNN, or K-Nearest Neighbors, is a simple yet effective machine learning algorithm. It finds the ‘k’ training points nearest to a new input according to a distance measure, often Euclidean distance. It then predicts the most common class among those neighbors (for classification) or the average of their values (for regression). KNN is easy to understand and implement, which makes it a popular choice for beginners in machine learning.
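
The description above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the helper names (euclidean, k_nearest, knn_classify, knn_regress) and the toy data are made up for this example, and the brute-force search assumes a small dataset.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(train_X, train_y, query, k):
    """Return the targets of the k training points closest to the query."""
    order = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))
    return [train_y[i] for i in order[:k]]

def knn_classify(train_X, train_y, query, k=3):
    """Classification: majority vote among the k nearest neighbors."""
    return Counter(k_nearest(train_X, train_y, query, k)).most_common(1)[0][0]

def knn_regress(train_X, train_y, query, k=3):
    """Regression: mean of the k nearest neighbors' target values."""
    neighbors = k_nearest(train_X, train_y, query, k)
    return sum(neighbors) / len(neighbors)

# Toy data (hypothetical): two small groups of points in 2-D.
X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.0), (6.5, 8.0)]
labels = ["a", "a", "a", "b", "b", "b"]
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

predicted_class = knn_classify(X, labels, (1.2, 1.4), k=3)
predicted_value = knn_regress(X, values, (1.2, 1.4), k=3)
```

The same neighbor search serves both tasks; only the final aggregation step (vote versus mean) differs.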

Photo: Fossil Discovery Center, Big Bend National Park, TX, USA Credit: Tasnim and Kawsar

Let’s check your basic knowledge of K-Nearest Neighbors (KNN). Here are 10 multiple-choice questions for you and there’s no time limit. Have fun!

Question 1: What is the KNN algorithm?
(A) The KNN algorithm is non-parametric and does not make assumptions about the underlying distribution of the data.
(B) The KNN works by finding the K closest data points (neighbors) to the query point and predicts the output based on the labels of these neighbors.
(C) The KNN algorithm is a lazy machine learning algorithm for classification and regression tasks. It can work well with both binary and multi-class classification problems.
(D) All of the above

Question 2: Euclidean and Minkowski distance are the most commonly used distance metrics in the KNN algorithm. What are the other distance metrics used in the KNN algorithm?
(A) Cosine distance
(B) Haversine distance
(C) Manhattan distance
(D) All of the above
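
For readers who want to experiment with the metrics named in this question, here is a small sketch in plain Python. The function names are chosen for this example, and the haversine version assumes points given as (latitude, longitude) in degrees and a mean Earth radius of 6371 km.

```python
import math

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))
```

Cosine distance ignores vector length and compares direction only, while haversine is meant for geographic coordinates rather than general feature vectors.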

Question 3: What are the disadvantages of using the KNN algorithm?
(A) As the number of dimensions increases, the distances between points grow and become increasingly similar to one another, making it difficult to find meaningful nearest neighbors.
(B) Computationally expensive, especially for large datasets, and requires a large amount of memory to store the entire dataset.
(C) Sensitive to the choice of K and distance metric.
(D) All of the above

Question 4: How do you choose the value of K (the number of neighbors to consider) in the KNN algorithm? (Select two)
(A) A small value of K, for example, K=1, will result in a more flexible model but may be prone to overfitting.
(B) A large value of K, for example, K=n, where n is the size of the dataset, will result in a more stable model but may not capture the local variations in the data.
(C) A large value of K, for example, K=n, where n is the size of the dataset, will result in a more flexible model but may be prone to overfitting.
(D) A small value of K, for example, K=1, will result in a more stable model but may not capture the local variations in the data.
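
One way to explore how K behaves, once you have picked your answers, is to score several candidate values on held-out points. The sketch below uses leave-one-out evaluation on a made-up 1-D dataset; the function names, the data, and the candidate list are all assumptions for illustration.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to a 1-D query."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def training_accuracy(data, k):
    """Accuracy when each point is predicted with itself still in the data."""
    hits = sum(knn_predict(data, x, k) == y for x, y in data)
    return hits / len(data)

def leave_one_out_accuracy(data, k):
    """Accuracy when each point is predicted from all the other points."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, x, k) == y
    return hits / len(data)

# Hypothetical data: class "a" on the left, "b" on the right, one noisy label at 3.3.
data = [(1.0, "a"), (2.1, "a"), (3.3, "b"), (4.6, "a"), (6.0, "a"),
        (7.5, "b"), (9.1, "b"), (10.8, "b"), (12.6, "b"), (14.5, "b")]

candidates = [1, 3, 5, 7]
scores = {k: leave_one_out_accuracy(data, k) for k in candidates}
best_k = max(candidates, key=scores.get)

memorized = training_accuracy(data, 1)          # each point is its own nearest neighbor
majority_only = training_accuracy(data, len(data))  # every query sees the whole dataset
```

Comparing memorized, majority_only, and the leave-one-out scores shows the two extremes and what lies between them. In practice the same idea is usually run as k-fold cross-validation on a larger dataset.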

Question 5: How do you handle imbalanced data in the KNN algorithm?
(A) Weighted voting, where the vote of each neighbor is weighted by its inverse distance to the query point. This gives more weight to the closer neighbors and less weight to the farther neighbors, which can help to reduce the effect of the majority class.
(B) Oversample the minority class.
(C) Undersample the majority class.
(D) All of the above.
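
The inverse-distance weighting described in option (A) can be sketched as follows. The function names, the small eps term that guards against division by zero, and the neighbor list are assumptions made for this example.

```python
from collections import Counter, defaultdict

def majority_vote(neighbors):
    """Plain vote: every neighbor counts once, whatever its distance."""
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def weighted_vote(neighbors, eps=1e-9):
    """Inverse-distance vote: closer neighbors contribute larger weights."""
    scores = defaultdict(float)
    for distance, label in neighbors:
        scores[label] += 1.0 / (distance + eps)
    return max(scores, key=scores.get)

# Hypothetical (distance, label) pairs for the 3 nearest neighbors of a query.
neighbors = [(0.5, "minority"), (2.0, "majority"), (2.5, "majority")]

plain = majority_vote(neighbors)
weighted = weighted_vote(neighbors)
```

Here the single close minority neighbor has weight 1/0.5 = 2.0, while the two distant majority neighbors together contribute 1/2.0 + 1/2.5 = 0.9, so the two voting schemes disagree.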

Source: 9 Distance Measures in Data Science

Question 6: How would you choose the distance metric in KNN?
(A) Euclidean distance is a good default choice for continuous data. It works well when the data is dense and the differences between features are important.
(B) Manhattan distance is a good choice when the data has many outliers or when the scale of the features is different. For example, if we are comparing distances between two cities, the distance metric should not be affected by the difference in elevation or terrain between the cities.
(C) Minkowski distance with p=1 is equivalent to Manhattan distance, and Minkowski distance with p=2 is equivalent to Euclidean distance. Minkowski distance allows you to control the order of the distance metric based on the nature of the problem.
(D) All of the above
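
The relationship between Minkowski, Manhattan, and Euclidean distance mentioned in option (C) is easy to check numerically. A minimal sketch, with the function name chosen for this example:

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

a, b = (0, 0), (3, 4)
d1 = minkowski(a, b, 1)    # same as Manhattan distance: |3| + |4|
d2 = minkowski(a, b, 2)    # same as Euclidean distance: sqrt(3**2 + 4**2)
d50 = minkowski(a, b, 50)  # large p approaches the largest coordinate difference
```

As p grows, the largest single coordinate difference dominates the sum, which is why very large p behaves like the maximum-difference (Chebyshev) distance.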

Question 7: What are the ideal use cases for KNN?
(A) KNN is best suited for small to medium-sized datasets with relatively low dimensionality. It can be useful in situations where the decision boundary is linear. It can be effective in cases where the data is clustered or has distinct groups.
(B) KNN is best suited for large datasets with relatively high dimensionality. It can be useful when the decision boundary is highly irregular or nonlinear. It can be effective in cases where the data is clustered or has distinct groups.
(C) KNN is best suited for small to medium-sized datasets with relatively low dimensionality. It can be useful when the decision boundary is highly irregular or nonlinear. It can be effective in cases where the data is clustered or has distinct groups.
(D) KNN is best suited for small to medium-sized datasets with relatively low dimensionality. It can be useful when the decision boundary is highly irregular or nonlinear. It can be effective in cases where the data is not clustered or doesn’t have distinct groups.

Question 8: How does the KNN algorithm work? (Select two)
(A) KNN works by calculating the distance between a data point and all other points in the dataset. Then, KNN selects the k-nearest neighbors. For regression, the most common class among the ‘k’ neighbors is assigned as the predicted class for the new data point.
(B) KNN works by calculating the distance between a data point and all other points in the dataset. Then, KNN selects the k-nearest neighbors. For classification, it averages the values of the most common class among the ‘k’ neighbors of the target data point.
(C) KNN works by calculating the distance between a data point and all other points in the dataset. Then, KNN selects the k-nearest neighbors. For classification, the most common class among the ‘k’ neighbors is assigned as the predicted class for the new data point.
(D) KNN works by calculating the distance between a data point and all other points in the dataset. Then, KNN selects the k-nearest neighbors. For regression tasks, instead of a majority vote, the algorithm takes the average of the ‘k’ nearest neighbors’ values as the prediction.

Question 9: What’s the bias and variance trade-off for KNN? (Select two)
(A) A small ‘k’ results in a low bias but high variance (the model is sensitive to noise).
(B) A large ‘k’ results in a low bias but high variance (the model is sensitive to noise).
(C) A large ‘k’ leads to high bias but low variance (smoothing over the data).
(D) A small ‘k’ leads to high bias but low variance (smoothing over the data).

Question 10: Which options are correct about instance-based learning, model-based learning, and online learning? (Select two)
(A) KNN is an instance-based learning algorithm, meaning it memorizes the entire training dataset and makes predictions based on similarity to stored instances. Because it memorizes the entire training dataset, KNN is not naturally suited for online learning: when new data is added, the entire model needs to be recalculated.
(B) Model-based learning involves learning a mapping from inputs to outputs and generalizing to new, unseen data. Examples include SVM and Decision Trees.
(C) KNN is a model-based learning algorithm, meaning it memorizes the entire training dataset and makes predictions based on similarity to stored instances. Because it memorizes the entire training dataset, KNN is not naturally suited for online learning: when new data is added, the entire model needs to be recalculated.
(D) Instance-based learning involves learning a mapping from inputs to outputs and generalizing to new, unseen data. Examples include SVM and Decision Trees.
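
The two learning styles contrasted in this question can be illustrated with a rough sketch. Both classes below are hypothetical and written only for this example: LazyKNN keeps every training point, while MeanThreshold is a deliberately simple stand-in for a model-based learner (it is not an SVM or a decision tree) and assumes 1-D inputs where class 1 lies above class 0.

```python
from collections import Counter

class LazyKNN:
    """Keeps all training data: fit() only stores it, predict() does the work."""
    def fit(self, xs, ys):
        self.xs, self.ys = list(xs), list(ys)
        return self

    def predict(self, x, k=3):
        order = sorted(range(len(self.xs)), key=lambda i: abs(self.xs[i] - x))
        return Counter(self.ys[i] for i in order[:k]).most_common(1)[0][0]

class MeanThreshold:
    """Compresses the data: fit() reduces it to one learned number."""
    def fit(self, xs, ys):
        low = [x for x, y in zip(xs, ys) if y == 0]
        high = [x for x, y in zip(xs, ys) if y == 1]
        # Threshold halfway between the two class means.
        self.threshold = (sum(low) / len(low) + sum(high) / len(high)) / 2
        return self

    def predict(self, x):
        return int(x > self.threshold)

xs = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
ys = [0, 0, 0, 1, 1, 1]

knn = LazyKNN().fit(xs, ys)
model = MeanThreshold().fit(xs, ys)
```

After fitting, knn still holds all six training points, whereas model holds only its threshold; the training data could be discarded without affecting model.predict.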

The solutions will be published in the next post, K-Means Clustering: Machine Learning Interview Prep 12.

Happy learning. If you like the questions and enjoy taking the test, please subscribe to my email list for the latest ML questions, follow my Medium profile, and leave a clap for me. Feel free to discuss your thoughts on these questions in the comment section. Don’t forget to share the quiz link with your friends or LinkedIn connections. You can also connect with me on LinkedIn.

The solutions to Random Forest: Machine Learning Interview Prep 10: 1(D), 2(D), 3(D), 4(A, B), 5(D), 6(D), 7(D), 8(D), 9(C, D), 10(D).

References:
[1] StatQuest: K-nearest neighbors, Clearly Explained
[2] Interview Questions for KNN
[3] Data Scientists’ Interview Guide: k-Nearest Neighbor
[4] Data Science Interview Questions on related to the K-Nearest Neighbors (KNN).
[5] sklearn KNN Classifier
[6] sklearn KNN Regressor

Written by Shahidullah Kawsar

Senior Data Scientist, IDARE, Houston, TX