Linear Regression (Part 2): Machine Learning Interview Prep 08

Shahidullah Kawsar
5 min read · Oct 29, 2023


Linear regression is a basic yet powerful tool in machine learning used for predicting numerical outcomes. It works by fitting a straight line to a set of data points, allowing us to understand the relationship between variables. This method is often used in situations where there is a clear linear relationship between the input and output variables. By finding the best-fitting line, linear regression enables us to make predictions about future values based on the input data.
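
As a minimal sketch of "finding the best-fitting line", the closed-form least-squares solution for a single predictor can be written with the Python standard library alone. The function name `fit_line` and the toy data are hypothetical, chosen only for illustration.

```python
# Closed-form simple linear regression (one predictor), standard library only.
from statistics import mean

def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical toy data lying exactly on y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5 + intercept  # predicted y for a new input x = 5
```

Because the toy points sit exactly on a line, the fit recovers that line; with noisy data the same formulas give the line with the smallest sum of squared residuals.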

Let’s check your basic knowledge of Linear Regression. Here are 10 multiple-choice questions for you and there’s no time limit. Have fun!

Question 1: What are the assumptions of an Ordinary Least Squares (OLS) model?
(A) The variance of the residuals must be constant. If there’s a funnel-like pattern on a residual plot, then the variance is non-constant.
(B) The relationship between predictors (X) and target (Y) must be linear.
(C) The residuals of observations over time must be random, with no trends or patterns, which indicates that current observations are not correlated with previous observations.
(D) All of the above

Question 2: What are the assumptions of an Ordinary Least Squares (OLS) model?
(A) No collinearity in the predictors.
(B) No influential outlier that will shift the model away from the bulk of the data.
(C) None of the above
(D) Both (A) and (B)

Question 3: The variance of the residuals must be constant. If there’s a funnel-like pattern on a residual plot, then the variance is non-constant. This assumption of an Ordinary Least Squares (OLS) model is called ______?
(A) Homoscedasticity
(B) Linearity
(C) Heteroscedasticity
(D) All of the above

Question 4: The residuals of observations over time must be random, with no trends or patterns, which indicates that current observations are not correlated with previous observations. This assumption of an Ordinary Least Squares (OLS) model is called ______?
(A) No Auto-correlation
(B) Auto-correlation
(C) Multicollinearity
(D) No Multicollinearity
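
One common way to quantify correlation between consecutive residuals is the Durbin-Watson statistic, which is not mentioned in the quiz and is brought in here only as an illustration. The sketch below computes it by hand on two hypothetical residual sequences; values near 2 suggest no first-order correlation, values near 0 suggest positive correlation, and values near 4 suggest negative correlation.

```python
# Durbin-Watson statistic computed by hand (illustrative sketch).
def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squared residuals."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residual sequences ordered in time
drifting = [1, 1, 1, 1, -1, -1, -1, -1]      # long runs: neighbors look alike
alternating = [1, -1, 1, -1, 1, -1, 1, -1]   # sign flips at every step

dw_drifting = durbin_watson(drifting)        # 4 / 8 = 0.5
dw_alternating = durbin_watson(alternating)  # 28 / 8 = 3.5
```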

Question 5: What are the assumptions of an Ordinary Least Squares (OLS) model?
(A) Homoscedasticity, Linearity, Auto-correlation, Multicollinearity, No outliers
(B) Homoscedasticity, Linearity, No Auto-correlation, No Multicollinearity, No outliers
(C) Heteroscedasticity, Linearity, No Auto-correlation, No Multicollinearity, No outliers
(D) Homoscedasticity, No Linearity, No Auto-correlation, Multicollinearity, No outliers

Question 6: If linearity is violated, how can you determine it? (Select two)
(A) The residuals fall away from the straight diagonal line in the normal QQ plot.
(B) The residuals fall close to the straight diagonal line in the normal QQ plot.
(C) Non-normal shape in the residual distribution plot.
(D) Normal shape in the residual distribution plot.

Question 7: Which statement is correct about the linearity violation?
(A) If linearity is violated, this should be visible on a normal QQ plot or a residual distribution plot.
(B) Residuals that fall away from the straight diagonal line in the normal QQ plot and a non-normal shape in the residual plot show that linearity is violated.
(C) If linearity is violated, add a polynomial term or try different models in the GLM family (e.g. exponential) and other variants (e.g. piece-wise).
(D) All of the above

Question 8: Find the incorrect statement: How would you remove collinearity in a feature set of a thousand variables?
(A) A filter method such as using the correlation between each feature (X) and target (Y) will not remove the multicollinearity.
(B) Wrapper approach (backward/forward/stepwise selection).
(C) Regularized regression model to derive a feature set with reduced collinearity.
(D) All of the above

Question 9: How does the violation of homoscedasticity affect the fit of the linear model? (Select two)
(A) As the variance of the residuals increases, the model becomes more sensitive to such data points than the areas with less variance.
(B) The data points in the high variance region act as influential outliers.
(C) As the variance of the residuals decreases, the model becomes more sensitive to such data points than the areas with less variance.
(D) The data points in the low variance region act as influential outliers.

Question 10: When the violation of homoscedasticity affects the fit of the linear model, how can you reduce its influence?
(A) Apply a transformation on the target variable (e.g. square root or log)
(B) Use weighted least squares
(C) Both (A) and (B)
(D) None of the above

The solutions will be published in the next quiz, Regularization: Machine Learning Interview Prep 09.

Happy learning. If you like the questions and enjoy taking the test, please subscribe to my email list for the latest ML questions, follow my Medium profile, and leave a clap for me. Feel free to discuss your thoughts on these questions in the comment section. Don’t forget to share the quiz link with your friends or LinkedIn connections. If you want to connect with me on LinkedIn: my LinkedIn profile.

Solutions to Decision Tree (Part 3): Machine Learning Interview Prep 07: 1(A), 2(B), 3(A), 4(D), 5(A, B), 6(A, C), 7(C, D), 8(A), 9(D), 10(B)

References:

[1] Applied Statistics Question Bank: datainterview.com, https://courses.datainterview.com/courses/take/stats-question-bank/texts/24506879-what-are-the-assumptions-of-an-ols-model

[2] Quantile-Quantile Plots (QQ plots), Clearly Explained!!! https://www.youtube.com/watch?v=okjYjClSjOg&t=3s

[3] Normality, https://sscc.wisc.edu/sscc/pubs/RegDiag-R/normality.html

[4] Residual Plot, https://study.com/skill/learn/how-to-interpret-a-residual-plot-explanation.html#:~:text=Residual%20Plot%3A%20A%20residual%20plot,in%20modeling%20the%20given%20data.

[5] Weighted least squares, https://www.statisticshowto.com/weighted-least-squares/

Photo: Rio Grande Village Nature Trail, Big Bend National Park, TX, USA Credit: Tasnim and Kawsar
