MS-DS Master of Data science Supervised Learning Algorithms Questions and Answers

Question 1

A data science team is developing a model to predict customer churn. The lead data scientist is concerned about overfitting, as the initial Decision Tree model is achieving 99% accuracy on the training data but only 75% on the test data. They also want a model that is robust to noise and provides feature importance rankings. Which of the following algorithms would be the most appropriate next choice to address these specific concerns?

Accepted Answer

Random Forest

Answer

Random Forest is an ensemble learning method that builds multiple decision trees and merges their predictions. This approach directly addresses the overfitting issue seen with a single Decision Tree by averaging the results, which reduces variance. It is also inherently robust to noise and provides feature importance scores based on how much each feature contributes to reducing impurity across all the trees in the forest. [19, 10]

Question 2

You are tasked with building a classification model for a dataset that is not linearly separable in its original feature space. You want to use a powerful classification algorithm that can find a non-linear decision boundary. Which algorithm and concept combination is specifically designed for this purpose?

Accepted Answer

Support Vector Machine (SVM) with the Kernel Trick

Answer

The Kernel Trick is a core concept in Support Vector Machines (SVMs) that allows the algorithm to handle non-linearly separable data. It works by implicitly mapping the data into a higher-dimensional space where a linear separator can be found. This avoids the computationally expensive process of explicitly transforming the data, enabling SVMs to create complex, non-linear decision boundaries efficiently. [24, 27, 29]

Question 3

A team is building a predictive model and wants to use an ensemble method. They decide on an algorithm that builds trees sequentially, where each new tree is trained to correct the errors of the previous ones. This method is known for its high predictive accuracy but can be prone to overfitting if not carefully tuned. Which algorithm are they using?

Accepted Answer

Gradient Boosting

Answer

Gradient Boosting is an ensemble technique that builds models sequentially. Each new model (typically a decision tree) is fit on the residual errors of the previous model. [14] This sequential, error-correcting process allows the model to achieve very high accuracy. However, because it focuses so intently on the errors, it can overfit the training data if the number of trees is too high or other hyperparameters are not properly regularized. [6, 10]

Question 4

In the context of regularized linear models, which of the following statements best describes the primary effect of L1 regularization (Lasso)?

Accepted Answer

It can shrink the coefficients of less important features to exactly zero, performing automatic feature selection.

Answer

L1 regularization, also known as Lasso regression, adds a penalty term to the loss function equal to the absolute value of the coefficients. A key property of this penalty is that it can force the coefficients of features that are not very predictive to become exactly zero. [9, 17] This makes L1 regularization a powerful tool for automatic feature selection and creating sparse, more interpretable models. [23, 25]

Question 5

A data scientist is choosing between Random Forest and Gradient Boosting for a classification task on a large, noisy dataset. The project has a tight deadline, requiring a model that is relatively fast to train and less sensitive to hyperparameter tuning. Which algorithm is the better choice and why?

Accepted Answer

Random Forest, because its trees are built in parallel and it is generally less sensitive to hyperparameter changes.

Answer

Random Forest is the more suitable choice in this scenario. Its core design involves building many independent decision trees, a process that can be easily parallelized, leading to faster training times on multi-core systems. [10] It is also known to be more robust and less sensitive to hyperparameter tuning compared to Gradient Boosting, which often requires careful tuning to prevent overfitting. [6] While Gradient Boosting can sometimes achieve higher accuracy, the trade-off in training time and tuning complexity makes Random Forest better for this specific situation. [10]

MS-DS Master of Data science Practice Test

MS-DS Master of Data science Practice Test

MS-DS Master of Data science Supervised Learning Algorithms Questions and Answers