A data scientist is working on a binary classification problem to predict customer churn. The dataset is highly imbalanced. After training a Random Forest and a Gradient Boosting model, they find that the Gradient Boosting model has higher accuracy. Which of the following is the most likely reason for this outcome?
- A. Random Forest builds trees in parallel, which is less effective on imbalanced data than the sequential approach of Gradient Boosting.
- B. Gradient Boosting builds trees sequentially, with each tree focusing on correcting the errors of the previous one, which can be particularly effective for the minority class.
- C. Random Forest is inherently a regression algorithm and is less suited for classification tasks.
- D. Gradient Boosting is less prone to overfitting than Random Forest, especially on noisy datasets.
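The scenario above can be reproduced in code. The sketch below is a minimal, hypothetical setup using scikit-learn with a synthetic imbalanced dataset standing in for the churn data (the class ratio, sample size, and model hyperparameters are illustrative assumptions, not taken from the question):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced binary dataset (~5% positive class) as a
# stand-in for customer-churn data; the 95/5 split is an assumption.
X, y = make_classification(
    n_samples=5000, n_features=20, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # On imbalanced data, raw accuracy can look high even for a model
    # that ignores the minority class, so minority-class F1 is printed
    # alongside it as a companion metric.
    print(f"{name}: accuracy={accuracy_score(y_test, pred):.3f}, "
          f"minority-class F1={f1_score(y_test, pred):.3f}")
```

Comparing minority-class F1 as well as accuracy helps confirm whether boosting's sequential error-correction (option B) is genuinely helping on the rare class, rather than the accuracy gap being an artifact of the imbalance.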