A data scientist is preparing a dataset for a K-Nearest Neighbors (KNN) model. The dataset contains an 'age' feature (range 20-70) and an 'income' feature (range 30,000-250,000). Since KNN is a distance-based algorithm, what is the most appropriate feature scaling technique to apply and why?
-
A
One-Hot Encoding, because it converts features into a binary format that is easier to process.
-
B
Min-Max Scaling, because it preserves the original distribution of the data without distortion.
-
C
Standardization (Z-score scaling), because it is less sensitive to outliers than Min-Max scaling and handles features on vastly different scales effectively.
-
D
Log Transformation, because it reduces the right-skewness often found in income data.