What is data skew in Apache Spark and why is it a problem?
-
A
Uneven distribution of data across partitions, causing some tasks to take much longer than others
-
B
An imbalanced ratio of cores to memory across executors
-
C
Incorrect sorting of data that causes inaccurate results
-
D
A bug in the Spark scheduler that misallocates tasks
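
As background for this question, the sketch below simulates how records map to partitions under hash partitioning when a single key dominates the dataset. It is a plain-Python illustration, not Spark code: `partition_for`, `partition_sizes`, and the `customer_*` keys are hypothetical names, and `zlib.crc32` stands in for a partitioner's hash function (Spark's own hashing differs).

```python
import zlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    # Stable stand-in for a hash partitioner; an assumption for illustration only.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions):
    # Count how many records land on each partition index.
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Hypothetical dataset: one "hot" key accounts for 9,000 of 10,000 records.
keys = ["customer_0"] * 9000 + [f"customer_{i}" for i in range(1, 1001)]

sizes = partition_sizes(keys, num_partitions=8)
largest = max(sizes)
mean = sum(sizes) / len(sizes)

print(sizes)
print(f"largest partition is {largest / mean:.1f}x the mean")
```

Because every record with the same key hashes to the same partition, the task that processes the hot key's partition handles far more records than the others, and a stage finishes only when its slowest task does.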