A financial services company needs to process massive volumes of historical transaction data for end-of-day reporting.
The processing involves complex aggregations and joins.
The key requirements are cost-effectiveness at petabyte scale and reliability despite hardware failures.
Which technology is most suitable for this large-scale batch processing task?