Which metric is most appropriate for evaluating a classification model on a highly imbalanced dataset?
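
To make the question concrete, here is a minimal sketch of why accuracy alone can mislead on imbalanced data, and why precision, recall, and F1 are often considered instead. The 990-to-10 class split, the two toy classifiers, and the helper names `confusion_counts` and `metrics` are all assumptions invented for illustration, not taken from any particular dataset or library.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the rare positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1; undefined ratios fall back to 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical imbalanced labels: 990 negatives followed by 10 positives.
y_true = [0] * 990 + [1] * 10

# Toy classifier A: always predicts the majority (negative) class.
always_negative = [0] * 1000

# Toy classifier B: raises 20 false alarms but finds 8 of the 10 positives.
finds_positives = [1] * 20 + [0] * 970 + [1] * 8 + [0] * 2

baseline = metrics(y_true, always_negative)
detector = metrics(y_true, finds_positives)

for name, result in (("always_negative", baseline), ("finds_positives", detector)):
    print(name, {k: round(v, 3) for k, v in result.items()})
```

In this toy setup the always-negative classifier has the higher accuracy (0.99 versus 0.978) even though it never detects a positive, while its recall and F1 are both zero. Which metric is most appropriate still depends on the relative cost of false positives and false negatives, so treat this as an illustration rather than a definitive answer.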