Which activation function is most commonly used in hidden layers of modern deep neural networks to mitigate the vanishing gradient problem?