What is the vanishing gradient problem in deep neural networks?
- A. Gradients become too large during backpropagation
- B. Gradients become extremely small, making early layers learn very slowly
- C. The loss function fails to converge
- D. Weights are initialized to zero
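For intuition behind option B, here is a minimal Python sketch (not part of the original question; it assumes sigmoid activations, which the question does not specify). The sigmoid's derivative never exceeds 0.25, and backpropagation multiplies one such factor per layer via the chain rule, so even in the best case the gradient reaching the first layer shrinks exponentially with depth.

```python
import math

def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid; its maximum value is 0.25, at x = 0."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

# Best case for the gradient: every pre-activation sits at 0, so each
# per-layer factor is exactly 0.25. The chain rule multiplies these,
# so the signal reaching the earliest layer decays exponentially.
for depth in (1, 5, 10, 20):
    factor = sigmoid_grad(0.0) ** depth  # 0.25 ** depth
    print(f"{depth:2d} layers: gradient factor ~ {factor:.2e}")
# 20 layers gives roughly 9.1e-13, so early layers learn very slowly.
```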