What is the 'attention mechanism' in deep learning primarily used for?
- A. Reducing the number of layers in a network
- B. Allowing the model to focus on relevant parts of the input when producing each output
- C. Replacing dropout for regularization
- D. Scheduling the learning rate during training
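
For reference, option B describes what attention does in practice: for each output, the model computes a weighted focus over the input positions. Below is a minimal sketch of scaled dot-product attention (the variant used in Transformers); the NumPy implementation, array shapes, and variable names are illustrative assumptions, not part of the question.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """For each query, weight the keys by relevance (softmax of scaled
    dot products) and return the corresponding weighted sum of values."""
    d_k = Q.shape[-1]
    # Similarity of each query to each key, scaled to stabilize the softmax.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over keys: each row sums to 1 and acts as a "focus" distribution.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output: for each query, a mix of the values weighted by relevance.
    return weights @ V, weights

# Toy self-attention example: 3 input positions, 4-dimensional vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
output, attn = scaled_dot_product_attention(X, X, X)
print(attn)  # each row shows how much one position attends to the others
```

The attention weights make the "focus" explicit: high entries in a row mark the input positions the model relies on most when producing that output.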