What is the role of the attention mechanism in transformer-based deep learning models?
A. Regularize weights
B. Allow the model to weigh the relevance of each input token when producing an output
C. Reduce the number of parameters
D. Replace activation functions
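To make the idea of "weighing the relevance of each input token" concrete, here is a minimal sketch of scaled dot-product attention for a single query, the form of attention used in the original Transformer. It is written in pure Python for readability rather than speed. The function names and the toy key, value, and query vectors are hypothetical and were chosen only for illustration. Each token's relevance is scored as a scaled dot product, the scores are turned into non-negative weights that sum to 1 with a softmax, and the output is the weighted sum of the value vectors.

```python
import math


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(query, keys, values):
    """Return (output, weights) for one query vector.

    query: list of d floats; keys: n vectors of length d;
    values: n vectors of length d_v (one per input token).
    """
    d = len(query)
    # Relevance score of each input token: dot(query, key) / sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Attention weights: non-negative and summing to 1.
    weights = softmax(scores)
    d_v = len(values[0])
    # Output: weighted sum of the value vectors.
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]
    return output, weights


# Hypothetical toy input: three tokens with 2-dimensional keys and values.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 0.0]

output, weights = scaled_dot_product_attention(query, keys, values)
print([round(w, 3) for w in weights])
print([round(x, 3) for x in output])
```

Because the query aligns with the first and third keys, those tokens receive equal, larger weights than the second token, so the output is pulled toward their values. Real Transformers compute this for many queries at once with matrix operations and learned projections, and run several such "heads" in parallel.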