Attention Mechanism

Definition

A neural network component that allows models to focus on relevant parts of the input when producing each part of the output.

Attention mechanisms enable neural networks to dynamically weight different parts of the input sequence when generating each output element. In speech recognition, attention allows the decoder to focus on the relevant portion of the audio when producing each text token, rather than relying on a single fixed-length vector to represent the entire input.
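A minimal sketch of this weighting step (shapes and the dot-product scoring rule here are illustrative assumptions, not specifics from this entry): a decoder state scores each encoder state, the scores are normalized with softmax, and the result is a weighted summary of the input.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 8))   # hypothetical: 5 input positions, dim 8
decoder_query  = rng.normal(size=(8,))     # current decoder state

# Score each input position by dot product with the query,
# normalize to a distribution, and form a weighted sum (the context vector).
scores  = encoder_states @ decoder_query   # (5,) relevance of each position
weights = softmax(scores)                  # (5,) non-negative, sums to 1
context = weights @ encoder_states         # (8,) focused summary of the input
```

Because the weights are recomputed at every output step, the model can focus on a different portion of the input each time, instead of compressing everything into one fixed vector.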

Self-attention, the variant used in transformers, lets every position in a sequence attend to every other position. Attention scores are computed as scaled dot products between query and key projections of the input, then used to weight the corresponding value projections. Multi-head attention runs several such attention computations in parallel, allowing the model to capture different types of relationships simultaneously.
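The self-attention and multi-head computations can be sketched as follows (dimensions, head count, and random weights are illustrative assumptions; a real transformer layer also includes output projections, masking, and normalization):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over one sequence X of shape (T, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # project input to queries, keys, values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # (T, T): every position scores every other
    return softmax(scores) @ V                 # weighted mix of values per position

rng = np.random.default_rng(0)
T, d_model, n_heads = 4, 16, 2                 # hypothetical sizes
d_k = d_model // n_heads
X = rng.normal(size=(T, d_model))

# Multi-head: independent projections per head, run in parallel, concatenate results.
heads = []
for _ in range(n_heads):
    Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
    heads.append(self_attention(X, Wq, Wk, Wv))
out = np.concatenate(heads, axis=-1)           # (T, d_model)
```

Each head uses its own projections, so different heads can specialize in different relationships (e.g. local versus long-range dependencies) over the same sequence.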
