Decoder

Definition

The component of an ASR model that generates text output from encoded audio representations.

In encoder-decoder ASR architectures, the decoder takes the hidden representations produced by the encoder and generates the output text sequence one token at a time. At each step, the decoder attends to the encoder output and to previously generated tokens to predict the next token.
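The token-by-token loop described above can be sketched as greedy decoding, where the highest-scoring token is taken at every step. This is a minimal illustration, not any particular model's implementation: `greedy_decode`, `next_token_scores`, and `toy_scores` are hypothetical names, and the toy scoring function merely stands in for a real decoder that would attend to the encoder output and the token history.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    encoder_output: Sequence[float],
    next_token_scores: Callable[[Sequence[float], List[int]], List[float]],
    bos_id: int,
    eos_id: int,
    max_len: int = 50,
) -> List[int]:
    """Generate token ids one at a time, always picking the top-scoring token."""
    tokens = [bos_id]  # generation starts from a beginning-of-sequence token
    for _ in range(max_len):
        # The decoder sees the encoder output plus everything generated so far.
        scores = next_token_scores(encoder_output, tokens)
        next_id = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(next_id)
        if next_id == eos_id:  # stop once end-of-sequence is produced
            break
    return tokens[1:]  # drop the BOS token


def toy_scores(encoder_output: Sequence[float], tokens: List[int]) -> List[float]:
    """Hypothetical stand-in for a decoder: emits ids 2, 3, 4, then EOS (id 1)."""
    vocab_size = 5
    step = len(tokens) - 1  # number of tokens generated so far
    target = 2 + step if step < 3 else 1
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


print(greedy_decode([0.1, 0.2], toy_scores, bos_id=0, eos_id=1))  # [2, 3, 4, 1]
```

Beam search follows the same loop but keeps several candidate sequences alive at each step instead of one.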

Transformer-based decoders use masked (causal) self-attention, so each position can attend only to earlier positions and the model cannot see future tokens during training. The decoder's vocabulary, tokenization strategy, and generation parameters (such as beam width and temperature) all affect the quality and style of the final transcript.
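Two of the ideas above can be made concrete with a small sketch: the mask that hides future tokens, and the effect of temperature on the next-token distribution. The function names `causal_mask` and `softmax_with_temperature` are illustrative assumptions, and the code uses plain Python lists rather than any specific deep learning library.

```python
import math
from typing import List


def causal_mask(n: int) -> List[List[bool]]:
    """Row i marks which positions token i may attend to (itself and earlier)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def softmax_with_temperature(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


for row in causal_mask(3):
    print(row)
# [True, False, False]
# [True, True, False]
# [True, True, True]
```

With logits such as `[1.0, 2.0, 3.0]`, a temperature below 1 puts more probability on the top token, while a temperature above 1 flattens the distribution, which is why temperature changes how conservative or varied the sampled transcript is.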
