Encoder

Definition

The component of an automatic speech recognition (ASR) model that processes audio input and produces hidden representations.

The encoder is the front half of an encoder-decoder speech recognition model. It takes audio features (typically a mel spectrogram) and passes them through a stack of neural network layers. The result is a sequence of high-dimensional hidden representations that capture both acoustic and linguistic information.
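As a rough illustration of that data flow, the pure-Python sketch below maps a T × n_mels feature matrix to one hidden vector per (downsampled) frame. Everything in it is an assumption made for illustration: the function names (`toy_encoder`, `stack_frames`, `init_linear`, `linear`), the dimensions, frame stacking as the downsampling step, and the random untrained weights are all hypothetical. The residual feed-forward layers stand in for full Transformer blocks. Because they act on each frame independently, they do not mix information across frames; real encoders add self-attention or convolution for that.

```python
import math
import random


def init_linear(in_dim, out_dim, rng):
    # Small random weights; a trained model would learn these from data.
    scale = 1.0 / math.sqrt(in_dim)
    w = [[rng.uniform(-scale, scale) for _ in range(out_dim)] for _ in range(in_dim)]
    b = [0.0] * out_dim
    return w, b


def linear(x, w, b):
    # y = x @ w + b for a single frame vector x.
    return [sum(x[i] * w[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def stack_frames(features, stride):
    # Concatenate `stride` consecutive frames and hop by `stride`, reducing
    # the frame rate; leftover frames that do not fill a group are dropped.
    out = []
    for t in range(0, len(features) - stride + 1, stride):
        stacked = []
        for k in range(stride):
            stacked.extend(features[t + k])
        out.append(stacked)
    return out


def toy_encoder(features, d_model=16, num_layers=3, stride=2, seed=0):
    # features: list of T frames, each a list of n_mels floats.
    rng = random.Random(seed)
    n_mels = len(features[0])
    x = stack_frames(features, stride)              # (T // stride) x (n_mels * stride)
    w_in, b_in = init_linear(n_mels * stride, d_model, rng)
    h = [linear(frame, w_in, b_in) for frame in x]  # project each frame to d_model
    for _ in range(num_layers):
        w, b = init_linear(d_model, d_model, rng)
        # Residual position-wise feed-forward layer with ReLU
        # (a stand-in for a full Transformer block).
        h = [[hi + max(0.0, yi) for hi, yi in zip(frame, linear(frame, w, b))] for frame in h]
    return h  # one d_model-dimensional hidden vector per downsampled frame


# Usage: 100 frames of 80-bin random "mel" features.
data_rng = random.Random(42)
mel = [[data_rng.gauss(0.0, 1.0) for _ in range(80)] for _ in range(100)]
hidden = toy_encoder(mel)
print(len(hidden), len(hidden[0]))  # 50 16
```

Real encoders use much larger dimensions and learned weights. The sketch only shows the shape of the computation: a sequence of feature frames goes in, and a sequence of hidden vectors comes out.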

Most modern encoders are built from Transformer layers. In a non-streaming encoder, self-attention lets every frame attend to every other frame in the utterance, which captures long-range dependencies in the audio signal. The encoder output is then consumed by a decoder (typically through cross-attention) or by a CTC head to produce the final text.
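Both mechanisms in this paragraph can be sketched in a few lines of Python. This is a simplified illustration, not any particular model's implementation. `self_attention` is single-head scaled dot-product attention with the queries, keys, and values all set to the input. A real layer adds learned projections, multiple heads, and a feed-forward sublayer. `ctc_greedy_decode` uses greedy best-path decoding with an assumed blank index of 0 and a hypothetical four-symbol vocabulary. The logits are made up for the example rather than produced by a trained CTC head.

```python
import math


def softmax(xs):
    # Numerically stable softmax.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(h):
    # Single-head scaled dot-product self-attention with Q = K = V = h.
    # Every query frame scores every key frame in the sequence.
    d = len(h[0])
    outputs, weights = [], []
    for q in h:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in h]
        w = softmax(scores)
        weights.append(w)
        # Output frame = attention-weighted mix of all frames.
        outputs.append([sum(w[t] * h[t][j] for t in range(len(h))) for j in range(d)])
    return outputs, weights


def ctc_greedy_decode(frame_logits, vocab, blank=0):
    # Best path: argmax label per frame, merge consecutive repeats, drop blanks.
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_logits]
    tokens, prev = [], None
    for idx in best:
        if idx != prev and idx != blank:
            tokens.append(vocab[idx])
        prev = idx
    return "".join(tokens)


# Usage: attention over three 2-dimensional frames.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, weights = self_attention(frames)
print(len(weights), len(weights[0]))  # 3 3

# Usage: greedy CTC decoding of made-up per-frame logits.
vocab = ["<blank>", "c", "a", "t"]
best_path = [1, 1, 0, 2, 2, 0, 3]
logits = [[5.0 if i == k else 0.0 for i in range(len(vocab))] for k in best_path]
print(ctc_greedy_decode(logits, vocab))  # cat
```

Each row of `weights` has one entry per frame. In practice, that is what "every frame attends to every other frame" means. The blank symbol is what lets CTC emit repeated characters. With this vocabulary, the best path [2, 0, 2] decodes to "aa", whereas [2, 2] collapses to a single "a".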
