Connectionist Temporal Classification (CTC)

Definition

A loss function that allows training sequence-to-sequence models without requiring pre-aligned input-output pairs.

Connectionist Temporal Classification is a training objective designed for problems where the input and output sequences have different lengths and no explicit alignment is available. CTC introduces a blank token and marginalizes over all possible alignments between input frames and output labels.
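The marginalization described above can be made concrete with a brute-force sketch: enumerate every frame-level alignment, keep those that collapse (merge repeats, then drop blanks) to the target transcript, and sum their probabilities. The label set, blank symbol, and three-frame probability table below are invented for illustration; real implementations use dynamic programming rather than enumeration.

```python
import itertools
import math

BLANK = "-"
LABELS = [BLANK, "a", "c", "t"]  # toy vocabulary with a blank token

def collapse(path):
    """CTC collapsing rule: merge consecutive repeats, then remove blanks."""
    out, prev = [], None
    for symbol in path:
        if symbol != prev and symbol != BLANK:
            out.append(symbol)
        prev = symbol
    return "".join(out)

def ctc_loss(frame_probs, target):
    """Negative log of the summed probability of all alignments of `target`."""
    total = 0.0
    n_frames = len(frame_probs)
    # Enumerate every possible frame-level alignment (exponential; toy only).
    for path in itertools.product(range(len(LABELS)), repeat=n_frames):
        if collapse(LABELS[i] for i in path) == target:
            p = 1.0
            for t, i in enumerate(path):
                p *= frame_probs[t][i]
            total += p
    return -math.log(total)

# Made-up per-frame probabilities over LABELS for a 3-frame input:
probs = [
    [0.2, 0.6, 0.1, 0.1],  # frame 1 leans "a"
    [0.6, 0.2, 0.1, 0.1],  # frame 2 leans blank
    [0.2, 0.1, 0.1, 0.6],  # frame 3 leans "t"
]
loss = ctc_loss(probs, "at")
```

Note that many distinct alignments (`"a-t"`, `"aat"`, `"at-"`, ...) all collapse to `"at"`, which is exactly why CTC sums over them rather than committing to one.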

CTC is widely used in speech recognition models like wav2vec 2.0 and DeepSpeech. It simplifies training because you only need the transcript — not frame-level alignments. However, CTC assumes conditional independence between output tokens, which can limit accuracy compared to attention-based encoder-decoder models.
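The conditional-independence assumption is easiest to see in greedy CTC decoding: the most probable symbol is picked at each frame independently of the others, and the result is then collapsed. A minimal sketch, reusing an invented blank symbol and probability table:

```python
BLANK = "-"
LABELS = [BLANK, "a", "c", "t"]  # toy vocabulary with a blank token

def greedy_decode(frame_probs):
    """Per-frame argmax (frames treated independently), then collapse."""
    best = [LABELS[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]
    out, prev = [], None
    for symbol in best:  # merge consecutive repeats, then drop blanks
        if symbol != prev and symbol != BLANK:
            out.append(symbol)
        prev = symbol
    return "".join(out)

# Made-up per-frame probabilities over LABELS for a 4-frame input:
probs = [
    [0.1, 0.1, 0.7, 0.1],  # frame 1 -> "c"
    [0.1, 0.7, 0.1, 0.1],  # frame 2 -> "a"
    [0.6, 0.2, 0.1, 0.1],  # frame 3 -> blank
    [0.1, 0.1, 0.1, 0.7],  # frame 4 -> "t"
]
print(greedy_decode(probs))  # "cat"
```

Because each frame is decoded in isolation, no language-model-like dependency between output tokens is captured; this is the limitation that attention-based encoder-decoder models avoid.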

Frequently Asked Questions

Why is CTC important for speech recognition?

CTC eliminates the need for frame-level alignment between audio and text during training, making it much easier to train ASR models on large datasets.
