Acoustic Model

Definition

A model that maps audio features to phonemes or text tokens.

An acoustic model is the component of a speech recognition system responsible for learning the relationship between audio signal features (such as mel spectrograms) and linguistic units like phonemes, characters, or subword tokens.

Traditionally, acoustic models were built with Gaussian Mixture Models and Hidden Markov Models. Modern systems use deep neural networks — convolutional, recurrent, or transformer-based — that directly learn feature representations from raw or lightly processed audio. In end-to-end ASR systems, the acoustic model is fused with the language model into a single architecture.

Frequently Asked Questions

What does an acoustic model do?

It maps audio features like spectrograms to linguistic units such as phonemes or text tokens, forming the core of a speech recognition system.

Related Terms

Related Content