Phoneme
Definition
The smallest unit of sound that distinguishes one word from another in a language.
Phonemes are the fundamental building blocks of spoken language. English has approximately 44 phonemes, though the exact count varies by dialect and by the analysis used. For example, the words 'bat' and 'pat' differ by a single phoneme (/b/ vs /p/), which makes them a minimal pair. Traditional ASR systems explicitly modeled phonemes as intermediate representations between audio and text.
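The 'bat' / 'pat' contrast can be checked mechanically once words are written as phoneme sequences. The sketch below is illustrative only: the lexicon is a small hand-written table using ARPAbet-style symbols, and the names LEXICON, differing_positions, and is_minimal_pair are hypothetical, not part of any library. It treats two words as a minimal pair when their phoneme sequences have the same length and differ in exactly one position.

```python
# Toy pronunciation lexicon (hand-written for illustration, ARPAbet-style symbols).
LEXICON = {
    "bat": ["B", "AE", "T"],
    "pat": ["P", "AE", "T"],
    "bad": ["B", "AE", "D"],
    "pit": ["P", "IH", "T"],
}


def differing_positions(a, b):
    """Return the indices at which two phoneme sequences differ (compared pairwise)."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def is_minimal_pair(word_a, word_b, lexicon=LEXICON):
    """True if the two words have equal-length pronunciations differing in one phoneme."""
    a, b = lexicon[word_a], lexicon[word_b]
    return len(a) == len(b) and len(differing_positions(a, b)) == 1


if __name__ == "__main__":
    print(is_minimal_pair("bat", "pat"))  # differ only in the first phoneme
    print(is_minimal_pair("bat", "pit"))  # differ in two phonemes
```

A real system would load pronunciations from a full pronouncing dictionary rather than a four-word table, and would need to handle words with several pronunciations.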
Modern end-to-end models often bypass explicit phoneme modeling, working directly with characters or subword tokens. However, understanding phonemes remains important for tasks like pronunciation modeling, text-to-speech synthesis, and analyzing ASR errors.
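One way phonemes help when analyzing ASR errors is to compare a reference pronunciation with a hypothesized one at the phoneme level. The following is a minimal sketch, assuming both sides are already available as lists of phoneme symbols; the function names edit_distance and phoneme_error_rate are chosen here for illustration. It computes a standard Levenshtein distance over phonemes and normalizes by the reference length.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion cost 1)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference symbols to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """Edit distance divided by the number of reference phonemes."""
    if not ref:
        raise ValueError("reference phoneme sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    reference = ["B", "AE", "T"]   # 'bat'
    hypothesis = ["P", "AE", "T"]  # recognized as 'pat'
    print(phoneme_error_rate(reference, hypothesis))
```

Here a word-level comparison would only report that 'bat' was misrecognized, while the phoneme-level comparison shows that a single /b/ vs /p/ substitution accounts for the error.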