Word Error Rate (WER)
Definition
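Word Error Rate is computed as (Substitutions + Insertions + Deletions) / Total Reference Words, where the error counts come from the minimum-edit-distance alignment between the system output and the reference transcript. A WER of 0% means perfect transcription; values above 100% are possible when the system inserts many extra words, because the denominator counts only the words in the reference. Human transcription error rates on clean speech are typically 4-5%, so a WER in this range indicates near-human performance.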
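The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes whitespace tokenization and exact, case-sensitive word matching, whereas real evaluations usually normalize case and punctuation first. The function name word_error_rate is chosen here for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction (0.25 means 25%).

    Assumes whitespace tokenization and exact word matching.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the word-level edit distance between the reference
    # words processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr

    # Minimum S + I + D divided by the number of reference words.
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the"): 2 / 6
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))

# Four insertions against a one-word reference: 4 / 1 = 400%
print(word_error_rate("hello", "oh hello there my friend"))
```

The second call shows how WER exceeds 100%: the hypothesis contains four extra words, but the denominator counts only the single reference word.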
WER has known limitations: it treats all word errors equally, so misrecognizing 'the' is penalized the same as misrecognizing a critical keyword. Despite this, WER remains the de facto standard metric for comparing ASR systems on benchmarks such as LibriSpeech, Switchboard, and Common Voice.
Frequently Asked Questions
What is a good word error rate?
A WER below 5% is generally considered near-human accuracy on clean speech. On noisy or conversational speech, WERs of 10-15% are typical even for state-of-the-art systems.