Whisper Model

Definition

An open-source, multitask speech recognition model developed by OpenAI.

Whisper is a transformer-based encoder-decoder model trained on 680,000 hours of weakly supervised, multilingual audio paired with transcripts collected from the web. Released by OpenAI in 2022, it can perform multilingual speech recognition, speech translation (into English), spoken language identification, and voice activity detection.
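To make the encoder-decoder description concrete, the sketch below shows how audio is framed before it reaches the encoder. It assumes the front-end constants used by OpenAI's reference implementation: 16 kHz mono audio, fixed 30-second windows, a 10 ms spectrogram hop, and a stride-2 convolution ahead of the transformer encoder. The function names are hypothetical, and the log-Mel spectrogram computation itself is omitted.

```python
# Front-end constants assumed from OpenAI's reference implementation.
SAMPLE_RATE = 16_000                      # Whisper expects 16 kHz mono audio
CHUNK_SECONDS = 30                        # fixed-length input window
HOP_LENGTH = 160                          # 10 ms between spectrogram frames
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS   # 480,000 samples per window
N_FRAMES = N_SAMPLES // HOP_LENGTH        # 3,000 log-Mel frames per window


def pad_or_trim(samples: list, length: int = N_SAMPLES) -> list:
    """Hypothetical helper: zero-pad or truncate a waveform to one 30-second window."""
    if len(samples) >= length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))


def encoder_positions(n_frames: int = N_FRAMES, conv_stride: int = 2) -> int:
    """Sequence length after the strided convolution stem that precedes the encoder."""
    return n_frames // conv_stride


clip = [0.0] * (12 * SAMPLE_RATE)         # 12 s of silence as stand-in audio
window = pad_or_trim(clip)
print(len(window), N_FRAMES, encoder_positions())  # 480000 3000 1500
```

A 12-second clip is therefore zero-padded to 480,000 samples, which yields 3,000 spectrogram frames and 1,500 encoder positions. Longer recordings are processed as a sequence of 30-second windows.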

Whisper's key innovation is its multitask training format: a single model learns all of these tasks jointly, and special tokens at the start of the decoder's output sequence specify the language, the task (transcription or translation), and whether to predict timestamps. It is available in several sizes, from tiny (about 39M parameters) to large (about 1.55B parameters), that trade accuracy for speed. Whisper has become a foundational model in the open-source speech recognition ecosystem, and community projects such as whisper.cpp enable efficient on-device inference.
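The multitask format can be sketched as a function that assembles the decoder's special-token prefix. The token spellings (`<|startoftranscript|>`, language tags such as `<|en|>`, `<|transcribe|>`, `<|translate|>`, `<|notimestamps|>`) follow OpenAI's reference tokenizer; `build_prompt` is a hypothetical helper for illustration and does not map tokens to integer IDs.

```python
from typing import List, Optional

SOT = "<|startoftranscript|>"
NO_TIMESTAMPS = "<|notimestamps|>"
TASKS = ("transcribe", "translate")   # "translate" means X -> English


def build_prompt(language: Optional[str], task: str = "transcribe",
                 timestamps: bool = False) -> List[str]:
    """Hypothetical helper: return the special-token prefix for one decoding pass."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    tokens = [SOT]
    if language is None:
        # Language identification: the model's next predicted token is the language tag.
        return tokens
    tokens.append(f"<|{language}|>")      # e.g. <|en|>, <|de|>, <|ja|>
    tokens.append(f"<|{task}|>")
    if not timestamps:
        tokens.append(NO_TIMESTAMPS)      # otherwise the model emits timestamp tokens
    return tokens


print(build_prompt("de", task="translate"))
# ['<|startoftranscript|>', '<|de|>', '<|translate|>', '<|notimestamps|>']
```

Given only `<|startoftranscript|>`, the token the model predicts next is the language tag, which is how language identification falls out of the same format as transcription and translation.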

Frequently Asked Questions

What is the Whisper model?

Whisper is an open-source speech recognition model by OpenAI trained on 680,000 hours of multilingual audio data, capable of transcription, translation, and language identification.

Can Whisper run on-device?

Yes, optimized versions of Whisper can run locally on consumer hardware using frameworks like whisper.cpp, enabling private, offline speech recognition.
