Voice Activity Detection (VAD)
Definition
The process of detecting the presence or absence of human speech in an audio signal.
Voice activity detection determines which portions of an audio signal contain speech and which contain silence, music, or background noise. VAD is a critical preprocessing step in speech recognition pipelines: it keeps the ASR model from wasting computation on non-speech segments and reduces hallucinated transcriptions caused by background noise.
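As a rough illustration of the preprocessing role described above, the sketch below marks frames as speech when their RMS energy exceeds a fixed threshold and merges consecutive speech frames into time segments that could be handed to an ASR model. This is a deliberately simple energy-based detector, not a neural one, and it cannot tell speech from loud music or noise. The function names, the 30 ms frame length, and the 0.02 threshold are assumptions chosen for the example.

```python
import math


def frame_rms(samples, frame_len):
    """Return the RMS energy of each non-overlapping frame of `frame_len` samples."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies


def speech_segments(samples, sample_rate=16000, frame_ms=30, threshold=0.02):
    """Return (start_sec, end_sec) spans whose frame energy exceeds `threshold`.

    `samples` is a sequence of floats in [-1.0, 1.0]. The threshold is an
    illustrative value, not a recommended setting.
    """
    frame_len = sample_rate * frame_ms // 1000
    flags = [energy > threshold for energy in frame_rms(samples, frame_len)]

    segments = []
    start = None  # index of the first frame in the current speech run
    for i, is_speech in enumerate(flags):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((start * frame_len / sample_rate,
                             i * frame_len / sample_rate))
            start = None
    if start is not None:  # audio ended while still inside a speech run
        segments.append((start * frame_len / sample_rate,
                         len(flags) * frame_len / sample_rate))
    return segments
```

In a pipeline, only the audio inside the returned spans would be sent to the recognizer; everything else is skipped.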
Many modern VAD systems use small neural networks that run in real time at minimal computational cost. Silero VAD (a compact neural model), WebRTC VAD (a lightweight Gaussian-mixture-model classifier rather than a neural network), and the VAD built into Apple's Speech framework are popular choices. Accurate VAD is especially important for always-on voice interfaces and real-time transcription systems.
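Real-time systems usually smooth raw per-frame decisions so that a brief pause does not cut an utterance short and a single noisy frame does not trigger the recognizer. The sketch below shows one common way to do this, with an onset requirement and a "hangover" period. It assumes some detector already supplies a speech probability per frame; the class name, parameters, and default values are hypothetical and do not correspond to the API of any of the libraries named above.

```python
class HangoverSmoother:
    """Turn per-frame speech probabilities into a stable speech/non-speech state.

    Speech starts only after `min_speech_frames` consecutive speech frames,
    and ends only after more than `hangover_frames` consecutive non-speech
    frames. All defaults are illustrative assumptions.
    """

    def __init__(self, on_threshold=0.5, min_speech_frames=2, hangover_frames=3):
        self.on_threshold = on_threshold
        self.min_speech_frames = min_speech_frames
        self.hangover_frames = hangover_frames
        self.active = False
        self._speech_run = 0   # consecutive frames at or above the threshold
        self._silence_run = 0  # consecutive frames below the threshold

    def update(self, speech_prob):
        """Consume one frame's speech probability and return the smoothed state."""
        if speech_prob >= self.on_threshold:
            self._speech_run += 1
            self._silence_run = 0
        else:
            self._silence_run += 1
            self._speech_run = 0

        if not self.active and self._speech_run >= self.min_speech_frames:
            self.active = True
        elif self.active and self._silence_run > self.hangover_frames:
            self.active = False
        return self.active
```

A caller would invoke `update` once per incoming frame and forward audio to the recognizer while it returns `True`. Longer hangover values tolerate longer pauses at the cost of passing more trailing silence downstream.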