Audio Preprocessing
Definition
The set of signal processing steps applied to raw audio before it is fed into a speech recognition model.
Audio preprocessing prepares raw audio for consumption by machine learning models. Common steps include resampling to a standard rate (typically 16 kHz for speech), downmixing stereo to mono, normalizing amplitude levels, removing DC offset, and applying a pre-emphasis filter to boost high frequencies.
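The common steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the pipeline of any particular recognizer: samples are floats in the range -1.0 to 1.0, all function names are hypothetical, the pre-emphasis coefficient of 0.97 is a conventional default rather than a value taken from this entry, and the resampler uses naive linear interpolation with no anti-aliasing filter, which a production system would replace with a proper filtered resampler.

```python
def to_mono(frames):
    """Average the channels of each frame: [(left, right), ...] -> [sample, ...]."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples, src_rate, dst_rate):
    """Naive linear-interpolation resampler (assumption: no anti-aliasing filter)."""
    if not samples:
        return []
    n_out = int(len(samples) * dst_rate / src_rate)
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # fractional position in the source signal
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def remove_dc_offset(samples):
    """Subtract the mean so the signal is centered on zero."""
    if not samples:
        return []
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def normalize_peak(samples, target_peak=1.0):
    """Scale the signal so its largest absolute sample equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return list(samples)               # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def pre_emphasis(samples, coeff=0.97):
    """First-order high-frequency boost: y[n] = x[n] - coeff * x[n-1]."""
    if not samples:
        return []
    return [samples[0]] + [
        samples[i] - coeff * samples[i - 1] for i in range(1, len(samples))
    ]


def preprocess(frames, src_rate, dst_rate=16000):
    """Chain the steps: downmix, resample, remove DC, normalize, pre-emphasize."""
    mono = to_mono(frames)
    mono = resample_linear(mono, src_rate, dst_rate)
    mono = remove_dc_offset(mono)
    mono = normalize_peak(mono)
    return pre_emphasis(mono)
```

For example, `preprocess(stereo_frames, src_rate=48000)` would return a mono signal at 16 kHz. Downmixing before resampling is a choice made in this sketch so that only one channel needs to be interpolated; other orderings are possible.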
More advanced preprocessing includes noise reduction, echo cancellation, and automatic gain control. The quality of preprocessing directly affects ASR accuracy: clean, well-conditioned audio generally yields more accurate transcriptions than noisy or poorly leveled audio. Ummless relies on the built-in audio preprocessing in Apple's Speech framework to condition audio before recognition.
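Of the advanced steps mentioned above, automatic gain control is the simplest to illustrate. The sketch below is a hypothetical frame-based version, assuming float samples and a 160-sample frame (10 ms at 16 kHz). The function names, target level, gain cap, and smoothing factor are illustrative assumptions, and it is not meant to describe what Apple's Speech framework does internally.

```python
import math


def frame_rms(frame):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def simple_agc(samples, frame_size=160, target_rms=0.1, max_gain=10.0, smoothing=0.9):
    """Scale each frame toward target_rms, smoothing the gain between frames."""
    out = []
    gain = 1.0
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        rms = frame_rms(frame)
        if rms > 0:
            # Cap the gain so near-silent frames are not amplified without limit.
            desired = min(target_rms / rms, max_gain)
            # Exponential smoothing avoids abrupt jumps in loudness.
            gain = smoothing * gain + (1 - smoothing) * desired
        # Silent frames keep the previous gain.
        out.extend(s * gain for s in frame)
    return out
```

A higher `smoothing` value makes the gain react more slowly, which reduces audible pumping at the cost of slower adaptation to level changes.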