Feature Extraction

Definition

The process of converting raw audio waveforms into numerical representations suitable for machine learning models.

Feature extraction transforms raw audio into compact, informative representations that highlight speech-relevant characteristics while discarding irrelevant variation. The most common features in modern automatic speech recognition (ASR) are mel-frequency cepstral coefficients (MFCCs) and log-mel spectrograms.

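The process typically involves splitting the audio signal into short overlapping frames and applying a window function to each, applying a Fourier transform to convert each frame from the time domain to the frequency domain, mapping frequencies onto a perceptual scale (mel or Bark), and optionally applying further transformations such as a discrete cosine transform (which turns log-mel energies into MFCCs) or delta computation (frame-to-frame differences that capture how the features change over time).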
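
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names are hypothetical, and the specific choices (16 kHz audio, 25 ms frames with a 10 ms hop, a Hann window, 40 mel filters, 13 cepstral coefficients, and the HTK-style mel formula) are assumptions made for the example rather than values taken from the text above.

```python
import numpy as np

def hz_to_mel(hz):
    # HTK-style mel formula (an assumed convention; other variants exist)
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def frame_signal(signal, frame_len, hop_len):
    # Slice the signal into overlapping frames; trailing samples that
    # do not fill a whole frame are dropped
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx]

def mel_filterbank(n_mels, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_pts = mel_to_hz(mel_pts)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    fbank = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        left, centre, right = hz_pts[m], hz_pts[m + 1], hz_pts[m + 2]
        rising = (fft_freqs - left) / (centre - left)
        falling = (right - fft_freqs) / (right - centre)
        fbank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank

def log_mel_spectrogram(signal, sample_rate, frame_len=400, hop_len=160, n_mels=40):
    # 1. Frame and window (400/160 samples = 25 ms / 10 ms at 16 kHz, assumed)
    frames = frame_signal(signal, frame_len, hop_len) * np.hanning(frame_len)
    # 2. Fourier transform of each frame -> power spectrum
    power = np.abs(np.fft.rfft(frames, n=frame_len, axis=1)) ** 2
    # 3. Map linear frequency bins onto the mel scale
    mel_energies = power @ mel_filterbank(n_mels, frame_len, sample_rate).T
    # 4. Log compression; the small constant avoids log(0)
    return np.log(mel_energies + 1e-10)

def mfcc(log_mel, n_coeffs=13):
    # Unnormalised DCT-II along the mel axis, keeping the first n_coeffs
    n_mels = log_mel.shape[1]
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_mels)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return log_mel @ basis.T

def deltas(features):
    # Simple frame-to-frame differences along the time axis
    return np.gradient(features, axis=0)

# Usage: one second of a synthetic 440 Hz tone sampled at 16 kHz
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

log_mel = log_mel_spectrogram(tone, sample_rate)
coeffs = mfcc(log_mel)
coeff_deltas = deltas(coeffs)
print(log_mel.shape, coeffs.shape, coeff_deltas.shape)
```

Each row of `log_mel` is one frame and each column one mel band; `mfcc` decorrelates those bands with a DCT, and `deltas` appends nothing by itself but returns an array of the same shape that is commonly stacked alongside the static coefficients. In practice an audio library would usually handle these steps, with its own defaults for window type, filter normalisation, and DCT scaling.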
