Mel-Frequency Cepstral Coefficients (MFCC)
Definition
A compact representation of the short-term power spectrum of an audio signal, designed to approximate human auditory perception.
MFCCs are among the most widely used audio features in speech processing. They are computed by taking the mel spectrogram, applying a logarithmic transformation, and then applying a discrete cosine transform (DCT) to decorrelate the log mel filter bank energies.
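The log and DCT steps can be sketched for a single frame as follows. This is a minimal illustration, not a reference implementation: it assumes the mel filter bank energies have already been computed, and the function name `mfcc_from_mel_energies`, the 26-band toy frame, the `eps` floor, and the choice of an orthonormal DCT-II are all assumptions made for the example.

```python
import math

def mfcc_from_mel_energies(mel_energies, n_coeffs=13, eps=1e-10):
    """Log-compress one frame of mel filter bank energies, then apply a DCT-II.

    Assumes n_coeffs <= len(mel_energies).
    """
    n_mels = len(mel_energies)
    # Log compression; eps guards against log(0) in empty bands.
    log_mel = [math.log(e + eps) for e in mel_energies]

    coeffs = []
    for k in range(n_coeffs):
        # Orthonormal DCT-II scaling (an assumed convention; libraries differ).
        scale = math.sqrt(1.0 / n_mels) if k == 0 else math.sqrt(2.0 / n_mels)
        total = sum(
            log_mel[m] * math.cos(math.pi * k * (m + 0.5) / n_mels)
            for m in range(n_mels)
        )
        coeffs.append(scale * total)
    return coeffs

# Toy frame: 26 made-up mel band energies (not real audio).
frame = [1.0 + 0.5 * math.sin(0.3 * m) for m in range(26)]
mfcc = mfcc_from_mel_energies(frame)
```

Keeping only the first few DCT coefficients retains the smooth shape of the log spectrum and discards fine detail. A perfectly flat frame, for instance, yields a nonzero first coefficient and near-zero values for all the others.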
The resulting coefficients capture the shape of the spectral envelope, which encodes information about the vocal tract configuration and therefore about which phoneme is being produced. Typically 12 to 13 MFCCs are used per frame, sometimes augmented with delta and delta-delta coefficients (their first and second time derivatives) to capture temporal dynamics. While log-mel spectrograms have become more popular in deep learning, MFCCs remain relevant in resource-constrained and traditional systems.
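Delta features can be sketched as a local slope of each coefficient over neighboring frames. The example below assumes the common regression formula with a window of two frames on each side and edge frames repeated at the boundaries; the function name `delta`, the `width` parameter, and the toy input are hypothetical, and toolkits may use different windows or padding.

```python
def delta(frames, width=2):
    """Regression-based deltas over a list of per-frame coefficient lists."""
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(n_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                # Clamp indices so edge frames reuse the first/last frame.
                ahead = frames[min(t + n, n_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out

# Toy input: 5 frames of 2 coefficients, rising by 1.0 and 2.0 per frame.
mfccs = [[1.0 * t, 2.0 * t] for t in range(5)]
d1 = delta(mfccs)   # delta
d2 = delta(d1)      # delta-delta: the same operation applied to the deltas
# Concatenate static, delta, and delta-delta values per frame.
features = [c + a + b for c, a, b in zip(mfccs, d1, d2)]
```

With 13 static coefficients per frame, the same concatenation would give a 39-dimensional feature vector.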