Speaker Diarization

Definition

The process of identifying and segmenting audio by speaker — determining who spoke when.

Speaker diarization answers the question 'who spoke when?' in a multi-speaker audio recording. It involves detecting speaker changes, clustering speech segments by speaker identity, and labeling each segment. The labels are typically anonymous (for example, Speaker 1 and Speaker 2) rather than names. This is distinct from speaker identification (determining which known person is speaking) and speech recognition (determining what was said).
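To make the labeling step concrete, here is a minimal sketch of how diarization output could be represented and how consecutive same-speaker windows could be merged into speaker turns. The Segment class, the merge_turns helper, the "SPEAKER_0" label format, and the per-window labels are all hypothetical and chosen for illustration; real systems differ in how they represent results.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # start time in seconds
    end: float     # end time in seconds
    speaker: str   # anonymous label, e.g. "SPEAKER_0" (hypothetical format)


def merge_turns(windows):
    """Merge back-to-back windows that share a label into single speaker turns."""
    turns = []
    for w in windows:
        last = turns[-1] if turns else None
        if last and last.speaker == w.speaker and abs(last.end - w.start) < 1e-6:
            # Same speaker continues: extend the current turn.
            turns[-1] = Segment(last.start, w.end, w.speaker)
        else:
            # Speaker change (or a gap): start a new turn.
            turns.append(Segment(w.start, w.end, w.speaker))
    return turns


# Made-up per-window labels, as a clustering step might produce them.
windows = [
    Segment(0.0, 1.0, "SPEAKER_0"),
    Segment(1.0, 2.0, "SPEAKER_0"),
    Segment(2.0, 3.0, "SPEAKER_1"),
    Segment(3.0, 4.0, "SPEAKER_0"),
]
turns = merge_turns(windows)
for t in turns:
    print(f"{t.start:.1f}-{t.end:.1f}s  {t.speaker}")
```

In this toy input the first two windows collapse into one turn, so the result is three turns: each boundary between turns is a detected speaker change.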

Modern diarization systems typically combine neural speaker embeddings (such as x-vectors or embeddings from an ECAPA-TDNN model) with clustering algorithms such as agglomerative hierarchical clustering or spectral clustering. Diarization is essential for meeting transcription, interview processing, and any other scenario where the transcript must distinguish between multiple speakers.
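The embedding-plus-clustering idea can be sketched in a few lines of plain Python. The example below assumes each speech window has already been converted to an embedding vector; the three-dimensional vectors are made up for illustration (real x-vector or ECAPA-TDNN embeddings have hundreds of dimensions), and the cluster_embeddings helper, the average-linkage rule, and the 0.5 cosine-distance threshold are assumptions, not the method of any particular system.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; small values mean similar voices."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def cluster_embeddings(embeddings, threshold=0.5):
    """Average-linkage agglomerative clustering; returns one label per embedding."""
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        dists = [cosine_distance(embeddings[i], embeddings[j]) for i in c1 for j in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        dist, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if dist > threshold:
            break  # remaining clusters are too different to be one speaker
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    # Number clusters by the order in which each speaker first appears.
    labels = [0] * len(embeddings)
    for label, members in enumerate(sorted(clusters, key=min)):
        for i in members:
            labels[i] = label
    return labels


# Toy embeddings: windows 0, 1, 3 point one way, windows 2, 4 another.
embeddings = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.85, 0.15, 0.05],
    [0.0, 0.8, 0.3],
]
labels = cluster_embeddings(embeddings)
print(labels)  # [0, 0, 1, 0, 1]
```

Because clustering stops at a distance threshold rather than a fixed cluster count, the number of speakers does not need to be known in advance; the threshold itself would have to be tuned on real data.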
