Speech-to-Text (STT)

Definition

The conversion of spoken audio into written text, also known as automatic speech recognition.

Speech-to-text (STT) is the practical application of automatic speech recognition (ASR) technology. While ASR refers to the broader field of research and development, STT typically refers to the user-facing capability of converting speech into text, either in real time as the user speaks or in batch over recorded audio.

Modern STT systems can run entirely on-device (like Apple's Speech framework, used by Ummless) or in the cloud. On-device processing keeps audio on the user's device, which helps privacy, and avoids a network round trip, which lowers latency. Cloud-based systems can leverage larger models than a phone or laptop can typically run. STT accuracy has improved dramatically with transformer-based models, with error rates approaching those of human transcribers on many benchmarks.
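Accuracy on STT benchmarks is commonly reported as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that calculation, not the scoring code of any particular benchmark. The function name `word_error_rate` is illustrative, and the normalization (lowercasing and splitting on whitespace) is a simplifying assumption; real evaluations usually apply more careful text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sat on a mat" against the reference "the cat sat on the mat" counts one substitution across six reference words, so the WER is 1/6. Lower is better, and because insertions are counted, WER can exceed 1.0.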

Frequently Asked Questions

What is speech-to-text?

Speech-to-text is the technology that converts spoken audio into written text, enabling voice input for applications like dictation, transcription, and voice assistants.
