Speech-to-Text (STT)
Definition
Speech-to-text (STT) is the practical application of automatic speech recognition (ASR) technology. While ASR refers to the broader field of research and development, STT typically refers to the user-facing capability of converting speech into text, either in real time as the user speaks or in batches from recorded audio.
Modern STT systems can run entirely on-device (such as Apple's Speech framework, which Ummless uses) or in the cloud. On-device processing keeps audio on the user's device and avoids a network round trip, which gives it privacy and latency advantages; cloud-based systems, in turn, can run larger models than a phone or laptop can host. STT accuracy has improved dramatically with transformer-based models, which now reach near-human accuracy on many standard benchmarks.
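Benchmark accuracy for STT is conventionally reported as word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the system's output into a reference transcript, divided by the number of words in the reference. The Python sketch below computes WER with a standard word-level edit distance. The function name `word_error_rate` is our own, and the sketch assumes plain whitespace tokenization with no case or punctuation normalization; published benchmarks usually apply their own normalization rules before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: words are split on whitespace only; no lowercasing or
    punctuation stripping is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first (i - 1) reference words
    # and the first j hypothesis words (one row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution (or exact match when cost is 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "turn on the kitchen lights"
    hyp = "turn on kitchen light"
    # One deletion ("the") plus one substitution ("lights" -> "light") over 5 words.
    print(f"WER: {word_error_rate(ref, hyp):.2f}")  # prints "WER: 0.40"
```

Because insertions count against the system, WER is not capped at 1.0: an output that adds many spurious words to a short reference can score above 100%.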
Frequently Asked Questions
What is speech-to-text?
Speech-to-text is the technology that converts spoken audio into written text, enabling voice input for applications like dictation, transcription, and voice assistants.