Real-Time Factor (RTF)

Definition

The ratio of processing time to audio duration, indicating how fast an ASR system transcribes speech.

Real-Time Factor measures the speed of a speech recognition system. An RTF of 1.0 means the system takes exactly as long to process audio as the audio's duration — one second of processing for one second of audio. An RTF below 1.0 means the system is faster than real-time; above 1.0 means it is slower.
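As a minimal sketch of the arithmetic described above, the ratio can be computed directly from two measured durations. The function names `real_time_factor` and `describe` are illustrative and not part of any particular ASR toolkit.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    if processing_seconds < 0:
        raise ValueError("processing_seconds must not be negative")
    return processing_seconds / audio_seconds


def describe(rtf: float) -> str:
    """Classify an RTF value relative to the real-time threshold of 1.0."""
    if rtf < 1.0:
        return "faster than real-time"
    if rtf > 1.0:
        return "slower than real-time"
    return "exactly real-time"


# Example: 2 seconds of processing for 10 seconds of audio.
rtf = real_time_factor(2.0, 10.0)
print(rtf, describe(rtf))
```

With these inputs the system would need one fifth of the audio's duration to transcribe it, so it comfortably keeps pace with the speaker.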

RTF is critical for real-time applications like live captioning and voice assistants, where the system must keep up with the speaker. On-device models targeting real-time use typically aim for an RTF of 0.1 to 0.3. Batch processing systems that transcribe recorded audio can tolerate higher RTFs in exchange for greater accuracy.
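One way to check a recognizer against such a target is to time a single transcription call and divide by the clip length. The sketch below assumes the recognizer is exposed as a plain Python callable that takes a sequence of samples and returns text; `measure_rtf`, `meets_target`, and the default target of 0.3 are hypothetical choices for illustration, not a standard API.

```python
import time
from typing import Callable, Sequence, Tuple


def measure_rtf(
    transcribe: Callable[[Sequence[float]], str],
    samples: Sequence[float],
    sample_rate: int,
) -> Tuple[str, float]:
    """Run `transcribe` once and return (transcript, measured RTF)."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    if len(samples) == 0:
        raise ValueError("samples must not be empty")

    # Audio duration in seconds follows from sample count and sample rate.
    audio_seconds = len(samples) / sample_rate

    # perf_counter is a monotonic, high-resolution clock suited to timing.
    start = time.perf_counter()
    text = transcribe(samples)
    elapsed = time.perf_counter() - start

    return text, elapsed / audio_seconds


def meets_target(rtf: float, target: float = 0.3) -> bool:
    """Return True if the measured RTF is at or below the target."""
    return rtf <= target
```

A single measurement is noisy, so in practice it may be worth averaging over several clips and discarding the first run, which often includes model loading or cache warm-up.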
