Inference
Definition
The process of using a trained model to generate predictions or outputs from new input data.
Inference is the production use of a machine learning model: feeding it new, unseen input and receiving its output. In ASR, inference means passing audio through the trained model to get a transcript. In text refinement, it means sending raw text through a language model to get polished output.
Inference performance is measured by latency (time to first output), throughput (outputs per second), and compute cost. On-device inference runs the model locally on the user's hardware, while cloud inference sends data to remote servers. Ummless uses on-device inference for speech recognition and cloud inference for AI text refinement.
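The latency and throughput metrics above can be sketched with a minimal timing harness. This is an illustrative assumption, not Ummless code: `run_inference` is a hypothetical stand-in for a real model call, simulated here with a short sleep.

```python
import time

def run_inference(audio_chunk):
    # Hypothetical placeholder for a real model call;
    # sleeps briefly to simulate per-request compute.
    time.sleep(0.01)
    return "transcript"

def measure(inputs):
    """Return average per-call latency (s) and overall throughput (calls/s)."""
    latencies = []
    start = time.perf_counter()
    for x in inputs:
        t0 = time.perf_counter()
        run_inference(x)
        latencies.append(time.perf_counter() - t0)
    total = time.perf_counter() - start
    return {
        "avg_latency_s": sum(latencies) / len(latencies),
        "throughput_per_s": len(inputs) / total,
    }

stats = measure([b"chunk"] * 20)
print(f"avg latency: {stats['avg_latency_s'] * 1000:.1f} ms, "
      f"throughput: {stats['throughput_per_s']:.1f}/s")
```

In practice the same harness applies whether the call hits a local model (on-device) or a remote API (cloud); for cloud inference, network round-trip time dominates the measured latency.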