Local Inference
Definition
Local inference is the execution of model predictions on the user's device — their laptop, desktop, or phone. This contrasts with cloud inference, where data is sent to a remote server for processing. Local inference offers privacy (data stays on-device), lower latency (no network round-trip), and offline capability.
The feasibility of local inference depends on model size, hardware capabilities, and the accuracy trade-off acceptable for the task. Apple's Neural Engine, for example, enables high-quality local speech recognition. For language models, smaller quantized models can run locally, though the largest LLMs still require cloud infrastructure. Ummless uses local inference for speech-to-text and cloud inference for AI text refinement, balancing privacy with capability.
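The model-size constraint above can be made concrete with a back-of-the-envelope memory check. The sketch below is illustrative only: the function names, the 7B/70B example sizes, and the 50% RAM headroom are assumptions, and the estimate counts only quantized weights, ignoring activations and KV cache.

```python
def model_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB: params x bits / 8 bits-per-byte.

    Ignores activation memory and KV cache, so it is a lower bound.
    """
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9


def fits_locally(params_billions: float, bits_per_weight: int,
                 device_ram_gb: float, headroom: float = 0.5) -> bool:
    """True if the quantized weights fit within a fraction of device RAM.

    The 0.5 headroom factor is a hypothetical default, leaving room
    for the OS and for inference-time buffers.
    """
    return model_memory_gb(params_billions, bits_per_weight) <= device_ram_gb * headroom


# A 7B model quantized to 4 bits needs ~3.5 GB of weights and fits
# comfortably on a 16 GB laptop; a 70B model at 16 bits (~140 GB) does not.
print(fits_locally(7, 4, 16))    # → True
print(fits_locally(70, 16, 16))  # → False
```

This is why quantization is the usual lever for local inference: halving bits per weight halves the memory footprint, turning a cloud-only model into one that fits on consumer hardware.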