Local vs cloud models

Local models run on your device and keep audio private; cloud models use your API key and are quick to set up. Here is the trade-off.

Both kinds turn your speech into text, but they differ in where your audio goes, what you need to set up, and what it costs. The table below sums it up.

Aspect	Local model	Cloud model
Privacy	Audio never leaves your device	Audio is sent to the provider
Setup	Download the model once	Paste an API key
Speed	Depends on your hardware	Consistent, handled server-side
Offline	Works with no connection	Needs an internet connection
Cost	Free forever	Billed by the provider to your key

When to use local

Choose a local model when privacy matters, when you work offline, or when you would rather not manage an API key. Everything happens on your machine, and there are no usage fees. The only costs are the one-time model download and the load on your computer while it transcribes.

When cloud makes sense

Choose a cloud model when you want top accuracy without a large download, when your computer has modest hardware, or when you want the same speed regardless of hardware. You bring your own API key, and the provider bills usage to that key. See Cloud API keys.
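
The guidance in the two sections above can be sketched as a small decision helper. This is only an illustration: the `Needs` fields, the function name, and the order in which the rules are checked are assumptions made for this example, not something the app exposes.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    """Hypothetical checklist; the field names are illustrative only."""
    privacy_sensitive: bool = False     # audio must stay on the device
    works_offline: bool = False         # no reliable internet connection
    avoid_api_key: bool = False         # would rather not manage a key
    modest_hardware: bool = False       # machine struggles with large models
    avoid_large_download: bool = False  # wants top accuracy without a big download


def suggest_model_kind(needs: Needs) -> str:
    """Return "local" or "cloud" based on the checklist."""
    # Privacy, offline use, and key-free setup are only possible locally,
    # so they are checked first (assumed priority).
    if needs.privacy_sensitive or needs.works_offline or needs.avoid_api_key:
        return "local"
    # Otherwise, modest hardware or download size tips the choice to cloud.
    if needs.modest_hardware or needs.avoid_large_download:
        return "cloud"
    # No strong constraint either way: default to local (assumption).
    return "local"
```

For example, `suggest_model_kind(Needs(modest_hardware=True))` suggests a cloud model, while adding `privacy_sensitive=True` switches the suggestion back to local.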

Questions and answers

Can I mix local and cloud?

Yes. Because each mode sets its own model, you can have a private local mode for sensitive work and a cloud mode for speed, and switch between them.
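
As a rough picture of that setup, the sketch below models two modes, each with its own model. The mode names, model identifiers, and dictionary layout are hypothetical and do not reflect the app's actual settings format.

```python
# Hypothetical mode names and model identifiers, for illustration only.
modes = {
    "Private notes": {"kind": "local", "model": "local-whisper-large"},
    "Quick email": {"kind": "cloud", "model": "example-cloud-model"},
}


def audio_leaves_device(mode_name: str) -> bool:
    """Audio leaves the computer only when the mode uses a cloud model."""
    return modes[mode_name]["kind"] == "cloud"
```

Switching modes switches the model with it, so the sensitive mode keeps audio on the device while the other one sends it to the provider.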

Does a local model ever send my audio anywhere?

No. With a local model, transcription happens on your device and the audio is discarded there. Audio only leaves your computer if you choose a cloud provider.

Is cloud always more accurate?

Often, but not always. A large local Whisper model is very accurate too. Cloud mainly wins when you want top accuracy without downloading a big model.

Related articles

Which model should I choose?

Cloud API keys

Transcription models
