Local vs cloud models
Local models run on your device and keep audio private; cloud models use your API key and are quick to set up. Here is the trade-off.
Both kinds produce text from your voice, but they differ in where your audio goes and what you need to set up. This table sums it up.
| Aspect | Local model | Cloud model |
|---|---|---|
| Privacy | Audio never leaves your device | Audio is sent to the provider |
| Setup | Download the model once | Paste an API key |
| Speed | Depends on your hardware | Consistent, handled server-side |
| Offline | Works with no connection | Needs an internet connection |
| Cost | No usage fees | Usage billed by the provider against your API key |
When to use local
Choose a local model when privacy matters, when you work offline, or when you would rather not manage an API key. Everything happens on your machine, and there are no usage fees. The only costs are the one-time model download and the processing load on your own computer.
When cloud makes sense
Choose a cloud model when you want top accuracy without a large download, when your machine is modest, or when you want the same speed regardless of hardware. You bring your own key, and the provider bills your usage. See Cloud API keys.
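The guidance in the two sections above can be sketched as a small decision helper. This is only an illustration: the `Situation` fields and the `suggest_model` function are hypothetical names, not settings or APIs of the app, and the fallback to local when nothing applies is an assumption rather than something the guidance states.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    # Hypothetical inputs for illustration; these are not app settings.
    privacy_sensitive: bool = False
    needs_offline: bool = False
    wants_no_api_key: bool = False
    modest_hardware: bool = False
    avoid_large_download: bool = False


def suggest_model(s: Situation) -> str:
    """Return "local" or "cloud" following the guidance above."""
    # Privacy, offline use, and avoiding an API key can only be met locally,
    # so they take priority over convenience.
    if s.privacy_sensitive or s.needs_offline or s.wants_no_api_key:
        return "local"
    # Cloud helps when hardware or download size is the limiting factor.
    if s.modest_hardware or s.avoid_large_download:
        return "cloud"
    # Assumption: with no constraint either way, prefer local
    # (no usage fees, audio stays on the device).
    return "local"


print(suggest_model(Situation(modest_hardware=True)))
print(suggest_model(Situation(privacy_sensitive=True, modest_hardware=True)))
```

The second call shows the ordering choice: when privacy and modest hardware conflict, the sketch lets privacy decide, since that is the one requirement a cloud model cannot satisfy.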
FAQ
Can I mix local and cloud?
Yes. Because each mode sets its own model, you can have a private local mode for sensitive work and a cloud mode for speed, and switch between them.
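As a rough mental model of per-mode models, the sketch below maps two modes to different models. The app's real settings format is not shown in this article, so the `Mode` class, the mode names, and the model identifiers are all placeholders invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    name: str
    model: str      # placeholder identifier, not a real model name
    is_local: bool  # True if transcription runs on the device


# Hypothetical setup: one private local mode, one fast cloud mode.
MODES = {
    "private": Mode("private", "example-local-model", is_local=True),
    "fast": Mode("fast", "example-cloud-model", is_local=False),
}


def audio_leaves_device(mode_name: str) -> bool:
    """Audio is sent to a provider only when the mode uses a cloud model."""
    return not MODES[mode_name].is_local


for name in MODES:
    print(name, "sends audio off-device:", audio_leaves_device(name))
```

Because the model is chosen per mode, switching modes is what decides whether a given recording stays on the device.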
Does a local model ever send my audio anywhere?
No. With a local model, transcription happens on your device and the audio is discarded there. Audio only leaves your computer if you choose a cloud provider.
Is cloud always more accurate?
Often, but not always. A large local Whisper model is very accurate too. Cloud mainly wins when you want top accuracy without downloading a big model.