Privacy: where your code actually goes
With a cloud model, your prompt, your surrounding code, and often whole files leave your machine and reach a third party. For a side project that's a non-issue; for a client codebase under NDA, a regulated industry, or anything you simply don't want leaving the building, it's a real constraint that no privacy policy fully removes.
A local model removes the question. The weights sit on your disk, inference runs on your CPU or GPU, and nothing about your code touches a network unless you choose to send it. oi talks to a local runtime on localhost, so the privacy property is structural, not a setting you trust someone to honour.
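To make "structural" concrete, here is a minimal sketch of what a request to a local runtime can look like. It assumes Ollama is the runtime and is listening on its default port (11434) with its `/api/generate` endpoint. How oi itself wires this up is not shown in this article, so the helper names (`is_loopback`, `build_request`, `complete`) are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlparse

# Assumption: Ollama's default local endpoint; adjust for your runtime.
OLLAMA_URL = "http://localhost:11434/api/generate"

LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_loopback(url: str) -> bool:
    """Return True when the URL's host is a loopback name or address."""
    return urlparse(url).hostname in LOOPBACK_HOSTS


def build_request(prompt: str, model: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a POST request, refusing any host that is not the local machine."""
    if not is_loopback(url):
        raise ValueError(f"refusing to send code to non-local host: {url}")
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, model: str) -> str:
    """Send the prompt to the local runtime and return the generated text.

    Needs a running local runtime with the named model already pulled.
    """
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

The loopback check is the point of the sketch: the request is rejected before anything is sent if the target is not your own machine.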
Cost: per-token meter vs flat hardware
Cloud models bill per token, input and output. For light use that's pennies; for a team running agents, autocomplete, and chat all day it adds up to a recurring bill that scales with how much you code. A local model has no per-token cost at all — you've already paid for the machine, and inference is just electricity and time.
The tradeoff is upfront: a usable local coding model wants a reasonably modern machine, ideally with a decent GPU or an Apple Silicon Mac with enough unified memory. Once that's there, each completion costs nothing beyond electricity, which changes how freely you reach for the model.
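The break-even point can be estimated with simple arithmetic. The sketch below is illustrative only: every number in the usage note is a made-up placeholder, not a real price, and it uses one blended per-token price for input and output.

```python
from typing import Optional


def monthly_cloud_cost(tokens_per_day: float,
                       price_per_million_tokens: float,
                       working_days: int = 22) -> float:
    """Cloud spend per month at a single blended per-token price."""
    return tokens_per_day * working_days * price_per_million_tokens / 1_000_000


def break_even_months(hardware_cost: float,
                      tokens_per_day: float,
                      price_per_million_tokens: float,
                      monthly_electricity: float = 0.0,
                      working_days: int = 22) -> Optional[float]:
    """Months until the hardware pays for itself.

    Returns None when cloud is cheaper per month, so local never catches up.
    """
    saving = monthly_cloud_cost(tokens_per_day, price_per_million_tokens,
                                working_days) - monthly_electricity
    if saving <= 0:
        return None
    return hardware_cost / saving
```

With placeholder inputs of a 2,000 machine, 2 million tokens a day, a price of 5 per million tokens, and 20 a month in electricity, `break_even_months(2000, 2_000_000, 5.0, 20.0)` comes out at 10 months. At 50,000 tokens a day the same call returns `None`, which is the light-use case where pay-per-token stays cheaper.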
Control and offline: pin the model, work on a plane
Cloud endpoints change under you — a model gets deprecated, re-tuned, or rate-limited, and your tooling shifts without your say. A local model is a file you control: pull a version, pin it, and it behaves the same next month. You can also work fully offline, on a train or a plane or an air-gapped network, which a cloud model can't do at all.
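One way to confirm that a pinned model has not changed is to compare its digest against a recorded value. The sketch below assumes Ollama is the runtime and uses its `/api/tags` listing on the default port. The function names and the digest you record are hypothetical, and oi's own pinning mechanism, if it has one, is not described in this article.

```python
import json
import urllib.request
from typing import Optional

# Assumption: Ollama's default local endpoint for listing pulled models.
TAGS_URL = "http://localhost:11434/api/tags"


def fetch_tags(url: str = TAGS_URL) -> dict:
    """Fetch the list of locally pulled models (needs a running runtime)."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read())


def find_digest(tags: dict, name: str) -> Optional[str]:
    """Return the digest recorded for a model name, or None if it is absent."""
    for model in tags.get("models", []):
        if model.get("name") == name:
            return model.get("digest")
    return None


def check_pin(tags: dict, name: str, pinned_digest: str) -> str:
    """Classify the local copy as 'ok', 'changed', or 'missing'."""
    actual = find_digest(tags, name)
    if actual is None:
        return "missing"
    return "ok" if actual == pinned_digest else "changed"
```

Record the digest once, when the model behaves the way you want, then call `check_pin(fetch_tags(), name, recorded_digest)` before a session or in CI to catch an unintended re-pull.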
Where cloud still wins is raw capability on the hardest problems — long multi-file reasoning, obscure domains, the genuinely tricky bug. The honest split is to use a local model for the high-volume everyday work and keep a cloud model for the occasional hard call. oi is built for that everyday half.
Local coding LLM vs cloud (Claude / GPT)
| | Local LLM (oi) | Cloud (Claude / GPT) |
|---|---|---|
| Where your code goes | Stays on your machine | Sent to the vendor |
| Cost model | Free per use, pay for hardware | Per-token billing |
| Offline | Works with no network | Requires connectivity |
| Model control | Pin and keep the exact version | Changes under you |
| Raw capability on hard problems | Good, model-dependent | Strongest |
| Setup | Install a runtime + pull a model | Add an API key |
Frequently asked questions
Can a local model really replace Claude or GPT for coding?
For a lot of everyday work — autocomplete, refactors, boilerplate, explaining code, routine fixes — yes, recent open coding models are good enough. For the hardest multi-file reasoning, cloud models are still stronger. The sensible setup is local for the bulk, cloud for the occasional hard call.
Is it actually cheaper?
After the hardware, there's no per-token cost, so heavy daily use is where local pulls ahead. Light, occasional use can be cheaper on a pay-per-token cloud plan. The more you code, the more the flat-cost local option wins.
What hardware do I need?
Smaller coding models (around 7B parameters) run on a modern laptop, and run well on an Apple Silicon Mac with enough unified memory or on a machine with a discrete GPU. Bigger models want more VRAM or unified memory. oi leaves the model choice to you so you can match it to your hardware.
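A rough way to size a model against your machine is parameters times bytes per weight. The sketch below is a back-of-envelope estimate only: the 2 GB overhead allowance for context cache and runtime is an assumed placeholder, and real usage varies with context length and quantization format.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9


def fits(params_billions: float,
         bits_per_weight: float,
         available_gb: float,
         overhead_gb: float = 2.0) -> bool:
    """True if weights plus an assumed overhead fit in the available memory.

    overhead_gb is a hypothetical allowance for context cache and runtime.
    """
    return weight_memory_gb(params_billions, bits_per_weight) + overhead_gb <= available_gb
```

By this estimate a 7B model needs about 3.5 GB for weights at 4-bit quantization and about 14 GB at 16-bit, so `fits(7, 4, 8)` is true while `fits(7, 16, 8)` is false.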
Does oi host the model?
No. oi is the CLI and VS Code layer that connects your editor and terminal to a local runtime (Ollama or llama.cpp) running an open model you've pulled. The inference is local; oi never proxies your code anywhere.
Last updated June 19, 2026