The best local LLMs for coding, and how to run them with oi

the short answer

The most usable open coding models you can run locally today are Qwen2.5-Coder, DeepSeek-Coder, and Code Llama for code-specific work, with general models like Llama 3 as capable all-rounders. Pick a size that fits your RAM or VRAM; oi runs any of them through Ollama or llama.cpp in your editor.

Local coding has stopped being a toy. Several open-weights models are now genuinely useful for everyday programming, and they're free to download and run. The catch is choosing one that fits both your work and your hardware: a 7-billion-parameter model, quantized to 4 bits, needs roughly 4 to 5 GB of memory and runs on a typical laptop, while the larger variants want a dedicated GPU with enough VRAM to hold them.
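A rough way to check whether a size fits your machine: the weights take about parameters × bits per weight ÷ 8 bytes, plus headroom for the context cache and the runtime itself. The sketch below assumes 4-bit quantization and 20% overhead. Both numbers are illustrative assumptions rather than figures from oi or any particular runtime, and the function names are hypothetical.

```python
def estimate_memory_gb(params_billion: float,
                       bits_per_weight: float = 4.0,
                       overhead: float = 0.2) -> float:
    """Rough memory footprint in GB (1 GB = 1e9 bytes).

    bits_per_weight and overhead are assumed rule-of-thumb values;
    real usage depends on the quantization format and context length.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * (1 + overhead) / 1e9


def fits_in_memory(params_billion: float, available_gb: float) -> bool:
    """True if the estimated footprint is within the memory you can spare."""
    return estimate_memory_gb(params_billion) <= available_gb


# Compare a few common sizes against 16 GB of free RAM or VRAM.
for size in (7, 13, 34, 70):
    verdict = "fits" if fits_in_memory(size, 16) else "too large"
    print(f"{size}B: ~{estimate_memory_gb(size):.1f} GB ({verdict})")
```

Treat the result as a first filter only. Longer context windows and higher-precision quantizations push the real number up.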

Code-specific models

Models trained specifically on code tend to punch above their size for programming. Qwen2.5-Coder (from Alibaba's Qwen team) is a strong recent pick. It comes in sizes from 0.5B to 32B parameters, so there is a build small enough for a laptop and larger ones for a GPU, and it has a long context window for working across files. DeepSeek-Coder is another well-regarded code-focused family with openly available weights. Code Llama, Meta's code-tuned Llama variant, dates from 2023 but is widely supported and a safe baseline.

These are the models to reach for when the work is autocomplete, refactors, writing tests, or explaining unfamiliar code — the high-volume tasks where a local model removes the per-token meter entirely.

General models that also code

You don't always need a code-specific model. General open models like Meta's Llama 3 are competent programmers and double as everything-else assistants, which is handy if you'd rather keep one model loaded. They trade a little code specialism for broader range.

The honest note: none of these match a frontier cloud model on the very hardest problems. They're chosen for being good enough on the everyday majority while running free and private on your own machine.

How oi fits

oi is model-agnostic by design. You pull whichever model above suits your hardware through Ollama or llama.cpp, point oi at the runtime, and use it from the CLI and the VS Code extension. Switching models is changing one setting, so you can try a few and keep the one that feels best on your machine.
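To make "point at the runtime" concrete, here is a minimal sketch of talking to a local Ollama server directly over its HTTP API. It is not oi's own configuration or code. It assumes Ollama is running on its default address (localhost:11434) and that a model has already been pulled. The tag qwen2.5-coder:7b is only an example, and the helper names are hypothetical.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str,
                           url: str = OLLAMA_GENERATE_URL) -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]
```

With the server running, generate("qwen2.5-coder:7b", "Write a Python function that reverses a string.") returns the model's answer as a string. Trying a different model means changing only the first argument.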

Open coding models worth running locally

Model            Typical sizes            Strength
Qwen2.5-Coder    0.5B to 32B              Strong code model, long context
DeepSeek-Coder   Small to large variants  Code-focused; weights licensed for commercial use
Code Llama       7B / 13B / 34B / 70B     Widely supported code baseline
Llama 3          8B / 70B                 Capable general model that also codes

frequently asked

Which model should I start with?

A code-specific model around 7B, like a small Qwen2.5-Coder or DeepSeek-Coder build, is the easiest first step — it runs on most modern machines and is tuned for programming. Move up in size if your hardware allows and you want more capability.

Are these models actually free?

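The weights are open and free to download, but the licenses differ. DeepSeek-Coder's code, for example, is MIT-licensed, while its weights ship under DeepSeek's own model license, which permits commercial use. Always check the specific license for your use, but for personal and most commercial coding use these are free to run.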

Do code-specific models beat general ones?

For pure coding tasks, a code-tuned model often gets more done at a smaller size. A general model like Llama 3 is the better pick if you want one model for both coding and everything else.

Can oi run all of these?

Yes. oi is model-agnostic — it runs whatever you've pulled into Ollama or llama.cpp, so any of these works and you can switch between them.
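To see which models are already pulled and available to switch between, you can ask Ollama itself. This is a small sketch against Ollama's model-listing endpoint, assuming the default local address. The function names are hypothetical, and it does not show how oi reads the list.

```python
import json
import urllib.request

# Ollama's default local endpoint for listing pulled models.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def model_names(tags_payload: dict) -> list[str]:
    """Extract sorted model tags from the JSON returned by /api/tags."""
    return sorted(entry["name"] for entry in tags_payload.get("models", []))


def fetch_local_models(url: str = OLLAMA_TAGS_URL, timeout: float = 10.0) -> list[str]:
    """Query the running Ollama server and return the models it has locally."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return model_names(json.loads(response.read().decode("utf-8")))
```

Any name returned by fetch_local_models() is a model the runtime can serve right away, with no further download.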

Last updated June 19, 2026
