What it is
A language model generates text one token at a time, each drawn from a probability distribution over its whole vocabulary. Grammar-constrained decoding inserts a step between that distribution and the sampling: it consults your grammar, works out which tokens could legally come next given what's been generated so far, and masks out every token that would break the language. The model then samples only from what's left.
Do that at every step and the running output can never leave the grammar, because there's never a moment where an illegal token is available to pick. When generation runs to a point the grammar accepts as complete, rather than being cut off by a length limit, the result is guaranteed to be a valid string in your language. This is the mechanism behind GBNF grammars in llama.cpp and libraries like Outlines and XGrammar.
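The loop described above can be sketched in a few lines of Python. This is a toy illustration, not how llama.cpp, Outlines, or XGrammar are implemented: the vocabulary, the query grammar, the hand-written `TRANSITIONS` table, and the `fake_logits` stand-in for a model are all assumptions made up for the example.

```python
import math
import random

# Toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ["SELECT", "FROM", "name", "age", "users", ",", "<eos>"]

# Hypothetical grammar, hand-compiled into a state machine:
#   query ::= "SELECT" column ("," column)* "FROM" "users"
# Each state maps every legal next token to the state it leads to.
TRANSITIONS = {
    "start": {"SELECT": "need_col"},
    "need_col": {"name": "after_col", "age": "after_col"},
    "after_col": {",": "need_col", "FROM": "need_table"},
    "need_table": {"users": "done"},
    "done": {"<eos>": "end"},
}


def fake_logits(rng):
    """Stand-in for a model forward pass: one arbitrary score per token."""
    return [rng.uniform(-2.0, 2.0) for _ in VOCAB]


def constrained_generate(rng, max_steps=50):
    """Return (tokens, finished); finished is False if the step limit cut us off."""
    state, out = "start", []
    for _ in range(max_steps):
        logits = fake_logits(rng)
        allowed = TRANSITIONS[state]
        # Mask: illegal tokens get -inf, so their sampling weight is exactly 0.
        masked = [l if tok in allowed else -math.inf
                  for tok, l in zip(VOCAB, logits)]
        weights = [math.exp(l) for l in masked]  # exp(-inf) == 0.0
        token = rng.choices(VOCAB, weights=weights, k=1)[0]
        state = allowed[token]
        if token == "<eos>":
            return out, True
        out.append(token)
    return out, False


tokens, finished = constrained_generate(random.Random(0))
print(" ".join(tokens), "| finished:", finished)
```

Even though the scores are random, every finished run is a well-formed query, because the sampler never sees a token the current state disallows. The `finished` flag marks the one caveat: a run stopped by `max_steps` is only a valid prefix, not yet a complete string.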
Why the output is valid by construction
'Valid by construction' means the validity isn't checked after the fact — it's a property of how the output was built. At no point during generation could the model have produced something invalid, so there's no failure mode to catch. Compare that with prompting ('please output valid syntax') or fine-tuning, both of which only shift probabilities and leave a nonzero chance of a token that breaks the language.
The constraint only ever removes illegal options; it never picks for the model. Among the tokens the grammar allows, the model chooses freely, so the content is still entirely the model's — it's just always well-formed. Syntax is guaranteed; meaning is still the model's job, which is why for hard semantic cases you might add retrieval or, optionally, fine-tune on top.
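To make "removes options, never picks" concrete, here is a minimal sketch of what masking does to a single step's distribution. The four logits and the `allowed` set are invented for illustration; the helper names `softmax` and `mask_logits` are hypothetical, not taken from any particular library.

```python
import math


def softmax(logits):
    """Convert scores to probabilities; -inf scores become probability 0."""
    peak = max(l for l in logits if l != -math.inf)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mask_logits(logits, allowed):
    """Keep the score of each allowed token index; set the rest to -inf."""
    return [l if i in allowed else -math.inf for i, l in enumerate(logits)]


logits = [2.0, 1.0, 0.5, -1.0]  # made-up model scores for four tokens
allowed = {1, 2}                # suppose the grammar permits only tokens 1 and 2

before = softmax(logits)
after = softmax(mask_logits(logits, allowed))

print("before:", [round(p, 3) for p in before])
print("after: ", [round(p, 3) for p in after])
# The ratio between the two legal tokens is the same before and after masking.
print("ratio before:", before[1] / before[2])
print("ratio after: ", after[1] / after[2])
```

The illegal tokens drop to probability zero and the remaining mass is renormalized, but the model's relative preference between the legal tokens at that step is untouched.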
How dsl.ai uses it
dsl.ai is the part that turns your grammar into that constraint. You paste your DSL's EBNF/GBNF-style grammar into the browser playground — no account, no GPU, no training set — and it compiles the grammar into the decoding mask and, from the same grammar, a deterministic validator. The exact grammar you test drops into a hosted open model in production, so what you prove in the playground is what runs.
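As a rough sketch of how one grammar can yield both a decoding mask and a validator, the example below derives both from a single transition table. It is not dsl.ai's implementation: the rule grammar, the `TABLE` layout, and the function names `run`, `allowed_next`, and `validate` are all hypothetical.

```python
# Hypothetical grammar, already compiled into a transition table:
#   rule ::= "when" ("hot" | "cold") "then" ("open" | "close")
TABLE = {
    0: {"when": 1},
    1: {"hot": 2, "cold": 2},
    2: {"then": 3},
    3: {"open": 4, "close": 4},
}
ACCEPT = {4}  # states in which the string is complete


def run(tokens):
    """Walk the table; return the final state, or None on an illegal token."""
    state = 0
    for tok in tokens:
        state = TABLE.get(state, {}).get(tok)
        if state is None:
            return None
    return state


def allowed_next(prefix):
    """Decoding mask: the set of tokens that may legally follow `prefix`."""
    state = run(prefix)
    return set() if state is None else set(TABLE.get(state, {}))


def validate(tokens):
    """Deterministic validator: True only for complete strings in the language."""
    return run(tokens) in ACCEPT


print(allowed_next(["when"]))
print(validate(["when", "hot", "then", "open"]))
print(validate(["when", "hot"]))
```

Because `allowed_next` and `validate` read the same table, they cannot disagree about what the language is, which is the point of compiling both from one grammar.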
Three ways to get valid output from an LLM
| Approach | What it does | Validity |
|---|---|---|
| Prompting | Asks the model to follow the rules | Likely, never guaranteed |
| Fine-tuning | Shifts probabilities toward examples | More likely, never certain |
| Grammar-constrained decoding | Masks illegal tokens each step | Valid by construction |
Frequently asked
- Does grammar-constrained decoding hurt output quality?
- Generally no. It only removes tokens that would break the grammar; the model still chooses freely among every valid option, so its preferences among legal tokens are preserved at each step while syntax becomes guaranteed.
- Does it guarantee the output is correct, or just valid?
- Just syntactically valid — it guarantees the output belongs to your language, not that it does the right thing. Semantics are still the model's responsibility; add validation rules or fine-tuning for hard semantic cases.
- What do I need to use it?
- A grammar for your language and an open model served through a runtime that accepts a GBNF-style grammar. dsl.ai compiles the grammar you paste into the playground, with no GPU or training data.
- How is it different from JSON mode?
- JSON mode is grammar-constrained decoding fixed to the JSON grammar. The general technique lets you supply any grammar, so the same guarantee applies to your own DSL or query language.
Last updated June 8, 2026