Understanding your LLM bill: what you're actually paying for

the short answer

Your LLM bill is token usage multiplied by a per-token price that differs by model and by direction. Input (prompt) tokens are cheaper than output (completion) tokens, and cached prompt tokens are cheaper still. The total is dominated by whichever model and direction carries the most spend, which on chat-style workloads is usually output.

An LLM bill looks opaque until you know the three things that drive it: how many tokens you used, in which direction (input vs output), and on which model. Multiply token count by the per-token price for that model and direction, sum across every call, and that's your bill. Everything else is detail.
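The multiply-and-sum rule above can be sketched in a few lines of Python. The model names and per-million-token prices here are made up for illustration; substitute the rates from your provider's pricing page.

```python
# Hypothetical prices in USD per 1M tokens, keyed by (model, direction).
# These are illustrative numbers, not any provider's real rates.
PRICES = {
    ("model-small", "input"): 0.50,
    ("model-small", "output"): 1.50,
    ("model-large", "input"): 3.00,
    ("model-large", "output"): 12.00,
}


def line_cost(model, direction, tokens):
    """Cost in USD for one model and direction: tokens times per-token price."""
    return tokens * PRICES[(model, direction)] / 1_000_000


def total_bill(usage):
    """Sum the cost over every (model, direction, tokens) row."""
    return sum(line_cost(model, direction, tokens) for model, direction, tokens in usage)


usage = [
    ("model-small", "input", 200_000),   # 0.10
    ("model-small", "output", 60_000),   # 0.09
    ("model-large", "input", 100_000),   # 0.30
    ("model-large", "output", 40_000),   # 0.48
]

print(f"${total_bill(usage):.2f}")  # prints $0.97
```

In this made-up example the large model's output is the smallest token count of the four rows yet the largest line on the bill, which is the effect the rest of this page is about.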

Understanding this breakdown is what turns 'the AI bill went up' into 'this endpoint's output tokens tripled.' This page explains how to read your usage so the number stops being a mystery. If you'd rather not parse the raw export by hand, upload a usage CSV from OpenAI or Anthropic to token·flow and it groups the spend for you by prompt, model, and direction.

2-4x: the typical per-token premium on output (completion) tokens versus input tokens. Source: token·flow usage analysis

Input tokens, output tokens, and why the split matters

Every call has two token counts. Input (prompt) tokens are everything you send: the system prompt, the conversation history, the retrieved context, the user's message. Output (completion) tokens are what the model generates back. Providers price these separately, and output is almost always the more expensive direction — often two to four times the input rate. That's why a chat feature that returns long answers can cost far more than its input size suggests.

Cached tokens add a third line. When provider prompt caching kicks in, the repeated prefix of your input is billed at a reduced cached rate rather than the full input rate. So a single call can have full-price input tokens, cheaper cached input tokens, and full-price output tokens all at once. Reading your bill means seeing all three, not just a single total.
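As a rough illustration of those three lines on a single call, the sketch below splits one call's cost into full-price input, cached input, and output. The rates are hypothetical, and it assumes the cached count is reported as a subset of the input count; some exports report cached tokens as a separate count instead, so check how your provider defines the field.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical prices in USD per 1M tokens for one model."""
    input: float
    cached_input: float
    output: float


def breakdown(input_tokens, cached_tokens, output_tokens, rates):
    """Split one call's cost into its three billing lines.

    Assumes cached_tokens is a subset of input_tokens.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    uncached = input_tokens - cached_tokens
    lines = {
        "input": uncached * rates.input / 1_000_000,
        "cached_input": cached_tokens * rates.cached_input / 1_000_000,
        "output": output_tokens * rates.output / 1_000_000,
    }
    lines["total"] = sum(lines.values())
    return lines


# Illustrative call: 10,000 prompt tokens, 8,000 of them served from cache.
rates = Rates(input=3.00, cached_input=0.30, output=12.00)
result = breakdown(10_000, 8_000, 1_000, rates)
for name, cost in result.items():
    print(f"{name:>12}: ${cost:.4f}")
```

With these made-up rates, the 1,000 output tokens cost more than the 10,000 input tokens combined, because most of the prompt was billed at the cached rate.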

How to read your usage export

Both OpenAI and Anthropic let you export usage. The common mistake is sorting the export by call count: a low-volume endpoint with enormous prompts or long completions can quietly outspend a high-volume one with tiny calls. Sort by total cost instead, or by total tokens with output counted more heavily to reflect its higher price, and the real culprits surface immediately.
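A minimal sketch of that sort, using only the Python standard library. The column names (`endpoint`, `input_tokens`, `output_tokens`) and the rates are assumptions for illustration; real provider exports use their own column names and may not include an endpoint field unless you tag calls yourself.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rates in USD per 1M tokens for a single model.
INPUT_RATE = 3.00
OUTPUT_RATE = 12.00

# Stand-in for an exported usage CSV; the columns are assumed, not a real schema.
SAMPLE = """endpoint,input_tokens,output_tokens
/autocomplete,120,15
/autocomplete,110,20
/autocomplete,130,10
/autocomplete,100,12
/report,9000,2500
"""


def rank_endpoints(csv_text):
    """Group rows by endpoint and rank by total cost, not by call count."""
    totals = defaultdict(lambda: {"calls": 0, "cost": 0.0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        cost = (
            int(row["input_tokens"]) * INPUT_RATE
            + int(row["output_tokens"]) * OUTPUT_RATE
        ) / 1_000_000
        entry = totals[row["endpoint"]]
        entry["calls"] += 1
        entry["cost"] += cost
    return sorted(totals.items(), key=lambda item: item[1]["cost"], reverse=True)


ranked = rank_endpoints(SAMPLE)
for endpoint, stats in ranked:
    print(f"{endpoint:<14} calls={stats['calls']}  cost=${stats['cost']:.6f}")
```

In this sample, `/report` ranks first with a single call, while `/autocomplete` has four calls and a small fraction of the cost. Sorting by call count would have put them in the opposite order.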

Look for three patterns. One: a single endpoint or prompt dominating the total — that's your first target. Two: output tokens far exceeding what the use case needs — a candidate for max_tokens caps and brevity instructions. Three: the same prompt appearing over and over — a caching candidate. token·flow runs exactly this grouping on an uploaded CSV, so you see the costliest prompts and the repeats without writing a query.
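For readers who do want to write the query, the three checks above might look something like the sketch below. The record keys (`endpoint`, `prompt`, `output_tokens`, `cost`) and the thresholds are hypothetical, and this is not a description of how token·flow implements its grouping. Exact-match hashing is also a simplification: provider caching applies to a shared prefix, not only to fully identical prompts.

```python
import hashlib
from collections import Counter


def find_patterns(calls, dominance=0.5, output_budget=500, min_repeats=3):
    """Flag the three patterns in a list of call records.

    Each record is a dict with assumed keys: 'endpoint', 'prompt',
    'output_tokens', and 'cost' (USD). Thresholds are illustrative defaults.
    """
    total = sum(c["cost"] for c in calls)

    # Pattern one: an endpoint responsible for a large share of total spend.
    cost_by_endpoint = Counter()
    for c in calls:
        cost_by_endpoint[c["endpoint"]] += c["cost"]
    dominant = [
        endpoint
        for endpoint, cost in cost_by_endpoint.items()
        if total and cost / total >= dominance
    ]

    # Pattern two: endpoints whose completions exceed the output budget.
    verbose = sorted({c["endpoint"] for c in calls if c["output_tokens"] > output_budget})

    # Pattern three: identical prompts sent repeatedly (caching candidates).
    prompt_counts = Counter(
        hashlib.sha256(c["prompt"].encode("utf-8")).hexdigest() for c in calls
    )
    repeated = {digest: n for digest, n in prompt_counts.items() if n >= min_repeats}

    return {"dominant": dominant, "verbose": verbose, "repeated": repeated}


calls = [
    {"endpoint": "/summarize", "prompt": "Summarize: report A", "output_tokens": 900, "cost": 0.05},
    {"endpoint": "/faq", "prompt": "What are your hours?", "output_tokens": 40, "cost": 0.001},
    {"endpoint": "/faq", "prompt": "What are your hours?", "output_tokens": 40, "cost": 0.001},
    {"endpoint": "/faq", "prompt": "What are your hours?", "output_tokens": 40, "cost": 0.001},
]

result = find_patterns(calls)
print(result["dominant"])                 # endpoints carrying most of the spend
print(result["verbose"])                  # endpoints over the output budget
print(list(result["repeated"].values()))  # repeat counts for duplicated prompts
```

The thresholds are the part most worth tuning: a sensible output budget depends entirely on what the feature is supposed to return.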

frequently asked

Why is my output (completion) cost higher than my input cost?
Because output tokens are priced higher per token on most models — frequently two to four times the input rate — and because chat-style features often generate long answers. If your bill is dominated by output, the fix is on the generation side: cap max_tokens, ask for concise answers, and use structured outputs to avoid retries.
What are cached tokens on my bill?
When a long prompt prefix is identical across calls, OpenAI and Anthropic can reuse internal computation and bill that repeated portion at a lower cached rate. Cached input tokens show up cheaper than full input tokens. Keeping your stable prefix unchanged and at the front of the prompt is what triggers the discount.
How do I find which feature is driving my bill?
Don't sort by number of calls — sort by total cost or total tokens. A handful of endpoints with big prompts or long completions usually account for most of the spend. Export a usage CSV and group it by prompt and model; token·flow does this automatically and ranks the costliest first.
Does temperature or model speed affect the price?
No. Temperature, top-p, and latency don't change pricing — only token counts and the model's per-token rates do. A faster or slower response with the same token usage costs the same. The levers that move the bill are model choice, prompt size, and completion length.

Last updated June 15, 2026
