token·flow
cut your llm token bill
token·flow turns a month of LLM API logs into a clear answer to one question: where is the money actually going? You export your usage as a CSV — OpenAI, Anthropic, or both — and token·flow aggregates it into spend over time, by model and by prompt, so the line items stop being a mystery.
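The aggregation is simple arithmetic. Below is a minimal sketch in Python. It assumes a simplified CSV with hypothetical columns (`date`, `model`, `prompt`, `input_tokens`, `output_tokens`) and placeholder per-million-token prices for two made-up models. Real OpenAI and Anthropic exports use their own column names and prices, and this is not token·flow's actual implementation. It only shows the core idea: each call costs its input tokens times the input price plus its output tokens times the output price, and those costs are summed by day and by model.

```python
import csv
import io
from collections import defaultdict

# Hypothetical prices in USD per 1M tokens: (input, output).
# Placeholder numbers for illustration, not any provider's real price list.
PRICES = {
    "big-model": (5.00, 15.00),
    "small-model": (0.50, 1.50),
}

def row_cost(row: dict) -> float:
    """Cost of one logged call, with input and output tokens priced separately."""
    in_price, out_price = PRICES[row["model"]]
    return (int(row["input_tokens"]) * in_price
            + int(row["output_tokens"]) * out_price) / 1_000_000

def aggregate(csv_text: str):
    """Sum spend by day and by model from a usage CSV (hypothetical column layout)."""
    by_day = defaultdict(float)
    by_model = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        cost = row_cost(row)
        by_day[row["date"]] += cost
        by_model[row["model"]] += cost
    return dict(by_day), dict(by_model)

# Made-up sample export with three calls over two days.
SAMPLE = """date,model,prompt,input_tokens,output_tokens
2024-05-01,big-model,summarize ticket,2000,500
2024-05-01,small-model,classify intent,300,10
2024-05-02,big-model,summarize ticket,2000,500
"""

if __name__ == "__main__":
    by_day, by_model = aggregate(SAMPLE)
    print(by_day)
    print(by_model)
```

Input and output are priced separately because providers typically charge more per output token, so two calls with the same total token count can cost quite different amounts depending on the split.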
Two patterns drive most wasted spend, and token·flow surfaces both. The first is your costliest prompts: the handful of oversized inputs or runaway outputs that quietly dominate the bill. The second is your repeated prompts: identical or near-identical calls you pay full price for every time instead of caching them. Each finding comes with a plain recommendation: compress this input, cache these requests, right-size this model.
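Both patterns fall out of grouping calls by prompt. The sketch below assumes per-call costs have already been computed (token counts times model prices) and uses a deliberately crude near-duplicate key: lowercase the prompt and collapse whitespace. That only catches trivial variants; catching paraphrases would need something like embedding similarity. The function names `normalize`, `costliest`, and `repeated`, and the sample calls, are hypothetical. The "avoidable" figure is an upper bound that assumes an exact-match response cache would have served every repeat for free.

```python
import re
from collections import defaultdict

# Hypothetical per-call records: (prompt text, cost in USD).
CALLS = [
    ("Summarize this support ticket", 0.0175),
    ("summarize  this support ticket", 0.0175),
    ("Classify the intent of: hi", 0.0002),
    ("Translate the full product manual into French", 0.24),
    ("Summarize this support ticket", 0.0175),
]

def normalize(prompt: str) -> str:
    """Crude near-duplicate key: lowercase and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", prompt.strip().lower())

def costliest(calls, n=3):
    """Total spend per normalized prompt, highest first, top n."""
    totals = defaultdict(float)
    for prompt, cost in calls:
        totals[normalize(prompt)] += cost
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

def repeated(calls):
    """Prompts seen more than once, with the spend on every call after the first
    (what an exact-match cache could at best have avoided)."""
    groups = defaultdict(list)
    for prompt, cost in calls:
        groups[normalize(prompt)].append(cost)
    return {
        key: {"count": len(costs), "avoidable": sum(costs[1:])}
        for key, costs in groups.items()
        if len(costs) > 1
    }

if __name__ == "__main__":
    for prompt, total in costliest(CALLS):
        print(f"{total:.4f}  {prompt}")
    print(repeated(CALLS))
```

Here the one-off translation job tops the cost ranking, while the ticket summary is the caching candidate: three calls that differ only in case and spacing.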
The MVP stays deliberately small: analysis and actionable advice, not a proxy in your critical path or an auto-rewriter that touches your prompts. You stay in control of the changes; token·flow just makes the expensive parts impossible to miss. Free while it finds its feet.
how it works
- 01 · upload your logs
  Export your OpenAI or Anthropic usage as a CSV and drop it into data sources.
- 02 · see the breakdown
  token·flow aggregates spend over time and ranks your costliest and most-repeated prompts.
- 03 · apply the wins
  Work through the recommendations (compress, cache, right-size) and mark each one applied as you go.
token·flow guides
Ways to use token·flow, and how it compares.
- how to · How to reduce your OpenAI API costs without breaking your product
  A practical guide to cutting OpenAI API costs: right-size the model, cap max_tokens, use prompt caching, batch async calls, and trim context. Includes the cheapest wins first.
- use case · LLM cost optimization for startups: keep the AI bill from eating your runway
  How early-stage startups keep LLM token costs under control: right-size models, cache aggressively, set per-user budgets, and watch margins. Practical strategies before you hire an infra team.
- how to · Prompt compression: send fewer tokens without losing the answer
  Practical prompt compression techniques: trim system prompts, remove few-shot bloat, summarize context, and drop redundant instructions to cut input tokens without hurting output quality.
- how to · Caching LLM responses: stop paying twice for the same answer
  A simple guide to caching LLM responses: exact-match caches, semantic caching for near-identical prompts, and provider prompt caching. Cut token spend on repeated requests.
- use case · Understanding your LLM bill: what you're actually paying for
  A clear breakdown of your LLM bill: input vs. output tokens, why output costs more, cached vs. uncached pricing, and how to read your OpenAI or Anthropic usage export.
- use case · Signs your LLM usage is inefficient, and how to fix each one
  Common signs of inefficient LLM usage: a frontier model doing simple tasks, repeated identical prompts, runaway output, and bloated context, with the fix for each.