use case

Understanding AI agent context: a guide for prompt engineers

the short answer

An AI agent's context is everything the model can see at the moment it makes a decision — the system and user prompts, the running memory of earlier turns, the results of tools it has called, and its current state — and capturing that exact context per step is essential to debugging, because an agent only ever acts on what was in its context window, so a wrong action almost always traces back to wrong, missing, or stale context.

Prompt engineers tend to think about the prompt — the instructions and examples they wrote. But an agent acts on more than its prompt. At each step it sees a whole bundle: the system instructions, the conversation so far, whatever it remembers, and the results of tools it called along the way. That entire bundle is its context, and the model can only reason about what's in it. Anything outside the context window might as well not exist.

This matters enormously for debugging, because most agent failures aren't reasoning failures — they're context failures. The model did something sensible given what it could see; the problem was what it could see. A tool result got dropped, an earlier turn fell out of the window, a stale memory contradicted the new instruction. To debug an agent you have to be able to see the exact context of the step that went wrong. agentis captures that context per step so you're not guessing what the model knew.

get started →about agentis

agentis.ogbuilds.ai/agents/support-triage/debug

agentis

agentssnapshotsdebug

support-triage

step 5 · llm·callcontext at decision time

1,862 tokens

system prompt

you are a support agent. always escalate billing disputes to tier-2.

memory · 4 turns

turn 1: classify → billing. turn 2: load customer. turn 3: load invoices (failed).

tool result

billing.getInvoices → ETIMEDOUT after 3 retries (last: 11:54:08)

current state

step: draft reply · goal: resolve ticket #4825 · retries exhausted

model reply

billing api unreachable. sending holding reply and escalating to tier-2.

where this happens in the app

step 5 expanded: the four things in the model's context when it made its decision — the system prompt, the memory of earlier turns, the tool result that shaped it, and current state.

1step header: step 5, an llm·call, with the token count — 1,862 tokens were in context when this decision was made
2the tool result in context — billing.getInvoices timed out; this is the bad input the model had to reason from. context, not reasoning, explains the decision
3the model reply: sensible given its context — billing is unreachable, fall back to a holding reply and escalate. right decision from wrong inputs

The four things that make up an agent's context

First, the prompts: the system prompt that sets the agent's role and rules, plus the user input that started the run. These are the part you author directly. Second, memory: the running history of the conversation or task — earlier turns, prior decisions, summaries of what's happened. This grows as the run proceeds, and it's where things quietly fall out when the window fills up. Third, tool outputs: every result a tool returned is context for the next decision, and a malformed or misread result poisons everything downstream.

Fourth, state: the agent's sense of where it is in the task — what it's trying to do right now, what it's already done, what's left. Some agents track this explicitly; in others it's implicit in the conversation. All four travel together into each model call, and the model treats them as one undifferentiated view of the world. It can't tell a stale memory from a fresh fact, or a wrong tool result from a right one — it just reasons over whatever is there. That's why the contents of context, not just the prompt, decide what the agent does.

Why capturing context is the heart of debugging

When an agent makes a baffling decision, the first question is always: what did it actually see? Not what you think it saw — what was genuinely in its context at that moment. The answer explains the behavior almost every time. The model picked the wrong tool because an earlier result framed the problem differently. It repeated itself because the memory of having already done the task dropped out of the window. It hallucinated a value because the real one was never in context to begin with.

You can't answer 'what did it see?' from the agent's source code, because context is assembled at runtime from prompts, memory, and tool results that change with every run. You have to capture it. That means recording, per step, the full context that went into the model and the reply that came out. With that in hand, a confusing agent becomes legible: you read the context, you see why the decision made sense to the model, and you fix the context rather than blaming the model. agentis records this context alongside each step of the run so the 'what did it see?' question always has a concrete answer.

frequently asked

What's the difference between a prompt and an agent's context?

The prompt is what you author — the system instructions and the user input. Context is everything the model can see at decision time, which includes the prompt but also the running memory of earlier turns, the results of tools it called, and its current state. The model acts on the whole context, not just the prompt, which is why debugging means looking at all of it.

Why do so many agent bugs come down to context?

Because the model only reasons over what's in its context window. If a tool result was wrong, a memory went stale, or a crucial earlier turn dropped out, the model makes a sensible decision based on bad information. The reasoning isn't broken — the inputs were. Most baffling agent behavior resolves the moment you see the actual context the model had.

How does context fall out of the window?

Context windows are finite, so as a run accumulates turns and tool results, older content has to be dropped or summarized to fit. If something load-bearing — an instruction, an earlier result, the original goal — gets trimmed, the model loses it and starts acting as if it never existed. Capturing per-step context lets you see exactly when something important disappeared.

How do I capture an agent's context for debugging?

Record, for each model call, the full context that went in (prompt, memory, tool results, state) and the reply that came out, tagged to the run. Then you can read what the model actually saw at the failing step. agentis captures this per step from the logs you post, so the context behind each decision is always available rather than reconstructed from guesswork.

Last updated June 19, 2026

more on agentis

agentis — understand your ai agents. →

related across the studio

ready to try agentis?

get started →