How to debug LLM agents effectively

Why agent bugs hide upstream

An agent run is a chain: the model reads context, decides on an action, calls a tool, reads the result, decides again. Each link depends on the one before it. When a link breaks — the model misreads a tool result, picks the wrong tool, or hallucinates an argument — every step after it is built on that mistake. So the symptom you notice (a wrong answer, an infinite loop, a tool error) is usually downstream of the real cause. Chasing the symptom leads you in circles; finding the first divergence leads you to the fix.

Non-determinism compounds the problem. The same input can produce a different path on a second run, so a bug you saw once may not reproduce on demand. The defense is to capture the exact run that failed (its prompts, model replies, tool calls, and results) so you can study the real failure instead of trying to recreate it from memory.
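A minimal sketch of what capturing a run could look like in Python. The `Step` fields, the `RunRecorder` class, and the JSON Lines layout are assumptions made for illustration; they are not a format prescribed by agentis or by any particular agent framework.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class Step:
    """One link in the chain: what went in and what came out."""
    index: int
    prompt: str                       # exact text sent to the model
    model_reply: str                  # raw model output, before any parsing
    tool_name: Optional[str] = None   # None when the model did not call a tool
    tool_args: dict = field(default_factory=dict)
    tool_result: Any = None


class RunRecorder:
    """Collects steps in order and serializes them as JSON Lines."""

    def __init__(self, run_id: str) -> None:
        self.run_id = run_id
        self.steps: list = []

    def record(self, prompt, model_reply, tool_name=None,
               tool_args=None, tool_result=None) -> Step:
        step = Step(len(self.steps), prompt, model_reply,
                    tool_name, tool_args or {}, tool_result)
        self.steps.append(step)
        return step

    def to_jsonl(self) -> str:
        # default=str keeps non-JSON values (dates, custom objects) from raising
        return "\n".join(
            json.dumps({"run_id": self.run_id, **asdict(s)}, default=str)
            for s in self.steps
        )
```

Calling `record(...)` once per model turn and writing `to_jsonl()` to disk at the end gives you one line per step, in the order the agent actually took them.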

Read the path, don't read the code

When an agent misbehaves, the instinct is to reread the agent's source. That's usually the wrong place to look first, because the code is often fine; it's the data flowing through it that's wrong. The model got a prompt you didn't expect, a tool returned something malformed, or the context window dropped a crucial earlier turn. None of that is visible in the source; it's only visible in the run.

So make the run readable. For each step you want the prompt that was sent, the model's raw reply (including any tool call it requested), the tool that ran, its arguments, and what it returned. Lay those out in order and the failure usually becomes obvious: you can see the exact moment the agent acted on bad information. agentis condenses a run into exactly this ordered view, and lets you ask an LLM to point at the likely culprit step when the path is long.

how it works

01. Reproduce (or capture) the failing run
Don't debug from memory. Re-run the agent on the same input, or pull the logs of the run that actually failed. Because agents are non-deterministic, the run you study has to be a real one, not an approximation.

02. Read the execution path top to bottom
Walk the ordered sequence of prompts, model replies, tool calls, and results. Don't jump to the end; read from the start so you can see where it first went off course.

03. Isolate the first failing step
Find the earliest step where reality diverged from what you expected. Ignore later errors for now; they're usually consequences of this one. The first divergence is almost always the real bug.

04. Inspect that step's inputs and outputs
Look at exactly what went in (the prompt, the tool arguments, the context) and what came out (the model's reply, the tool result). The mismatch between what you assumed and what actually happened is your answer.

05. Make one change and re-run
Change a single thing (tighten the prompt, fix the tool, or add the missing context), then re-run and read the path again. One change at a time keeps cause and effect clear instead of muddying the next run.
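Isolating the first failing step can be partly mechanized once your expectations are written down as checks. The sketch below is illustrative: the check names, the step fields, and the sample run are all hypothetical, and real checks would encode whatever "what you expected" means for your agent.

```python
from typing import Callable, Optional

Check = Callable[[dict], bool]  # returns True when the step looks right


def first_divergence(steps: list, checks: dict) -> Optional[tuple]:
    """Return (step index, check name) for the earliest failed check, or None."""
    for index, step in enumerate(steps):
        for name, check in checks.items():
            if not check(step):
                return index, name
    return None


# Hypothetical expectations for a small agent with two tools
checks = {
    "known_tool": lambda s: s.get("tool_name") in (None, "search", "calculator"),
    "tool_result_ok": lambda s: not (
        isinstance(s.get("tool_result"), dict) and "error" in s["tool_result"]
    ),
}

# Hypothetical run: the visible symptom is in the last step, the cause is earlier
run = [
    {"tool_name": "search", "tool_result": {"hits": 3}},
    {"tool_name": "calculator", "tool_result": {"error": "bad expression"}},
    {"tool_name": None, "model_reply": "I could not compute the total."},
]

print(first_divergence(run, checks))  # (1, 'tool_result_ok')
```

The function stops at the earliest failure on purpose. Later steps may fail checks too, but they are treated as consequences until the first one is explained.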

frequently asked

Why can't I just debug an agent with print statements?

Print statements show your local variables, but agent bugs live in the prompts, model replies, and tool results — the data flowing between steps. You need the actual conversation the agent had with the model and its tools, in order, not your code's internal state. That's what an execution path gives you and a print log doesn't.
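One way to get at that between-step data without scattering prints is to wrap each tool so its arguments and result are recorded wherever it is called. A minimal sketch, assuming tools are plain Python functions called with keyword arguments; `TRACE`, `traced_tool`, and `lookup_price` are hypothetical names used only for illustration.

```python
import functools

TRACE = []  # hypothetical in-memory sink; a real setup would persist this


def traced_tool(fn):
    """Record every call to a tool: its name, arguments, and result or error."""
    @functools.wraps(fn)
    def wrapper(**kwargs):
        entry = {"tool": fn.__name__, "args": kwargs}
        try:
            entry["result"] = fn(**kwargs)
            return entry["result"]
        except Exception as exc:
            entry["error"] = repr(exc)
            raise
        finally:
            # Runs on success and on failure, so no call goes unrecorded
            TRACE.append(entry)
    return wrapper


@traced_tool
def lookup_price(sku: str) -> float:
    prices = {"A1": 9.5}
    return prices[sku]
```

After a run, `TRACE` holds the tool side of the conversation in call order, including calls that raised.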

The bug doesn't reproduce every time. How do I debug that?

Non-determinism means you can't rely on recreating the failure on demand. The fix is to capture every run so that when one fails, you already have its exact path saved. Then you debug the real failed run instead of trying to trigger it again. Capturing runs by default turns an intermittent bug into a readable record.
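Capturing by default can be as simple as a context manager that writes the run to disk on exit, whether or not the run succeeded. This is a sketch with assumed names (`captured_run`, the `runs` directory, the JSON layout), not a description of how any specific tool stores runs.

```python
import json
import time
import uuid
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def captured_run(log_dir="runs"):
    """Yield a list to append steps to; always persist it when the block exits."""
    run_id = f"{time.strftime('%Y%m%dT%H%M%S')}-{uuid.uuid4().hex[:8]}"
    steps = []
    status = "ok"
    try:
        yield steps
    except Exception as exc:
        status = f"error: {exc!r}"
        raise  # the caller still sees the failure
    finally:
        out = Path(log_dir)
        out.mkdir(parents=True, exist_ok=True)
        payload = {"run_id": run_id, "status": status, "steps": steps}
        (out / f"{run_id}.json").write_text(
            json.dumps(payload, indent=2, default=str)
        )
```

Wrapping the agent loop in `with captured_run() as steps:` and appending one dict per step means the intermittent failure is already on disk by the time you hear about it.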

Should I look at the first error or the last one?

The first. Agent failures cascade — the model acts on a bad tool result, then everything after is built on that mistake. The last error you see is usually a symptom of an earlier one. Find the first step where the run diverged from what you expected, and you've almost always found the cause.

Do I need a special tool, or can I do this from raw logs?

You can do it from raw logs if they capture the prompts, model replies, tool calls, and tool results in order, which most default logging doesn't. The work is reconstructing the ordered path from scattered lines. A tool like agentis ingests the logs and renders the run as a readable snapshot, which removes the reconstruction step.
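For a sense of what that reconstruction involves, here is a sketch that assumes your logs are JSON Lines and that each agent event carries a `run_id` and a sortable `ts` field. Both field names are assumptions; logs without some run identifier and ordering key cannot be regrouped this way.

```python
import json
from collections import defaultdict


def reconstruct_paths(log_lines):
    """Group structured log lines by run and order each run by timestamp."""
    runs = defaultdict(list)
    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip unstructured lines such as plain-text warnings
        if isinstance(event, dict) and "run_id" in event and "ts" in event:
            runs[event["run_id"]].append(event)
    return {
        run_id: sorted(events, key=lambda e: e["ts"])
        for run_id, events in runs.items()
    }
```

Interleaved lines from concurrent runs come back as one ordered list per run, which is the shape you need before you can read a path top to bottom.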

Last updated June 19, 2026
