How to add human approval to an AI agent's risky actions

Decide what actually needs a human

Human-in-the-loop fails when everything needs a human — approval fatigue sets in, people rubber-stamp, and the gate becomes theatre. So the first job is drawing the line: which actions are high-blast-radius enough to be worth a person's attention. With agent·shield the line is a set of regex patterns over method, path, and body, so "needs a human" is a concrete list — a DELETE method, DROP TABLE or TRUNCATE in the body, rm -rf, kubectl delete, an UPDATE with no WHERE clause, a path that touches secrets or billing — not a vibe.

Everything else — reads, routine writes, idempotent calls — should pass without a human ever seeing it. The goal is a queue that's mostly empty, so when something does land in it, it gets real scrutiny instead of a reflexive approve. agent·shield's policies are built for exactly this split: forward by default, hold the specific patterns that matter.
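As a rough illustration of what "a concrete list" can look like, here is a minimal Python sketch of hold policies expressed as regexes over method, path, and body. This is not agent·shield's actual policy syntax, which this page doesn't show; `HoldPolicy`, `POLICIES`, and `first_match` are hypothetical names, and the patterns are deliberately crude heuristics.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HoldPolicy:
    """Hypothetical policy shape: each field is an optional regex."""
    name: str
    method: Optional[str] = None
    path: Optional[str] = None
    body: Optional[str] = None

    def __post_init__(self) -> None:
        # A policy with no patterns would match every request, so reject it.
        if self.method is None and self.path is None and self.body is None:
            raise ValueError(f"policy {self.name!r} has no patterns")

    def matches(self, method: str, path: str, body: str) -> bool:
        # Every pattern that is set must match its part of the request.
        checks = [(self.method, method), (self.path, path), (self.body, body)]
        return all(
            re.search(pattern, value, re.IGNORECASE) is not None
            for pattern, value in checks
            if pattern is not None
        )


# Illustrative patterns only; tune them to your own services.
POLICIES = [
    HoldPolicy("delete-method", method=r"^DELETE$"),
    HoldPolicy("drop-or-truncate", body=r"\b(DROP\s+TABLE|TRUNCATE)\b"),
    HoldPolicy("rm-rf", body=r"\brm\s+-rf\b"),
    HoldPolicy("kubectl-delete", body=r"\bkubectl\s+delete\b"),
    # Crude heuristic: an UPDATE with no WHERE anywhere after it.
    HoldPolicy("update-without-where", body=r"\bUPDATE\b(?![\s\S]*\bWHERE\b)"),
    HoldPolicy("sensitive-path", path=r"/(secrets|billing)(/|$)"),
]


def first_match(method: str, path: str, body: str) -> Optional[str]:
    """Return the name of the first policy that holds this request, else None."""
    for policy in POLICIES:
        if policy.matches(method, path, body):
            return policy.name
    return None  # forward by default
```

Returning `None` for anything unmatched is what keeps the queue mostly empty: a request is held only when it trips a named pattern.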

Intercept without rewriting the agent

The naive way to add approval is to edit every tool: wrap each destructive function in an "are you sure" check. That's slow, it's easy to miss a path, and it couples your safety logic to your agent code. The cleaner way is to intercept at the network layer, where every tool call eventually becomes an HTTP request.

agent·shield runs as a transparent HTTP proxy. You point the agent's base URL at it, and from then on it sees every request the agent makes and matches each against your hold policies — regex over method, path, and body. A matching request is held; a non-matching one is forwarded instantly. The agent doesn't know or care that the proxy is there, which is why there's no SDK to adopt and nothing in the agent to change.
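To make "point the agent's base URL at it" concrete, here is a small Python sketch of an agent-side URL builder whose only configurable part is the base URL. The environment variable name and both URLs are placeholders invented for illustration, not agent·shield settings.

```python
import os

# Hypothetical placeholders: neither the variable name nor the URLs
# come from agent·shield's documentation.
BASE_URL_ENV = "AGENT_API_BASE_URL"
REAL_SERVICE_URL = "https://api.example.com"


def build_url(path: str) -> str:
    """Build a request URL from whichever base URL is configured."""
    base = os.environ.get(BASE_URL_ENV, REAL_SERVICE_URL)
    # Normalise slashes so the result is the same with or without a trailing "/".
    return base.rstrip("/") + "/" + path.lstrip("/")
```

With the variable unset, `build_url("/v1/users/42")` targets the real service; with it set to the proxy's address, the same call goes through the checkpoint and the tool code is untouched. How the proxy is told where the real target lives is a deployment detail this sketch doesn't cover.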

Review, decide, and keep the record

A held request lands in the approval queue with everything a reviewer needs to judge it: the method, the path, the body, and the policy it tripped. A person approves it — and only then is it forwarded to the real target — or denies it, and it never executes. While a request is held, the destructive action simply has not happened yet; the hold is the safety.
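The hold, approve, and deny lifecycle described above can be sketched as a small in-memory queue. This is an illustrative Python model, not agent·shield's implementation: `ApprovalQueue`, `HeldRequest`, and the `forward` callback are hypothetical names, and a real system would persist the queue and authenticate reviewers.

```python
import itertools
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Status(Enum):
    HELD = "held"
    APPROVED = "approved"
    DENIED = "denied"


@dataclass
class HeldRequest:
    id: int
    method: str
    path: str
    body: str
    policy: str  # name of the policy the request tripped
    status: Status = Status.HELD


class ApprovalQueue:
    def __init__(self, forward: Callable[[HeldRequest], None]) -> None:
        # `forward` stands in for sending the request to the real target.
        self._forward = forward
        self._items: Dict[int, HeldRequest] = {}
        self._ids = itertools.count(1)

    def hold(self, method: str, path: str, body: str, policy: str) -> HeldRequest:
        item = HeldRequest(next(self._ids), method, path, body, policy)
        self._items[item.id] = item
        return item  # nothing has been forwarded yet

    def pending(self) -> List[HeldRequest]:
        return [i for i in self._items.values() if i.status is Status.HELD]

    def approve(self, request_id: int) -> HeldRequest:
        item = self._require_held(request_id)
        item.status = Status.APPROVED
        self._forward(item)  # only now does the action reach the target
        return item

    def deny(self, request_id: int) -> HeldRequest:
        item = self._require_held(request_id)
        item.status = Status.DENIED  # never forwarded
        return item

    def _require_held(self, request_id: int) -> HeldRequest:
        item = self._items[request_id]
        if item.status is not Status.HELD:
            raise ValueError(f"request {request_id} was already decided")
        return item
```

The point of the design is that `forward` is called in exactly one place, inside `approve`, so a held or denied request has no code path to the target.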

Every step of this is logged to an append-only audit trail: the request, the matched policy, who approved or denied it, and when. That turns "a human reviewed it" from a claim into a record you can show an auditor, a customer, or your own team after an incident. The same setup also lets you tune over time: start by holding a broad set of actions, watch the queue, and relax the policies that prove safe.
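One way to picture the audit trail and the tuning loop is a JSON Lines file that is only ever appended to, plus a helper that computes each policy's approval rate. This Python sketch is an assumption about shape, not agent·shield's log format: the field names and both functions are hypothetical, and opening a file in append mode is only a convention, so real append-only guarantees would have to come from the storage layer.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict


def append_decision(log_path: str, *, request: dict, policy: str,
                    actor: str, decision: str) -> None:
    """Append one decision record; existing lines are never rewritten."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "request": request,    # method, path, body
        "policy": policy,      # which policy held the request
        "actor": actor,        # who approved or denied
        "decision": decision,  # "approved" or "denied"
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def approval_rates(log_path: str) -> Dict[str, float]:
    """Fraction of held requests that were approved, per policy."""
    approved: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            entry = json.loads(line)
            total[entry["policy"]] += 1
            if entry["decision"] == "approved":
                approved[entry["policy"]] += 1
    return {name: approved[name] / total[name] for name in total}
```

A policy whose rate sits at 1.0 over a meaningful number of decisions is a candidate for relaxing; one with frequent denials is catching real mistakes and is worth keeping.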

how it works

01. List the actions that need a human
Pick the high-blast-radius ones: deletes, DROP TABLE / TRUNCATE, rm -rf, kubectl delete, mass updates without a WHERE, privilege escalation, secrets and money movement. Keep the list short so the queue stays meaningful.

02. Point the agent's traffic at agent·shield
Set the agent's outbound base URL to the proxy. No SDK, no code changes: from here on every HTTP request the agent makes passes through the checkpoint.

03. Write hold policies
Express each risky action as a policy: regex over method, path, and body. Matching requests are held; everything else is forwarded instantly so the agent keeps its speed.

04. Review the approval queue
Held requests appear with full context: method, path, body, matched policy. A person approves (it's then forwarded) or denies (it never runs). The action stays paused until someone decides.

05. Check the audit log and tune
Every decision is recorded with actor and timestamp. Use the log to relax policies that always get approved and tighten ones that catch real mistakes, so the gate stays sharp.

frequently asked

Doesn't human approval defeat the point of an autonomous agent?

Only if you gate everything. agent·shield holds just the destructive actions and forwards the rest instantly, so the agent stays autonomous for reads and routine work — a person only steps in for the handful of calls that could actually cause damage.

How do I add approval without changing my agent's code?

Route the agent's outbound HTTP through agent·shield and write hold policies. Because it's a transparent proxy, the agent calls the proxy's URL exactly as it would call the real service — there's no SDK to install and no functions to wrap.

What happens to a request while it's waiting for approval?

It's held — paused at the proxy and not forwarded. The destructive action hasn't happened. When a person approves, agent·shield forwards it to the real target; if they deny, it never reaches the target at all.

Can I prove afterward that a human actually approved something?

Yes. Every held request and its decision — approved or denied, by whom, when, against which policy — is written to an append-only audit log, so human-in-the-loop is a record you can produce, not just a policy you assert.

Published May 18, 2026 · Last updated June 13, 2026
