How to put a human in the loop for an AI agent's destructive actions

the short answer

To add human approval to an AI agent, put a checkpoint between the agent and the systems it touches. Route the agent's outbound HTTP through agent·shield, write policies that hold destructive actions (DELETEs, DROP TABLE, rm -rf, kubectl delete) instead of forwarding them, and review the held requests in an approval queue, where a person approves or denies each one before it reaches the target. No SDK and no agent rewrite are required.

"Keep a human in the loop" is the standard advice for autonomous agents, and it's right — but it's usually left as a slogan. In practice it means deciding which actions need a human, intercepting exactly those before they execute, and giving a person a fast way to say yes or no. The hard part is doing that without gutting the agent's autonomy or hand-coding approval checks into every tool.

This guide is the concrete version: how to wire human approval onto an agent you already have, so the routine bulk of its work runs untouched and only the genuinely destructive actions stop and wait for a person. The approach uses a proxy rather than agent surgery, which is what keeps it from becoming a refactor.

approve / deny: every held action waits in a queue for one of two human decisions before it's ever forwarded

Decide what actually needs a human

Human-in-the-loop fails when everything needs a human — approval fatigue sets in, people rubber-stamp, and the gate becomes theatre. So the first job is drawing the line: which actions are high-blast-radius enough to be worth a person's attention. The usual set is small and obvious — anything that deletes, drops, truncates, mass-updates, escalates privileges, or moves money or secrets.

Everything else — reads, routine writes, idempotent calls — should pass without a human ever seeing it. The goal is a queue that's mostly empty, so when something does land in it, it gets real scrutiny instead of a reflexive approve. agent·shield's policies are built for exactly this split: forward by default, hold the specific patterns that matter.
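The forward-by-default split can be sketched in a few lines. This is an illustration only: the `HoldPolicy` class, the policy names, and the `decide` function are hypothetical and are not agent·shield's actual policy syntax. The sketch assumes policies are regexes over method, path, and body, and that destructive commands arrive inside HTTP request bodies.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class HoldPolicy:
    """One hold rule (hypothetical format, for illustration only)."""
    name: str
    method: str = r".*"  # regex matched against the whole HTTP method
    path: str = r".*"    # regex searched for in the request path
    body: str = r".*"    # regex searched for in the request body

    def matches(self, method: str, path: str, body: str) -> bool:
        flags = re.IGNORECASE | re.DOTALL
        return bool(
            re.fullmatch(self.method, method, flags)
            and re.search(self.path, path, flags)
            and re.search(self.body, body, flags)
        )


# A deliberately short list: only high-blast-radius patterns are held.
POLICIES = [
    HoldPolicy("http-delete", method=r"DELETE"),
    HoldPolicy("sql-drop-or-truncate", body=r"\b(DROP\s+TABLE|TRUNCATE)\b"),
    HoldPolicy("shell-rm-rf", body=r"\brm\s+-rf\b"),
    HoldPolicy("kubectl-delete", body=r"\bkubectl\s+delete\b"),
]


def decide(method: str, path: str, body: str = ""):
    """Return ("hold", policy_name) on the first match, else ("forward", None)."""
    for policy in POLICIES:
        if policy.matches(method, path, body):
            return ("hold", policy.name)
    return ("forward", None)
```

With this list, `decide("GET", "/users/1")` forwards and `decide("DELETE", "/users/1")` holds. Keeping the default as "forward" is what keeps the queue mostly empty.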

Intercept without rewriting the agent

The naive way to add approval is to edit every tool: wrap each destructive function in an "are you sure" check. That's slow, it's easy to miss a path, and it couples your safety logic to your agent code. The cleaner way is to intercept at the network layer, where tool calls that reach other systems typically become HTTP requests.

agent·shield runs as a transparent HTTP proxy. You point the agent's base URL at it, and from then on it sees every request the agent sends through that URL and matches each against your hold policies, which are regexes over method, path, and body. A matching request is held; a non-matching one is forwarded instantly. The agent doesn't know or care that the proxy is there, which is why there's no SDK to adopt and nothing in the agent's code to change.
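On the agent side, "pointing the base URL at the proxy" usually comes down to one configuration value. The sketch below assumes the agent's tools already build their URLs from a single base URL. The `AGENT_BASE_URL` variable name and both addresses are placeholders, and how the proxy is told about the real target is deployment-specific and not shown here.

```python
import os
from urllib.parse import urljoin

# Placeholder for the service the agent talks to today (assumption).
DEFAULT_TARGET = "https://api.example.com/"


def base_url() -> str:
    """Read the base URL from the environment, falling back to the real target."""
    url = os.environ.get("AGENT_BASE_URL", DEFAULT_TARGET)
    return url if url.endswith("/") else url + "/"


def endpoint(path: str) -> str:
    """Build the full URL for a tool call; tool code never hardcodes the host."""
    return urljoin(base_url(), path.lstrip("/"))
```

Setting `AGENT_BASE_URL` to the proxy's address (for example a hypothetical `http://localhost:8080`) reroutes every call built with `endpoint()` without touching any tool function.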

Review, decide, and keep the record

A held request lands in the approval queue with everything a reviewer needs to judge it: the method, the path, the body, and the policy it tripped. A person approves it — and only then is it forwarded to the real target — or denies it, and it never executes. While a request is held, the destructive action simply has not happened yet; the hold is the safety.
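The hold-then-decide behavior can be modeled with a small in-memory queue. This is a sketch under stated assumptions, not agent·shield's implementation: `HeldRequest`, `ApprovalQueue`, and the `forward` callback are hypothetical names, and a real queue would persist its state.

```python
import itertools
from dataclasses import dataclass


@dataclass
class HeldRequest:
    id: int
    method: str
    path: str
    body: str
    policy: str            # name of the policy the request tripped
    status: str = "held"   # "held" -> "approved" or "denied"


class ApprovalQueue:
    def __init__(self, forward):
        self._forward = forward          # callable that sends a request to the real target
        self._ids = itertools.count(1)
        self._items = {}

    def hold(self, method, path, body, policy):
        """Park a request; nothing is sent to the target at this point."""
        req = HeldRequest(next(self._ids), method, path, body, policy)
        self._items[req.id] = req
        return req

    def pending(self):
        return [r for r in self._items.values() if r.status == "held"]

    def approve(self, request_id):
        """Forward the request, and only now does the action execute."""
        req = self._take(request_id)
        req.status = "approved"
        return self._forward(req)

    def deny(self, request_id):
        """Mark the request denied; it is never forwarded."""
        req = self._take(request_id)
        req.status = "denied"

    def _take(self, request_id):
        req = self._items[request_id]
        if req.status != "held":
            raise ValueError(f"request {request_id} already {req.status}")
        return req
```

The only path to `forward` runs through `approve`, so a request that is still held or has been denied cannot reach the target, and a decision cannot be made twice.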

Every step of this is logged to an append-only audit trail: the request, the matched policy, who approved or denied it, and when. That turns "a human reviewed it" from a claim into a record you can show an auditor, a customer, or your own team after an incident. The same setup also lets you tighten over time — start by holding a broad set of actions, watch the queue, and relax the policies that prove safe.
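As a rough picture of what such a record might contain and how it supports tuning, here is a minimal sketch. The field names, the JSON Lines file, and both helpers are assumptions made for illustration. A class that only appends is not tamper-proof on its own, so a real append-only trail also needs protection at the storage layer.

```python
import json
from collections import Counter
from datetime import datetime, timezone


class AuditLog:
    """Exposes only append and read; there is no update or delete method."""

    def __init__(self, path):
        self._path = path

    def record(self, *, method, path, policy, decision, actor):
        entry = {
            "at": datetime.now(timezone.utc).isoformat(),  # when the decision was made
            "method": method,
            "path": path,
            "policy": policy,      # which hold policy matched
            "decision": decision,  # "approved" or "denied"
            "actor": actor,        # who decided
        }
        with open(self._path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")
        return entry

    def entries(self):
        with open(self._path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


def approval_rates(entries):
    """Fraction of held requests approved, per policy."""
    total, approved = Counter(), Counter()
    for e in entries:
        total[e["policy"]] += 1
        if e["decision"] == "approved":
            approved[e["policy"]] += 1
    return {policy: approved[policy] / total[policy] for policy in total}
```

A policy whose approval rate sits at or near 1.0 over a meaningful number of decisions is a candidate for relaxing, while one with frequent denials is catching real mistakes.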

how it works

  1. List the actions that need a human

    Pick the high-blast-radius ones: deletes, DROP TABLE / TRUNCATE, rm -rf, kubectl delete, mass updates without a WHERE, privilege escalation, secrets and money movement. Keep the list short so the queue stays meaningful.

  2. Point the agent's traffic at agent·shield

    Set the agent's outbound base URL to the proxy. No SDK, no code changes: from here on, every HTTP request the agent makes passes through the checkpoint.

  3. Write hold policies

    Express each risky action as a policy: a regex over method, path, and body. Matching requests are held; everything else is forwarded instantly so the agent keeps its speed.

  4. Review the approval queue

    Held requests appear with full context: method, path, body, and matched policy. A person approves (it's then forwarded) or denies (it never runs). The action stays paused until someone decides.

  5. Check the audit log and tune

    Every decision is recorded with actor and timestamp. Use the log to relax policies that always get approved and tighten ones that catch real mistakes, so the gate stays sharp.

frequently asked

Doesn't human approval defeat the point of an autonomous agent?
Only if you gate everything. agent·shield holds just the destructive actions and forwards the rest instantly, so the agent stays autonomous for reads and routine work. A person only steps in for the handful of calls that could actually cause damage.

How do I add approval without changing my agent's code?
Route the agent's outbound HTTP through agent·shield and write hold policies. Because it's a transparent proxy, the agent calls the proxy's URL exactly as it would call the real service, so there's no SDK to install and no functions to wrap.

What happens to a request while it's waiting for approval?
It's held: paused at the proxy and not forwarded. The destructive action hasn't happened. When a person approves, agent·shield forwards it to the real target; if they deny, it never reaches the target at all.

Can I prove afterward that a human actually approved something?
Yes. Every held request and its decision (approved or denied, by whom, when, against which policy) is written to an append-only audit log, so human-in-the-loop is a record you can produce, not just a policy you assert.

Published May 18, 2026 · Last updated June 13, 2026
