Decide what actually needs a human
Human-in-the-loop fails when everything needs a human — approval fatigue sets in, people rubber-stamp, and the gate becomes theatre. So the first job is drawing the line: which actions are high-blast-radius enough to be worth a person's attention. The usual set is small and obvious — anything that deletes, drops, truncates, mass-updates, escalates privileges, or moves money or secrets.
Everything else — reads, routine writes, idempotent calls — should pass without a human ever seeing it. The goal is a queue that's mostly empty, so when something does land in it, it gets real scrutiny instead of a reflexive approve. agent·shield's policies are built for exactly this split: forward by default, hold the specific patterns that matter.
Intercept without rewriting the agent
The naive way to add approval is to edit every tool: wrap each destructive function in an "are you sure" check. That's slow, it's easy to miss a path, and it couples your safety logic to your agent code. The cleaner way is to intercept at the network layer, where every tool call eventually becomes an HTTP request.
agent·shield runs as a transparent HTTP proxy. You point the agent's base URL at it, and from then on it sees every request the agent makes and matches each against your hold policies — regex over method, path, and body. A matching request is held; a non-matching one is forwarded instantly. The agent doesn't know or care that the proxy is there, which is why there's no SDK to adopt and nothing in the agent to change.
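To make the matching step concrete, here is a minimal sketch of "regex over method, path, and body" with forward-by-default behaviour. It is not agent·shield's actual policy format: the `HoldPolicy` class, the policy names, and the `/sql` path are all assumptions made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class HoldPolicy:
    """Hypothetical policy shape: one regex each for method, path, and body."""
    name: str
    method: str = r".*"
    path: str = r".*"
    body: str = r".*"

    def matches(self, method: str, path: str, body: str) -> bool:
        # All three patterns must match for the request to be held.
        return (
            re.fullmatch(self.method, method, re.IGNORECASE) is not None
            and re.search(self.path, path) is not None
            and re.search(self.body, body, re.IGNORECASE | re.DOTALL) is not None
        )


# Illustrative policies only; a real list would mirror your own risky actions.
POLICIES: List[HoldPolicy] = [
    HoldPolicy(name="http-delete", method=r"DELETE"),
    HoldPolicy(
        name="sql-drop-or-truncate",
        method=r"POST",
        path=r"^/sql",
        body=r"\b(DROP\s+TABLE|TRUNCATE)\b",
    ),
]


def first_match(method: str, path: str, body: str) -> Optional[HoldPolicy]:
    """Return the first policy the request trips, or None if it trips nothing."""
    for policy in POLICIES:
        if policy.matches(method, path, body):
            return policy
    return None


def decide(method: str, path: str, body: str = "") -> str:
    """Forward by default; hold only when some policy matches."""
    policy = first_match(method, path, body)
    return f"hold:{policy.name}" if policy else "forward"
```

With these example policies, `decide("GET", "/users/1")` forwards, while `decide("DELETE", "/users/1")` and a `POST` to `/sql` containing `drop table users` are both held.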
Review, decide, and keep the record
A held request lands in the approval queue with everything a reviewer needs to judge it: the method, the path, the body, and the policy it tripped. A person approves it — and only then is it forwarded to the real target — or denies it, and it never executes. While a request is held, the destructive action simply has not happened yet; the hold is the safety.
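The hold, approve, deny lifecycle can be sketched as a small in-memory queue. This is an illustration of the behaviour described above, not agent·shield's implementation: `ApprovalQueue`, `HeldRequest`, and the `forward` callback are hypothetical names, and a real queue would be persistent.

```python
import itertools
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class Status(Enum):
    HELD = "held"
    APPROVED = "approved"
    DENIED = "denied"


@dataclass
class HeldRequest:
    """Everything a reviewer sees: method, path, body, and the policy tripped."""
    method: str
    path: str
    body: str
    policy: str
    status: Status = Status.HELD


class ApprovalQueue:
    def __init__(self, forward: Callable[[HeldRequest], None]):
        # `forward` stands in for "send the request to the real target".
        self._forward = forward
        self._items: Dict[int, HeldRequest] = {}
        self._ids = itertools.count(1)

    def hold(self, request: HeldRequest) -> int:
        """Park the request. Nothing is sent to the target at this point."""
        request_id = next(self._ids)
        self._items[request_id] = request
        return request_id

    def pending(self) -> Dict[int, HeldRequest]:
        return {i: r for i, r in self._items.items() if r.status is Status.HELD}

    def approve(self, request_id: int) -> None:
        """Forward the request, and only now."""
        request = self._get_held(request_id)
        request.status = Status.APPROVED
        self._forward(request)

    def deny(self, request_id: int) -> None:
        """Mark the request denied; it is never forwarded."""
        self._get_held(request_id).status = Status.DENIED

    def _get_held(self, request_id: int) -> HeldRequest:
        request = self._items[request_id]
        if request.status is not Status.HELD:
            raise ValueError(f"request {request_id} was already decided")
        return request
```

The point of the design is that `forward` is called in exactly one place, inside `approve`, so a held or denied request has no path to the target.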
Every step of this is logged to an append-only audit trail: the request, the matched policy, who approved or denied it, and when. That turns "a human reviewed it" from a claim into a record you can show an auditor, a customer, or your own team after an incident. The same setup also lets you tune over time: start by holding a broad set of actions, watch the queue, and relax the policies that prove safe.
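As a rough picture of what one audit record might hold, here is a sketch that appends JSON lines to a file. The field names and the file format are assumptions, not agent·shield's schema. Note that opening a file in append mode only means this code never rewrites earlier lines; making a log genuinely tamper-resistant takes storage-level controls that are out of scope here.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional


def append_audit_record(
    log_path: str,
    *,
    method: str,
    path: str,
    body: str,
    policy: str,
    decision: str,  # "approved" or "denied"
    actor: str,
    now: Optional[datetime] = None,
) -> dict:
    """Append one decision to the log as a single JSON line."""
    record = {
        "timestamp": (now or datetime.now(timezone.utc)).isoformat(),
        "method": method,
        "path": path,
        "body": body,
        "policy": policy,
        "decision": decision,
        "actor": actor,
    }
    # Mode "a" writes at the end of the file; existing lines are left untouched.
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def read_audit_records(log_path: str) -> List[dict]:
    """Load every record, oldest first, for review or export."""
    with Path(log_path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```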
How it works
- 01
List the actions that need a human
Pick the high-blast-radius ones: deletes, DROP TABLE / TRUNCATE, rm -rf, kubectl delete, mass updates without a WHERE, privilege escalation, secrets and money movement. Keep the list short so the queue stays meaningful.
- 02
Point the agent's traffic at agent·shield
Set the agent's outbound base URL to the proxy. No SDK, no code changes — from here on every HTTP request the agent makes passes through the checkpoint.
- 03
Write hold policies
Express each risky action as a policy — regex over method, path, and body. Matching requests are held; everything else is forwarded instantly so the agent keeps its speed.
- 04
Review the approval queue
Held requests appear with full context — method, path, body, matched policy. A person approves (it's then forwarded) or denies (it never runs). The action stays paused until someone decides.
- 05
Check the audit log and tune
Every decision is recorded with actor and timestamp. Use the log to relax policies that always get approved and tighten ones that catch real mistakes, so the gate stays sharp.
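For step 05, one way to turn the log into tuning decisions is to compute an approval rate per policy. This is a sketch under assumptions: it expects records as dictionaries with hypothetical `policy` and `decision` fields, and the `min_decisions` and `min_rate` thresholds are arbitrary defaults, not recommendations.

```python
from collections import Counter
from typing import Dict, Iterable, List


def approval_rates(records: Iterable[dict]) -> Dict[str, float]:
    """Fraction of held requests that were approved, per policy."""
    total: Counter = Counter()
    approved: Counter = Counter()
    for record in records:
        total[record["policy"]] += 1
        if record["decision"] == "approved":
            approved[record["policy"]] += 1
    return {policy: approved[policy] / total[policy] for policy in total}


def relax_candidates(
    records: List[dict], min_decisions: int = 20, min_rate: float = 1.0
) -> List[str]:
    """Policies with enough history and an approval rate at or above min_rate."""
    total = Counter(record["policy"] for record in records)
    rates = approval_rates(records)
    return sorted(
        policy
        for policy, rate in rates.items()
        if total[policy] >= min_decisions and rate >= min_rate
    )


# Example: "http-delete" was always approved, "sql-drop" caught a real mistake.
records = (
    [{"policy": "http-delete", "decision": "approved"}] * 3
    + [{"policy": "sql-drop", "decision": "approved"}] * 2
    + [{"policy": "sql-drop", "decision": "denied"}]
)
print(relax_candidates(records, min_decisions=3))
```

A policy that shows up here is only a candidate for relaxing; a policy with denials is one that is earning its place in the queue.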
Frequently asked
- Doesn't human approval defeat the point of an autonomous agent?
- Only if you gate everything. agent·shield holds just the destructive actions and forwards the rest instantly, so the agent stays autonomous for reads and routine work — a person only steps in for the handful of calls that could actually cause damage.
- How do I add approval without changing my agent's code?
- Route the agent's outbound HTTP through agent·shield and write hold policies. Because it's a transparent proxy, the agent calls the proxy's URL exactly as it would call the real service — there's no SDK to install and no functions to wrap.
- What happens to a request while it's waiting for approval?
- It's held — paused at the proxy and not forwarded. The destructive action hasn't happened. When a person approves, agent·shield forwards it to the real target; if they deny, it never reaches the target at all.
- Can I prove afterward that a human actually approved something?
- Yes. Every held request and its decision — approved or denied, by whom, when, against which policy — is written to an append-only audit log, so human-in-the-loop is a record you can produce, not just a policy you assert.
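On the question of adding approval without changing agent code: if the agent already reads its base URL from configuration, rerouting is a configuration change. The sketch below assumes a hypothetical `AGENT_BASE_URL` environment variable and placeholder addresses; your agent framework may name its setting differently.

```python
import os
from typing import Mapping
from urllib.parse import urljoin

# Placeholder for the real service the agent normally talks to.
DEFAULT_BASE_URL = "https://api.example.com/"


def resolve_url(path: str, env: Mapping[str, str] = os.environ) -> str:
    """Build the full request URL from the configured base URL."""
    base = env.get("AGENT_BASE_URL", DEFAULT_BASE_URL)
    # Normalise slashes so the path is always appended to the base.
    return urljoin(base.rstrip("/") + "/", path.lstrip("/"))


# Unchanged agent code, direct to the service:
print(resolve_url("/users/1", env={}))
# Same code, routed through a proxy assumed to listen on localhost:8080:
print(resolve_url("/users/1", env={"AGENT_BASE_URL": "http://localhost:8080"}))
```

The calling code is identical in both cases; only the environment differs.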
Published May 18, 2026 · Last updated June 13, 2026