Why a capable agent is a dangerous one
An agent only deletes production data if it can reach production and issue a destructive command — and useful agents are given exactly that reach. The danger isn't malice; it's that the model is a probabilistic system acting on instructions it can misread. A poisoned web page can tell it to "clean up old records". A confusing ticket can lead it to a DELETE. A generated query can omit the WHERE clause and turn an update into a wipe. None of these require the agent to be broken — just to be wrong once, on the wrong call.
And destructive actions are uniquely unforgiving. A bad read destroys nothing; a bad routine write is usually recoverable; a DROP TABLE or an rm -rf against production is the kind of mistake you measure in restore time and lost data. The asymmetry is the whole problem: most of what an agent does is harmless, but the rare destructive call is catastrophic, so "it's been fine so far" is not a safety strategy.
Don't remove the access — gate the action
The wrong fix is to strip the agent's write access, because then it can't do its job — you're back to a glorified chatbot. The right fix is to keep the access but intercept the dangerous actions specifically. agent·shield sits as a transparent proxy between the agent and production: the agent still has its credentials and still issues commands, but every request passes through a checkpoint first.
Safe traffic, the reads and routine writes that make up almost everything the agent does, is forwarded instantly, so the agent stays fast and fully functional. The destructive calls are the ones that stop: a DELETE, a DROP TABLE, a TRUNCATE, a mass update with no WHERE, an rm -rf, or a kubectl delete is held in an approval queue. A person sees the exact request and approves or denies it before it reaches production. The agent keeps every capability; it just can't unilaterally pull the trigger on the irreversible ones.
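To make the forward-or-hold split concrete, here is a minimal sketch of such a checkpoint decision. It is not agent·shield's implementation: the function name `decide`, the pattern list, and the `"hold"`/`"forward"` return values are assumptions made for this illustration.

```python
import re

# Hypothetical patterns for the destructive calls named above.
# Illustrative only; not agent·shield's actual rule set.
DESTRUCTIVE_BODY_PATTERNS = [
    re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE),
    re.compile(r"\bTRUNCATE\b", re.IGNORECASE),
    re.compile(r"\bDELETE\s+FROM\b", re.IGNORECASE),
    re.compile(r"\brm\s+-rf\b"),
    re.compile(r"\bkubectl\s+delete\b"),
]

UPDATE_RE = re.compile(r"\bUPDATE\b.+\bSET\b", re.IGNORECASE | re.DOTALL)
WHERE_RE = re.compile(r"\bWHERE\b", re.IGNORECASE)


def decide(method: str, body: str) -> str:
    """Return "hold" for destructive requests, "forward" for everything else."""
    # Any HTTP DELETE is held regardless of its body.
    if method.upper() == "DELETE":
        return "hold"
    # Destructive SQL, shell, or kubectl commands found in the request body.
    if any(p.search(body) for p in DESTRUCTIVE_BODY_PATTERNS):
        return "hold"
    # A mass update: an UPDATE ... SET statement with no WHERE clause.
    if UPDATE_RE.search(body) and not WHERE_RE.search(body):
        return "hold"
    return "forward"
```

With this sketch, `decide("POST", "UPDATE users SET active = false WHERE id = 7")` is forwarded, while the same statement without its WHERE clause is held. Pattern matching like this is deliberately crude (a WHERE inside a subquery would fool it), so treat it as a picture of the idea, not a complete detector.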
Make the close call before, and the cleanup after, possible
Holding a destructive action turns the worst-case moment into a decision instead of an incident. The reviewer reads the actual command — this DELETE hits one row, fine; this query would truncate the whole table, deny — and the production system never sees the bad version. The agent's autonomy survives because the gate only fires on the handful of actions that genuinely warrant a human.
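The hold, review, decide flow can be pictured with a small in-memory queue. This is a sketch under stated assumptions: `HeldRequest`, `ApprovalQueue`, and their methods are hypothetical names for this example, not agent·shield's API, and a real queue would persist its state.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HeldRequest:
    request: str                    # the exact command the reviewer will read
    status: str = "held"            # "held" -> "approved" or "denied"
    reviewer: Optional[str] = None
    id: str = field(default_factory=lambda: uuid.uuid4().hex)


class ApprovalQueue:
    """Hypothetical in-memory approval queue (illustrative only)."""

    def __init__(self):
        self._items: dict[str, HeldRequest] = {}

    def hold(self, request: str) -> str:
        """Park a destructive request and return its id."""
        item = HeldRequest(request)
        self._items[item.id] = item
        return item.id

    def pending(self) -> list[HeldRequest]:
        """Requests still waiting for a human decision."""
        return [i for i in self._items.values() if i.status == "held"]

    def resolve(self, item_id: str, reviewer: str, approve: bool) -> HeldRequest:
        """Record a reviewer's decision; each request is decided exactly once."""
        item = self._items[item_id]
        if item.status != "held":
            raise ValueError("request already resolved")
        item.status = "approved" if approve else "denied"
        item.reviewer = reviewer
        return item
```

In this sketch only a request whose status becomes `"approved"` would be forwarded to production. A denied request simply never leaves the queue.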
When something does go wrong, or nearly does, the append-only audit log is what makes it tractable. Every forwarded, blocked, held, approved, and denied request is recorded with the full request, the matched policy, the actor, and the timestamp. After a near-miss you can see exactly what the agent tried, why it was caught, and who decided. To be precise about scope: agent·shield gates only the traffic you route through it, so point the agent's database, API, and infra calls at the proxy. Calls that bypass the proxy are not gated.
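One way to picture an audit record with those four fields is a JSON Lines file that is only ever appended to. The field names, file layout, and function names below are assumptions for illustration, not agent·shield's actual log format.

```python
import json
from datetime import datetime, timezone


def append_audit_record(path, decision, request, policy, actor):
    """Append one decision record to a JSON Lines file (hypothetical format)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "decision": decision,   # forwarded, blocked, held, approved, or denied
        "request": request,     # the full request as received
        "policy": policy,       # name of the matched policy, if any
        "actor": actor,         # the agent or the human reviewer
    }
    # Mode "a" writes at the end of the file; this code never rewrites old lines.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def read_audit_log(path):
    """Load every record, oldest first."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Opening a file in append mode keeps this code from overwriting history, but it does not stop anything else from editing the file. A log that must be tamper-resistant needs that guarantee enforced by the storage layer.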
Three ways to handle an agent with production access
| | Trust the model | Strip its write access | agent·shield (gate the action) |
|---|---|---|---|
| Agent stays useful | Yes | No — read-only, can't do its job | Yes — keeps full access |
| Destructive calls | Run on a bad inference | Impossible — but so is real work | Held for human approval |
| Speed on safe work | Full | Full | Full — safe traffic forwarded instantly |
| Bad-inference risk | Catastrophic | None | Caught at the gate |
| After an incident | Hope for backups | n/a | Full audit log of what was tried |
Frequently asked
- Shouldn't I just give the agent read-only access to be safe?
- You can, but then it can't do the work that justified building it. agent·shield lets the agent keep write access and instead gates the specific destructive actions — so it stays useful and the irreversible calls still need a human.
- What destructive actions does it catch by default?
- The high-blast-radius patterns: HTTP DELETEs, DROP TABLE and TRUNCATE in SQL bodies, mass updates with no WHERE clause, rm -rf, and kubectl delete. Policies are regex over method, path, and body, so you can match exactly what's destructive in your stack.
- Will gating destructive actions slow the agent on normal work?
- No. Reads and routine writes — almost everything an agent does — are forwarded instantly. Only the destructive calls are held, so the agent runs at full speed except in the rare moments that warrant a human.
- What if a bad delete still gets through?
- The point of the gate is that destructive calls are held before they run, so the bad version doesn't reach production unless a human approves it. And because every decision is in the append-only audit log, you can always see exactly what was tried, what was caught, and who approved what.
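The answers above describe policies as regex over method, path, and body. A minimal sketch of what such a policy could look like follows; the `Policy` class, the policy names, and the `/admin/purge` endpoint are all hypothetical and do not reflect agent·shield's configuration syntax.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: one regex each for method, path, and body."""
    name: str
    method: str = r".*"   # default patterns match anything
    path: str = r".*"
    body: str = r".*"

    def matches(self, method: str, path: str, body: str) -> bool:
        # All three patterns must match for the policy to fire.
        return (
            re.fullmatch(self.method, method, re.IGNORECASE) is not None
            and re.search(self.path, path) is not None
            and re.search(self.body, body, re.IGNORECASE | re.DOTALL) is not None
        )


# Example policies; names and the /admin/purge endpoint are made up.
POLICIES = [
    Policy("http-delete", method=r"DELETE"),
    Policy("purge-endpoint", method=r"POST", path=r"^/admin/purge\b"),
    Policy("sql-drop-or-truncate", body=r"\b(DROP\s+TABLE|TRUNCATE)\b"),
]


def first_match(method: str, path: str, body: str) -> Optional[str]:
    """Return the name of the first matching policy, or None if the request is safe."""
    for policy in POLICIES:
        if policy.matches(method, path, body):
            return policy.name
    return None
```

The `purge-endpoint` entry shows the point of matching on all three fields: an ordinary POST is safe traffic, but a POST to one specific destructive endpoint in your own stack can be singled out and held.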
Published April 28, 2026 · Last updated June 13, 2026