
The cost of manual E2E testing, and where an AI agent actually helps

the short answer

Manual end-to-end testing is expensive in ways that don't show up on a timesheet. It's slow and unrepeatable, it gets skipped first under deadline pressure (which is exactly when regressions slip through), and the coverage lives in one tester's head rather than anywhere durable. Static Selenium or Playwright automation removes the repetition but adds a maintenance burden of its own. An AI testing agent that runs plain-English scenarios gives you repeatable, low-maintenance browser checks that anyone on the team can write.

Every team that ships a web app does manual E2E testing, whether they call it that or not: someone clicks through signup before a release, someone checks the checkout still works after a deploy, someone eyeballs the contact form. It feels free because nobody bills for it, but it's one of the most expensive habits in software — not in money, in the things it quietly costs you. It's slow, it's never quite the same twice, and it's the first thing dropped when a deadline tightens, which is precisely when you most need it.

The costs that never hit the timesheet

Manual testing's first cost is time, but its more corrosive costs are subtler. It's unrepeatable: two people clicking through the same flow take slightly different paths, so what got checked is never exactly knowable, and a regression can hide in the gap between how the tester clicked and how a real user would. It's unscalable: every flow you add is more clicking, forever, so coverage is capped by patience rather than by what matters. And it's fragile knowledge — the map of what to test lives in one person's head, and walks out the door when they're on leave or move on.

Worst of all, manual testing is the first casualty of pressure. The release that most needs a careful pass — the big one, shipped against a deadline — is the release where "we'll just do a quick smoke test" wins, because the thorough version takes too long. So the testing effort is lowest exactly when the risk is highest, which is how a manual-testing culture and a steady trickle of escaped regressions end up being the same thing described two ways.

Why static automation only half-fixes it

The textbook answer is to automate, and for the repetition problem it works: a Selenium or Playwright suite runs the same flow the same way every time, as often as you like, without a human clicking. That genuinely solves unrepeatability and the clicking tax. But it trades them for a new cost, maintenance, because those scripts are usually pinned to CSS or XPath selectors, and a selector can break whenever the markup it targets changes, even when the flow itself still works. Many teams discover that the automation they built to save time now needs constant tending, and a suite nobody trusts is barely better than no suite at all.
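To make the selector problem concrete, here is a minimal sketch in Python. It assumes a toy page model (a list of dicts) and a hypothetical `find_by_class` helper; neither is Selenium's or Playwright's real API. It only illustrates how a lookup pinned to a class name stops matching after a cosmetic rename, even though the button is still on the page.

```python
from typing import Optional

# Toy page model for illustration only: each element is a plain dict.
Page = list[dict]

page_before: Page = [
    {"tag": "button", "classes": ["btn-primary"], "text": "Sign up"},
    {"tag": "a", "classes": ["nav-link"], "text": "Pricing"},
]

# The same page after a purely cosmetic refactor: one class was renamed.
page_after: Page = [
    {"tag": "button", "classes": ["cta-button"], "text": "Sign up"},
    {"tag": "a", "classes": ["nav-link"], "text": "Pricing"},
]


def find_by_class(page: Page, class_name: str) -> Optional[dict]:
    """Hypothetical frozen selector: first element carrying class_name, or None."""
    for element in page:
        if class_name in element["classes"]:
            return element
    return None


# The script was written against the old markup and pinned to "btn-primary".
found_before = find_by_class(page_before, "btn-primary")
found_after = find_by_class(page_after, "btn-primary")

print(found_before is not None)  # True: the selector matches the old markup
print(found_after is not None)   # False: the button still exists, but the lookup fails
```

Nothing a user cares about changed between the two pages, yet the second lookup fails. That failure is the maintenance bill: someone has to notice it, confirm it is not a real regression, and re-point the selector.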

There's also an authoring wall. Static automation needs someone who can write code against the framework, which means the people who best understand a flow (the PM who designed it, the designer who built it, the support lead who knows where users get stuck) usually can't write the test for it. So coverage clusters around what's easy to script rather than what's important to check, and the long tail of flows stays manual or stays untested.

Where an AI agent changes the maths

An agentic tool attacks both halves at once. Because you author in plain English, anyone who can describe a flow can write a test — the PM, the designer, the support lead, not just whoever knows Playwright. And because the agent resolves intent against the live page each run rather than following a frozen selector, the maintenance bill that sinks static suites mostly goes away: a class rename doesn't break a test that was never tied to the class. assertly runs each scenario in a sandboxed headless browser and hands back pass/fail, an execution log, and a screenshot on failure, with a short run history per saved test so re-checking a flow is a click.
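As a rough illustration of resolving intent against the live page, the sketch below looks an element up by what a user would see (its role and visible text) instead of by its class. The `resolve_intent` helper and the dict-based page model are assumptions made for this example; this is not assertly's implementation, and a real agent would draw on far richer signals than an exact text match.

```python
from typing import Optional

# Toy page model for illustration only: each element is a plain dict.
Page = list[dict]


def resolve_intent(page: Page, role: str, label: str) -> Optional[dict]:
    """Hypothetical resolver: match on tag (standing in for role) and visible text.

    The comparison is case-insensitive and ignores surrounding whitespace.
    """
    wanted = label.strip().lower()
    for element in page:
        if element["tag"] == role and element["text"].strip().lower() == wanted:
            return element
    return None


page_before: Page = [{"tag": "button", "classes": ["btn-primary"], "text": "Sign up"}]
page_after: Page = [{"tag": "button", "classes": ["cta-button"], "text": "Sign up"}]

# "Click the Sign up button" is resolved fresh against whichever page is live.
hit_before = resolve_intent(page_before, "button", "sign up")
hit_after = resolve_intent(page_after, "button", "sign up")
missing = resolve_intent(page_after, "button", "log in")

print(hit_before is not None, hit_after is not None)  # True True
print(hit_after["classes"])                           # ['cta-button']
print(missing)                                        # None
```

The class rename is invisible to this lookup because the class was never part of the question. A button that has genuinely disappeared still fails to resolve, which is the failure you want a test to report.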

It's a tool, not a cure, and the boundaries are worth repeating: web-only for now, no CI/CD gate yet, and critical flows still want a human reviewing the generated steps. But for the specific pain this page is about — testing that's slow, unrepeatable, and quietly skipped under deadline — repeatable plain-English checks that survive UI churn are a real answer. And if your interest in testing flows from caring about quality generally, assertly sits in a family with clean·vibes (which scores code cleanliness) and quality·vibes (which runs AI code review), each chipping at a different slice of the same problem.

frequently asked

Isn't manual testing fine for a small app?

It's fine right up until it isn't. For a tiny app with one or two flows, clicking through before a release is reasonable. The cost compounds as flows multiply and releases speed up: that's when "I'll just check it manually" starts losing to the deadline, and the regressions that slip through cost more than the testing would have. Cheap, repeatable checks let you keep coverage without keeping the clicking.

We already automated with Playwright — do we still have this problem?

You've solved repetition and likely picked up a maintenance bill instead. If your team spends meaningful time re-pointing selectors after UI changes, or trusts the suite less than it used to, that's the half static automation doesn't fix. An agentic layer over the churny, user-facing flows reduces that upkeep; keep deterministic scripts where bit-exact repeatability matters.

How does an AI agent make testing repeatable without the maintenance?

It separates what you test from how it's located. The scenario stores intent in plain English; the agent works out the locators fresh each run against the live page. So you get the repeatability of automation — same flow, on demand — without the selector upkeep that makes static suites brittle, because there are no hard-coded selectors to break.
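The separation can be sketched as a data structure: the saved scenario holds only plain-English steps, and a runner works out what each step refers to at run time. Everything here is hypothetical and simplified for illustration (the `Scenario` and `RunResult` classes, the tiny step grammar, the screenshot file name). The toy page is static, so "click" and "check" both reduce to "is it there", and no navigation is simulated.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunResult:
    passed: bool
    log: list[str]
    screenshot: Optional[str]  # set only on failure in this sketch


@dataclass
class Scenario:
    name: str
    steps: list[str]  # plain-English intent only, no selectors stored
    history: list[RunResult] = field(default_factory=list)


# Tiny illustrative grammar, e.g.: click the "Sign up" button
STEP = re.compile(r'^(click|check) the "(.+)" (button|link|heading)$')
ROLE_TO_TAG = {"button": "button", "link": "a", "heading": "h1"}


def run(scenario: Scenario, page: list[dict]) -> RunResult:
    """Resolve every step against the given page and record the outcome."""
    log: list[str] = []
    passed = True
    for step in scenario.steps:
        match = STEP.match(step)
        found = False
        if match:
            _verb, label, role = match.groups()
            found = any(
                el["tag"] == ROLE_TO_TAG[role] and el["text"] == label
                for el in page
            )
        log.append(f"{'ok' if found else 'FAIL'}: {step}")
        if not found:
            passed = False
            break  # stop at the first failing step
    screenshot = None if passed else f"{scenario.name}-failure.png"
    result = RunResult(passed, log, screenshot)
    scenario.history.append(result)
    return result


signup = Scenario(
    "signup",
    ['click the "Sign up" button', 'check the "Welcome" heading'],
)

page_v1 = [
    {"tag": "button", "classes": ["btn-primary"], "text": "Sign up"},
    {"tag": "h1", "classes": ["title"], "text": "Welcome"},
]
page_v2 = [  # classes renamed, behaviour unchanged
    {"tag": "button", "classes": ["cta-button"], "text": "Sign up"},
    {"tag": "h1", "classes": ["hero-title"], "text": "Welcome"},
]
page_broken = [  # the welcome heading is really gone
    {"tag": "button", "classes": ["cta-button"], "text": "Sign up"},
]

for page in (page_v1, page_v2, page_broken):
    result = run(signup, page)
    print(result.passed, result.screenshot)
```

The same two-line scenario passes on both versions of the markup and fails only when the heading is actually missing. The stored test never changed, and the history list records all three runs.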

Can non-engineers actually write these tests?

That's a core reason it exists. If you can describe a flow to a colleague — go here, click this, check that — you can write the test, no framework required. It puts test authoring in reach of PMs, designers, and support, which is usually where the deepest knowledge of the important flows lives.

Last updated June 20, 2026
