comparison

Agentic testing vs Selenium and static scripts: where each one wins

the verdict

Agentic testing and Selenium-style scripting differ in what the test stores: Selenium (and hand-written Playwright) pins each step to a specific selector, so the test breaks when the markup changes even if behaviour didn't, whereas an agentic tool like assertly stores the intent in plain English and re-resolves it against the live page every run, so cosmetic UI changes don't break it — the trade is that the agent is non-deterministic and needs human review on critical flows, while a static script is exactly repeatable but high-maintenance.

write a test

Selenium and hand-written Playwright are the workhorses of E2E testing for good reason: they're precise, deterministic, and run the same way every time. They're also the source of the maintenance grind every QA team knows — because the precision is pinned to selectors, and selectors change far more often than behaviour does. Agentic testing makes a different bet: store what the user is trying to do, and let an AI agent work out how to do it against the page as it exists right now.

What the test stores: selectors vs intent

A Selenium or Playwright script stores the mechanics: find the element matching button.cta-primary, click it, wait for #welcome. That's brilliant when nothing moves, and it's the failure point when something does — a renamed class, a new wrapper div, a reordered DOM, and the locator misses. The behaviour the user cares about hasn't changed; the path to it has, and the script can't tell the difference, so it goes red.

assertly stores the intent: "click 'learn more', verify the url contains /about". Each run, the agent reads the live page and resolves that intent fresh, the way a person would re-find the button regardless of its CSS. So the class rename and the wrapper div don't break the test, because they never enter into it. The cost of that flexibility is determinism: the agent makes a judgement each run rather than replaying a fixed script, which is exactly why critical flows still want a human reviewing the generated steps.

The maintenance trade, honestly stated

The day-to-day difference is upkeep. A static suite's running cost is re-pointing selectors after every UI change — death by a thousand small fixes, and the reason healthy-looking suites quietly rot until nobody trusts the red. An agentic suite spends that budget differently: less time fixing selectors, more time reviewing that the agent is still checking the right thing, especially where a missed regression would hurt.

Be clear-eyed about what you give up. A deterministic script is exactly repeatable and integrates cleanly into a CI gate — and assertly is web-only today with no CI/CD integration yet, so it doesn't replace a pipeline-gated Playwright suite for a mature team. Where it shines is the long tail: the flows nobody automated because the script wasn't worth writing, and the regression checks a team re-runs by hand. For those, plain-English authoring plus resilience to UI churn is a straight upgrade over both manual testing and a brittle script.

Where each one is the right tool

Reach for a static framework when you need bit-exact determinism, deep API or non-browser assertions, or a hard CI gate that blocks merges — that's its home turf, and assertly isn't trying to take it. Reach for agentic testing when the flow is browser-based and user-facing, when the UI changes often enough that selector upkeep is real, or when the person who understands the flow isn't the person who'd write Playwright.

In practice many teams will want both: deterministic scripts guarding the few flows where exact repeatability is non-negotiable, and an agentic layer covering the broad, churny surface area that static tests can't keep up with. assertly aims squarely at that second layer — fast to author, resilient to redesigns, with pass/fail, logs, and a failure screenshot on every run, and a short history per saved test so re-running is a click.

Agentic testing (assertly) vs static scripts (Selenium / hand-written Playwright)

Selenium / static Playwrightassertly (agentic)
What a test storesSelectors + fixed stepsPlain-English intent, re-resolved each run
Who can author itSomeone who knows the frameworkAnyone who can describe the flow
UI change (class rename, new wrapper)Often breaks — selector missesUsually survives — intent re-resolved
DeterminismExactly repeatableAgent judges each run; needs review on critical flows
Main running costRe-pointing broken selectorsReviewing that the agent checks the right thing
CI/CD pipeline gateMature, integrates as a gateNot yet — run from the app (on the roadmap)
ScopeWeb, API, more with librariesWeb only for the MVP
Best forBit-exact, gated, non-browser assertionsChurny browser flows + the un-automated long tail

frequently asked

Does assertly replace my Selenium suite?

Not for everything. It replaces the brittle, high-maintenance browser tests that break on cosmetic UI changes, and it covers flows you never bothered to automate. It doesn't replace a deterministic, CI-gated suite for bit-exact or non-browser assertions — assertly is web-only with no CI/CD integration yet. Many teams run both layers.

If the agent is non-deterministic, can I trust a green result?

Trust it the way you trust any generated artifact: verify it once. Review the generated steps for your critical flows to confirm the agent is checking what you meant, and watch failure screenshots. The flexibility that survives a UI change is the same flexibility that could route around a real problem, so a human stays in the loop where a miss would hurt.

Is it faster to get started than writing Playwright?

Almost always, because you skip the framework. There's no project to scaffold, no selectors to dig out of the DOM, no test runner to wire up — you write the flow in English and run it. The time you'd have spent learning and maintaining a framework is the time agentic testing gives back.

What about flaky tests — does agentic testing help?

It changes the shape of flakiness rather than eliminating it. Selector-driven flakiness (timing, brittle locators) drops, because the agent waits for and re-finds elements by intent. In exchange you take on judgement-driven variance, which is why clear, single-assertion scenarios and a review pass on critical flows matter — see write-natural-language-test-scenarios.

Last updated June 20, 2026

ready to try assertly?

write a test