May 8, 2026·playwrightmaintenanceselectors

How to keep AI-authored Playwright tests from breaking when your UI changes

You ship a UI change. The next CI run goes red across a dozen specs. None of them found a bug; they found a renamed class. You spend the afternoon fixing tests instead of building.

Every end-to-end suite hits this. The question is what it costs you each time. Some tools keep a model in the loop that re-resolves selectors while the test runs, so you pay for inference on every CI run forever. The approach below keeps your tests as plain Playwright and concentrates that cost at the one moment something changed. Five rules.

1. Use semantic locators, never CSS or XPath

This one rule does more than the other four combined. A test written like this:

await page.locator('.btn-primary.css-1x9f2 > span').click();

dies the moment a designer renames a class. A test written like this:

await page.getByRole('button', { name: 'Sign in' }).click();

survives almost any restyle, because "the button labelled Sign in" is a promise you make to your users, not an implementation detail you might refactor tomorrow.

Hover writes getByRole, getByLabel, getByText, and getByTestId by default, and never CSS or XPath. Write your hand-authored tests the same way. That single habit decides whether your suite tolerates UI churn or fights it.

A button that still exists in the DOM but now hides behind a kebab menu is a quiet trap. Playwright's .click() auto-waits for actionability, so it does eventually fail, after a 30-second timeout, with a message that reads like flaky infrastructure.

Three lines turn that into a fast, legible failure:

const el = page.getByRole('button', { name: 'Submit' });
await expect(el).toBeVisible();
await el.click();

Now if the button drifts into a closed drawer, toBeVisible() fails in about five seconds with "Locator expected to be visible." An engineer reads that and knows it's a UI regression, not a network blip. Hover emits this prelude for you on click, dblclick, hover, fill, and selectOption.

3. Re-record on purpose when the semantics shift

Someone renames "Sign in" to "Log in." The selector breaks, and it should. The test was asserting a contract that changed. This is the suite working, not failing.

You have two ways to fix it. Edit the file by hand; it's plain Playwright, so it's a one-line change. Or click ⟳ Re-record in the widget: the agent reads the test's original prompt, runs it against the current app, and rewrites the file. About 30 seconds, about ten cents, and you read the git diff before you commit. You do this when you decide something changed, not as an automatic model call on every green run.

4. Re-record what's red, not everything

You will want a "re-record all" button. Skip it. Re-recording specs that passed burns tokens on tests that were fine and floods your diff with cosmetic selector churn across files nobody touched. Let CI tell you which specs are red. Re-record those, one at a time, reading each diff. It's slower by a few minutes and leaves a history you can bisect.

5. Let real flow breaks fail

Not every red test is a chore for you to clear. When a test fails because a step that used to work now doesn't, that test caught a regression. Fix the app. A suite that never goes red on a real break gives you nothing.

What you get

Your suite stays plain Playwright: deterministic, fast, free to run, readable in a pull request, runnable on any laptop with no model present. You pay for maintenance the moment a contract changes, in a diff you reviewed, not as a standing tax on every build that was already green.

See Hover's re-record workflow →

Try Hover on your own app.

One command adds the widget to your dev server. Author tests with AI, ship plain Playwright.

npx @hover-dev/cli setup