
Playwright codegen vs. AI exploration: when each one wins

npx playwright codegen is one of the best free things in the testing ecosystem. It opens a browser, you click through your app, and it writes down every interaction as Playwright code with role- and text-based locators. No account, no model, no cost. If you have never run it, run it today.

Recording your clicks and authoring a test are different jobs. That difference is the whole story.

What codegen does

Codegen transcribes. It watches you and writes down what it saw. Click the "Sign in" button and it emits page.getByRole('button', { name: 'Sign in' }).click(). Fill a field and it emits a .fill(). It picks good locators, favouring roles and labels over CSS, and the output is clean Playwright you can read.
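To give a sense of the transcript's shape, a recorded sign-in might come out roughly like the sketch below. The URL, field labels, and credentials are placeholders rather than output from a real session, and the exact locators codegen picks depend on your app's markup and your Playwright version.

```typescript
import { test } from '@playwright/test';

// Illustrative only: URL, labels, and credentials are placeholders.
// Codegen names the test generically and records each action in order.
test('test', async ({ page }) => {
  await page.goto('https://example.com/login');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('correct-horse');
  await page.getByRole('button', { name: 'Sign in' }).click();
});
```

Note what is missing: nothing in the file checks that the sign-in succeeded.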

It decides nothing. Codegen has no idea what you're trying to accomplish. You can't tell it "test the login flow" and walk away; you perform the login flow yourself, click by click. It records an assertion only when you pick one from the recorder toolbar. It can't tell which of your clicks mattered and which were just you hunting for a menu. The raw output is a draft you then edit.
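Editing that draft usually means three things: naming the test, deleting the clicks that were only exploration, and adding the assertion that proves the flow worked. A minimal sketch of an edited recording, assuming a hypothetical app at example.com whose post-login page shows a "Dashboard" heading:

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical app: the URL, labels, credentials, and "Dashboard" heading
// are assumptions made for illustration.
test('user can sign in', async ({ page }) => {
  await page.goto('https://example.com/login');

  // Removed by hand: recorded clicks on a menu that were only exploration.
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('correct-horse');
  await page.getByRole('button', { name: 'Sign in' }).click();

  // Added by hand: the check that proves the login actually worked.
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});
```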

That isn't a gap in the tool. A stenographer faithfully writes down what was said. Codegen faithfully writes down what you did.

Where AI exploration is doing something else

Hover starts from what you want, not from your clicks. You type a sentence, "log in, then add a todo named verify hover," and the agent works out the steps: it finds the email field, the password field, the submit button, types the todo, and adds an assertion that the todo shows up. You described the outcome. The agent found the path.
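For that prompt, the resulting spec could plausibly look like the sketch below. This is hand-written to illustrate the idea, not captured Hover output; the URL, credentials, button name, and placeholder text are all assumptions about a hypothetical todo app.

```typescript
import { test, expect } from '@playwright/test';

// Hand-written illustration, not real agent output. Every URL, label,
// and placeholder string below is assumed for a hypothetical todo app.
test('log in, then add a todo named "verify hover"', async ({ page }) => {
  // Step 1: log in.
  await page.goto('https://example.com/login');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('correct-horse');
  await page.getByRole('button', { name: 'Log in' }).click();

  // Step 2: add the todo.
  const newTodo = page.getByPlaceholder('What needs to be done?');
  await newTodo.fill('verify hover');
  await newTodo.press('Enter');

  // Step 3: the assertion inferred from the goal, not recorded by a click.
  await expect(
    page.getByRole('listitem').filter({ hasText: 'verify hover' })
  ).toBeVisible();
});
```

The final assertion is the part a recording would only contain if you remembered to add it.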

A few things change as a result. You author by saying what you want instead of performing it, which helps when the flow is long or when you're specifying a test for an app someone else built. Assertions come from the agent understanding the goal, so you don't have to remember to record the check that proves it worked. And the agent walks past small surprises, a cookie banner or an interstitial, that would interrupt a manual recording.
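If you want a plain spec to tolerate the same kind of overlay at run time, Playwright's page.addLocatorHandler (available from version 1.42) is one option. The sketch below assumes a hypothetical cookie banner with the text "We use cookies" and an "Accept" button; it says nothing about how Hover itself handles banners while authoring.

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical banner text, button name, URL, and link name,
// all assumed for illustration.
test('flow survives a cookie banner', async ({ page }) => {
  // Whenever the banner is visible before an action, dismiss it first.
  await page.addLocatorHandler(page.getByText('We use cookies'), async () => {
    await page.getByRole('button', { name: 'Accept' }).click();
  });

  await page.goto('https://example.com/');
  await page.getByRole('link', { name: 'Sign in' }).click();
  await expect(page).toHaveURL(/login/);
});
```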

Hover's output is the same kind of file codegen produces: a plain @playwright/test spec with semantic locators that runs in CI with no model. Hover spends the model while authoring and hands you ordinary Playwright. You don't trade codegen's clean output, which depends on nothing beyond Playwright, for a vendor format.

Which one to reach for

Use codegen when the flow is short, you're happy to click it yourself, you want nothing beyond Playwright, and you're capturing a sequence you already know step by step.

Use AI exploration when you would rather describe the test than perform it, when the flow is long or branchy or wants assertions inferred, or when you want to re-author against a changed UI by re-running the prompt instead of re-clicking the whole thing.

The property worth protecting

Both tools hand you standard Playwright code that runs in CI with no model. Guard that. A test you can read, diff, hand to a teammate, and run for free on any machine outlasts a test trapped in a vendor runtime, however it was written. Codegen gets you there by transcription. Hover gets you there by exploration. Pick the one that fits the flow in front of you.
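Running such specs in CI needs nothing beyond a standard Playwright config. A minimal sketch, assuming the specs live in a ./tests directory and that your CI provider sets the CI environment variable; the directory name and retry count are arbitrary choices:

```typescript
import { defineConfig, devices } from '@playwright/test';

// Minimal sketch of playwright.config.ts. The testDir and retry count
// are assumptions; adjust them to your project.
export default defineConfig({
  testDir: './tests',
  // Fail the CI build if a stray test.only was committed.
  forbidOnly: !!process.env.CI,
  // Retry flaky tests on CI only.
  retries: process.env.CI ? 2 : 0,
  // GitHub annotations on CI, a plain list locally.
  reporter: process.env.CI ? 'github' : 'list',
  use: {
    // Collect a trace when a test is retried, to help debug failures.
    trace: 'on-first-retry',
  },
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
  ],
});
```

Nothing in this file refers to a model, an API key, or a vendor runtime, which is the property the paragraph above is about.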

Read the Hover record-mode docs →

Try Hover on your own app.

One command adds the widget to your dev server. Author tests with AI, ship plain Playwright.

npx @hover-dev/cli setup