Tags: vibe-coding, maintenance, playwright, regression

Your vibe-coded app keeps breaking and you can't tell why

You ask the AI to add a discount field to checkout. It does. You test checkout, it works, you ship. Two days later a user emails that the cart page shows the wrong total. You go look, and the cart total has been wrong since the discount change, because the AI also touched a shared currency helper that the cart imports. You never opened that helper. You never opened the cart that day either. The regression sat in production for two days because nothing told you it was there.

This is the part of vibe-coding nobody warns you about. Shipping a feature got cheap. Knowing the rest of the app still works did not.

Fast code is code you didn't read

When you write an app by hand, you carry a rough model of it in your head. You know which files touch the cart, you know the helper that formats money, you know what a change is likely to ripple into. That model is most of what lets you guess where to look when something breaks.

When an AI writes a lot of your app fast, you never build that model. The code lands, it passes a quick look, you move on. After a few weeks you have screens you have never read line by line and dependencies you don't know exist. The model in your head is now smaller than the app. So when the next prompt changes something, you cannot predict what it touched, and you have no instinct for where the damage might be.

Why this hurts more with vibe-coding

A change three screens away is the normal failure mode, not the rare one. The AI doesn't share your sense of module boundaries. It will happily refactor a shared utility to make the current feature cleaner, and that utility might be load-bearing for a flow you forgot you built.
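To make that failure mode concrete, here is a minimal sketch of how a "cleanup" to a shared helper can break a caller nobody edited. Every name here (`formatMoneyBefore`, `formatMoneyAfter`, `cartTotalLabel`) is hypothetical and invented for illustration; it is not code from any real app.

```typescript
// Hypothetical shared helper BEFORE the discount change: takes integer cents.
function formatMoneyBefore(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}

// Hypothetical helper AFTER the change: now takes dollars, because that
// made the new discount code in checkout simpler to write.
function formatMoneyAfter(dollars: number): string {
  return `$${dollars.toFixed(2)}`;
}

// The cart was never edited. It still sums cents and hands them to the helper.
function cartTotalLabel(format: (amount: number) => string, itemCents: number[]): string {
  const totalCents = itemCents.reduce((sum, cents) => sum + cents, 0);
  return format(totalCents);
}

const items = [1999, 501]; // two items, 2500 cents in total

const before = cartTotalLabel(formatMoneyBefore, items); // "$25.00"
const after = cartTotalLabel(formatMoneyAfter, items);   // "$2500.00"

console.log({ before, after });
```

Checkout, which was updated alongside the helper, looks fine. The cart compiles, renders, and shows a total that is off by a factor of 100.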

You have two options when you can't predict the blast radius. You re-test the whole app by hand on every change, which gets slower than writing the feature and which you will stop doing by Thursday. Or you ship and let users find the breakage. Most people pick the second one without deciding to, because the first one doesn't scale and there's no third option in front of them.

There is a third option. It's the same one human teams have used for decades, and you can have it without writing the tests yourself.

A regression net of readable specs

A regression test does one job: it exercises a real flow and fails loudly when the behavior changes. Run the suite after every prompt and the cart-total bug shows up on the next run instead of two days later, with a failure report pointing at the assertion that broke. You stop guessing where the damage is.

The catch has always been that someone has to write and maintain those tests, and on a vibe-coded app the person who should write them is the person who didn't read the code. That's where Hover comes in. It's a free, open-source VS Code extension. You drive your real app once, in your real browser, and Hover crystallizes that run into a plain @playwright/test spec with semantic getByRole and getByLabel selectors. No new test framework, no proprietary format. The file is standard Playwright you can read and edit.
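For a sense of what a plain `@playwright/test` spec with semantic selectors can look like, here is a hand-written sketch. It is not actual Hover output. The routes, button names, labels, the `status` role on the total, and the `$25.00` amount are all assumptions about a hypothetical shop, and the relative URLs assume `baseURL` is set in `playwright.config.ts`.

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical flow: every route, label, and amount below is an assumption.
test('cart total reflects the items in the cart', async ({ page }) => {
  // Relative URLs rely on `baseURL` in playwright.config.ts.
  await page.goto('/products/notebook');
  await page.getByRole('button', { name: 'Add to cart' }).click();

  await page.goto('/cart');
  await page.getByLabel('Quantity').fill('2');

  // Pinned expectation: this fails if a shared helper changes how the
  // total is computed or formatted, even when the cart code is untouched.
  await expect(page.getByRole('status', { name: 'Cart total' })).toHaveText('$25.00');
});
```

Because the selectors name what the user sees (a button called "Add to cart", a field labeled "Quantity"), the file stays readable to someone who never opened the component code.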

It runs in CI with zero AI in the loop. The agent helped author the spec; it is not present when the spec runs. What executes on every push is deterministic Playwright code, the same thing you'd get from a human who wrote it by hand. The testing side of this is its own story, covered in part one.

Keeping the specs alive

A regression net only helps if it stays readable and stays current, and that's usually where homegrown suites rot. Hover keeps the specs maintainable on two fronts.

When a spec gets long, you can run an optional optimize pass. It proposes a cleaner version of the spec: page objects so selectors live in one place, named test.step stages so a failure tells you which part of the flow broke, assertions drawn from what actually happened during the run. You review it as a diff and accept it if you like it. The original deterministic spec is always kept, so the optimization never costs you a working test.
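As a rough illustration of the shape such a cleaned-up spec might take, the sketch below uses a page object and named `test.step` stages. It is hand-written under assumptions, not output from the optimize pass: the `CartPage` class, the `/cart` route, the labels, and the `$25.00` total are hypothetical.

```typescript
import { test, expect, type Locator, type Page } from '@playwright/test';

// Hypothetical page object: cart selectors live in one place.
class CartPage {
  readonly quantity: Locator;
  readonly total: Locator;

  constructor(private readonly page: Page) {
    this.quantity = page.getByLabel('Quantity');
    this.total = page.getByRole('status', { name: 'Cart total' });
  }

  async open(): Promise<void> {
    // Relative URL assumes `baseURL` is set in playwright.config.ts.
    await this.page.goto('/cart');
  }
}

test('cart total updates with quantity', async ({ page }) => {
  const cart = new CartPage(page);

  // Each named step appears in the report, so a failure says which stage broke.
  await test.step('open the cart', async () => {
    await cart.open();
  });

  await test.step('change the quantity', async () => {
    await cart.quantity.fill('2');
  });

  await test.step('check the total', async () => {
    await expect(cart.total).toHaveText('$25.00');
  });
});
```

If the "Quantity" label is renamed later, the fix is one line in `CartPage` rather than a search through every spec that touches the cart.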

When the UI genuinely changed, you re-record the spec from the extension instead of hand-patching selectors one by one. You drive the new flow, Hover crystallizes the new version, and the spec reflects what the app does now. Deliberate re-recording beats chasing broken selectors through a file you'd rather not maintain. More tactics for keeping a suite from rotting are in this guide.

The pattern is the same one good engineering teams have always run. You change the app, the suite tells you what moved, you fix it or you bless the change. Vibe-coding didn't kill that loop. It just removed the part where you had to read every line to participate in it.

Next in the series: the bugs that don't break a flow but open a door. Part three covers the security holes vibe-coded apps ship with.
