Refactoring a messy repo with Claude Code: prompts that tidy without breaking

the short answer

To refactor a messy repo with Claude Code, give it one prompt per issue. Each prompt states the problem, the exact file and line, the precise transformation (split this file along its seams, extract this duplicated block, delete this dead code), and the constraints that keep it safe: 'behaviour must not change' and 'smallest possible diff'. Review each diff before committing, then re-scan to verify. cleanvibes generates exactly this prompt for every finding it reports, plus one tidy-everything plan covering the whole report in severity order.

If an AI tool built your app, an AI tool can clean it up — splitting files, extracting duplicated blocks, deleting dead code, and flattening nesting are mechanical transformations that coding agents handle well. The catch is that the quality of the refactor tracks the quality of the prompt, and "clean up my code" is among the worst prompts you can write: vague enough to invite a rewrite, broad enough that you can't review what comes back.

This guide covers the anatomy of a refactor prompt that works, why behaviour-preservation is the constraint that matters most, and the review-and-verify discipline that makes agent refactoring safe even in a repo with no tests. It's also exactly the format cleanvibes generates automatically — one prompt per finding, ready to paste.

one prompt per finding: cleanvibes writes the refactor prompt for you (issue, location, why, exact change, constraints)

The anatomy of a refactor prompt that works

A good refactor prompt has five parts. The problem, named precisely: "src/App.tsx is 1,847 lines and holds routing, data fetching, and UI state at once" — not "this file is messy". The location: file and line, so the agent edits the right code instead of searching and improvising. Why it matters: one sentence of consequence, which anchors the agent on intent. The exact transformation: "split along its natural seams, one module per responsibility, re-export from an index so existing imports keep working". And the constraints — the part that separates a refactor from a rewrite.

Specificity is the whole game. Given an exact location and an exact transformation, a coding agent is excellent at this work; given a vague goal, it improvises — and improvisation during refactoring is how you get renamed functions, "improved" logic, and a diff nobody can review.
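
As a rough illustration of the five parts, here is a minimal Python sketch that renders them into a prompt. The `Finding` class, its field names, and the section labels are hypothetical and are not the format cleanvibes emits; the "why" sentence and the line number in the example are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    """One concrete issue to hand to the agent (hypothetical shape)."""
    problem: str          # the mess, named precisely
    file: str             # path of the file to edit
    line: int             # line where the issue starts
    why: str              # one sentence of consequence
    transformation: str   # what the code should become
    constraints: list[str] = field(default_factory=lambda: [
        "Behaviour must not change.",
        "Smallest possible diff.",
    ])


def build_refactor_prompt(finding: Finding) -> str:
    """Render the five parts in a fixed order, one labelled section each."""
    constraint_lines = "\n".join(f"- {c}" for c in finding.constraints)
    return (
        f"Problem: {finding.problem}\n"
        f"Location: {finding.file}:{finding.line}\n"
        f"Why it matters: {finding.why}\n"
        f"Transformation: {finding.transformation}\n"
        f"Constraints:\n{constraint_lines}\n"
    )


# Illustrative values only; the "why" text and line number are made up.
example = Finding(
    problem="src/App.tsx is 1,847 lines and holds routing, data fetching, and UI state at once",
    file="src/App.tsx",
    line=1,
    why="Every change to one concern risks breaking the other two.",
    transformation=(
        "Split along its natural seams, one module per responsibility, "
        "re-export from an index so existing imports keep working."
    ),
)
print(build_refactor_prompt(example))
```

Keeping the parts as separate fields makes it hard to forget one: a prompt with an empty transformation or no constraints is visibly incomplete before it ever reaches the agent.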

Behaviour-preservation: the constraint that does the work

"Behaviour must not change" is the line between refactoring and rewriting, and it needs to be explicit in every prompt — agents default to helpfulness, and helpfulness during a tidy-up looks like fixing a perceived bug, renaming a confusing variable, or upgrading a pattern. Each of those might be welcome someday; during a refactor they're contamination, because they make the diff unverifiable as a pure restructuring.

Its partner is "smallest possible diff", which caps the blast radius and keeps review honest: a diff that only moves code is easy to verify, a diff that moves and edits is not. For repeated mess — duplication especially — add "check for the same pattern elsewhere", so one finding becomes a sweep of every instance. All three constraints are baked into every prompt cleanvibes generates, which is the difference between handing the agent a finding and handing it an invitation.
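
For hand-written prompts, the choice of constraints can be sketched as a small function. This is one possible reading of the rule above, not cleanvibes's implementation, and the `kind` labels ("duplication", "dead-code") are hypothetical.

```python
BASE_CONSTRAINTS = [
    "Behaviour must not change.",
    "Smallest possible diff.",
]
SWEEP_CONSTRAINT = "Check for the same pattern elsewhere."


def constraints_for(kind: str) -> list[str]:
    """Return the constraints to put in a prompt for this kind of finding.

    `kind` is a hypothetical label such as "duplication" or "dead-code".
    """
    constraints = list(BASE_CONSTRAINTS)  # copy so the base list is never mutated
    if kind == "duplication":
        # Repeated mess: turn one finding into a sweep of every instance.
        constraints.append(SWEEP_CONSTRAINT)
    return constraints


print(constraints_for("duplication"))
print(constraints_for("dead-code"))
```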

Review, verify, re-scan

Agent-applied refactors get reviewed before they get committed. Read the diff and check three things: the flagged mess is actually gone (the file is split, the duplicate is extracted, the dead block is deleted), nothing unrelated changed, and the moves are pure — code relocated, not rewritten. Then run the app and whatever checks exist. In a repo with no tests, do the refactors one finding at a time with a manual smoke test between each; it's slower and dramatically safer than one giant cleanup commit.
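
The "moves are pure" check can be partly mechanised. The sketch below is a rough heuristic, not a substitute for reading the diff: it takes unified diff text (for example the output of `git diff`), pairs every added line with an identical removed line, and reports whatever is left over. The function name and the sample diff are hypothetical, and the parser will misread a removed line whose own content begins with `--`.

```python
from collections import Counter


def classify_diff(diff_text: str) -> dict[str, list[str]]:
    """Report added lines with no matching removed line, and vice versa."""
    removed: Counter[str] = Counter()
    added: Counter[str] = Counter()
    for raw in diff_text.splitlines():
        if raw.startswith(("+++", "---")):
            continue  # file headers, not content
        if raw.startswith("+"):
            content = raw[1:].strip()  # ignore indentation changes
            if content:
                added[content] += 1
        elif raw.startswith("-"):
            content = raw[1:].strip()
            if content:
                removed[content] += 1
    # Counter subtraction keeps only positive counts, so matched pairs cancel.
    return {
        "new_lines": sorted((added - removed).elements()),
        "dropped_lines": sorted((removed - added).elements()),
    }


# Hypothetical diff: a helper is moved to its own file, but its body is also edited.
sample = """\
--- a/app.py
+++ b/app.py
@@ -1,4 +1,2 @@
-def helper(x):
-    return x * 2
-
+from helpers import helper
 print(helper(2))
--- /dev/null
+++ b/helpers.py
@@ -0,0 +1,2 @@
+def helper(x):
+    return x * 3
"""

print(classify_diff(sample))
```

In a pure move, the leftovers should be limited to glue such as new import or re-export lines. Here the unmatched `return x * 2` and `return x * 3` pair shows an edit hiding inside the move, which is exactly what the review is meant to catch.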

The loop closes with a re-scan: paste the repo into cleanvibes again and watch the findings disappear and the score climb — every report shows the delta since last scan, which turns cleanup into visible progress. For a long findings list, the tidy-everything plan covers all findings in severity order in a single paste; review that diff with proportionally more care, since the change set is larger.
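
If you track findings yourself, the delta between two scans is a set comparison. This is a minimal sketch under the assumption that each finding can be reduced to a `(file, issue)` pair; the source does not describe how cleanvibes computes its delta, and the function name and sample data are hypothetical.

```python
def scan_delta(
    previous: set[tuple[str, str]],
    current: set[tuple[str, str]],
) -> dict[str, list[tuple[str, str]]]:
    """Compare two scans, each a set of (file, issue) pairs."""
    return {
        "fixed": sorted(previous - current),       # gone since the last scan
        "remaining": sorted(previous & current),   # still present
        "introduced": sorted(current - previous),  # new since the last scan
    }


before = {
    ("src/App.tsx", "oversized file"),
    ("src/api.ts", "duplicated block"),
    ("src/utils.ts", "dead code"),
}
after = {
    ("src/api.ts", "duplicated block"),
}

print(scan_delta(before, after))
```

Line numbers are left out of the key on purpose, since a refactor shifts them. A split file can still surface the same issue under a new path, so an "introduced" entry deserves a look before it is treated as a regression.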

how it works

  1. Start from a concrete finding

    Run a scan so each issue comes with a file, line, and severity. A cleanvibes finding already includes the full refactor prompt; for manual prompts, gather the same details first.

  2. Name the problem and the location

    State the specific mess and point at the exact file and line, plus one sentence on why it costs you. The agent should never have to search for what you mean.

  3. Specify the exact transformation

    Split along seams, extract the shared block into one module, delete the commented-out code, flatten with guard clauses. Say what the code should become, not just that it should be 'cleaner'.

  4. Add the constraints

    'Behaviour must not change', 'smallest possible diff', and, for duplication, 'check for the same pattern elsewhere'. These three keep a refactor from quietly becoming a rewrite.

  5. Paste into Claude Code and review the diff

    Confirm the mess is gone, nothing unrelated moved, and the changes are pure restructuring. Run the app or its checks before committing.

  6. Go one finding at a time without tests

    In a repo with no test suite, sequence the refactors and smoke-test between each. One giant cleanup commit in an unverified codebase is how tidy-ups break production.

  7. Re-scan to verify

    Run the scan again and confirm the findings are gone and the score moved. cleanvibes shows the delta since last scan. The free monthly credits cover a tidy-and-verify loop.
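
The one-finding-at-a-time discipline from the steps above can be sketched as a loop. Every callable here is a hypothetical stand-in for something you do by hand or script yourself (pasting the prompt, reading the diff, smoke-testing, committing, reverting); nothing below is a cleanvibes or Claude Code API.

```python
from typing import Callable, Iterable


def tidy_one_at_a_time(
    findings: Iterable[str],
    apply_refactor: Callable[[str], None],
    diff_is_clean: Callable[[str], bool],
    smoke_test: Callable[[], bool],
    commit: Callable[[str], None],
    revert: Callable[[], None],
) -> list[tuple[str, str]]:
    """Apply one refactor, verify it, then commit or revert before moving on."""
    outcomes: list[tuple[str, str]] = []
    for finding in findings:
        apply_refactor(finding)
        if diff_is_clean(finding) and smoke_test():
            commit(finding)
            outcomes.append((finding, "committed"))
        else:
            revert()
            outcomes.append((finding, "reverted"))
            break  # stop and investigate before touching anything else
    return outcomes


# Stub run: the smoke test is rigged to fail on the second finding.
log: list[str] = []
outcomes = tidy_one_at_a_time(
    ["split App.tsx", "extract duplicate", "delete dead code"],
    apply_refactor=lambda f: log.append(f"apply {f}"),
    diff_is_clean=lambda f: True,
    smoke_test=lambda: "apply extract duplicate" not in log,
    commit=lambda f: log.append(f"commit {f}"),
    revert=lambda: log.append("revert"),
)
print(outcomes)
```

Stopping at the first failure keeps each commit tied to exactly one verified finding, so a broken refactor never sits underneath later changes.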

frequently asked

Why not just ask Claude to 'find and clean up all the mess'?
Discovery and fixing are different jobs. A freeform ask produces inconsistent discovery and sprawling changes. A scan gives you a stable, ranked list; targeted prompts give you small, reviewable diffs. cleanvibes splits it exactly that way: it finds, your agent fixes.

What's in the prompts cleanvibes generates?
Each one states the issue, the file and line, why it matters, the exact transformation, and the behaviour-preserving constraints. Every pricing plan includes every prompt and the tidy-everything plan; Pro scans get their analysis from Claude Opus instead of Haiku, with more than double the code context.

Is it safe to let an agent refactor a repo with no tests?
Safe-ish, with discipline: one finding at a time, behaviour-preserving constraints in every prompt, a diff review and a smoke test between each. Consider making the first 'refactor' a few tests around the core logic: it makes every subsequent fix cheaper to verify.

What's the tidy-everything plan and when should I use it?
A single prompt covering all findings in severity order, so one paste works through the whole report. It suits a dedicated cleanup session on a repo with many findings, and it warrants a more careful diff review afterwards, because the change set is bigger.

Last updated June 10, 2026
