The most common mess in AI-generated code, ranked

#1–#2: commented-out code and deep nesting

Commented-out code topped the list at 59.2% of repos: nearly three in five projects in the corpus ship blocks of switched-off code. The cause is the vibe-coding loop itself: try an approach, comment it out, try another, ship the one that worked, never circle back. The fix is one rule with no exceptions: delete it. Git history keeps every version forever; your files don't have to be their own archive.
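Deleting the blocks is manual, but finding them can be scripted. Below is a rough heuristic sketch in Python. It is not how the clean·vibes rules engine works, and the function names are hypothetical. It assumes Python sources with `#` comments and treats a comment as switched-off code when its text parses as a Python statement other than a bare name or a lone literal.

```python
import ast


def looks_like_commented_code(line: str) -> bool:
    """Heuristic: a '#' comment whose text parses as Python is probably dead code."""
    stripped = line.strip()
    if not stripped.startswith("#"):
        return False
    body = stripped.lstrip("#").strip()
    if not body:
        return False
    try:
        tree = ast.parse(body)
    except SyntaxError:
        # Ordinary prose ("retry on timeout") usually fails to parse.
        return False
    # Single words ("TODO") and lone literals also parse; don't flag those.
    if len(tree.body) == 1 and isinstance(tree.body[0], ast.Expr):
        if isinstance(tree.body[0].value, (ast.Name, ast.Constant)):
            return False
    return True


def find_commented_code(source: str) -> list[int]:
    """Return the 1-based line numbers of comments that look like switched-off code."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if looks_like_commented_code(line)
    ]


sample = '''def total(items):
    # result = sum(items) * 2
    # TODO tidy this up
    return sum(items)
'''
print(find_commented_code(sample))  # [2]
```

Treat the output as a review list, not an auto-delete list: a comment such as `# Note: important` also parses (as an annotation) and would be flagged.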

Deep nesting appeared in 53.6% of repos: conditionals six or more levels deep, where a reader has to hold every level's condition in their head at once, which is precisely where logic bugs live. Generated code nests because each new requirement gets wrapped around the existing logic rather than restructured into it. The fix is mechanical: early returns and guard clauses for the failure paths, and extracting the inner levels into named functions.
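A minimal before-and-after sketch of the guard-clause fix. The order dictionary and its `paid`, `items`, and `address` keys are hypothetical; the point is the shape. Both functions return the same result for every input, but the flat one handles each failure path once, at the top, and leaves the happy path unindented at the end.

```python
# Nested: each new requirement got wrapped around the previous one.
def ship_order_nested(order):
    if order is not None:
        if order.get("paid"):
            if order.get("items"):
                if order.get("address"):
                    return "shipped"
                else:
                    return "missing address"
            else:
                return "empty order"
        else:
            return "unpaid"
    else:
        return "no order"


# Flat: guard clauses reject each failure case early and return.
def ship_order(order):
    if order is None:
        return "no order"
    if not order.get("paid"):
        return "unpaid"
    if not order.get("items"):
        return "empty order"
    if not order.get("address"):
        return "missing address"
    return "shipped"


print(ship_order({"paid": True, "items": ["mug"]}))  # missing address
```

When an inner level does real work rather than just checking a condition, the same move applies: pull that work into a named function and call it after the guards.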

#3–#4: large files and cross-file duplication

Files past the ~600-line threshold appeared in 56.1% of repos, and the affected repos averaged 5.3 size-flagged files each. Because per-rule reporting caps at 10, that average is a floor. AI tools grow files indefinitely because adding to the file they're already in always works; nobody prompts for a restructure. The fix: split along the natural seams, one module per responsibility, and re-export from an index so existing imports keep working.
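Finding the split candidates comes down to a line count against the study's two thresholds: past ~600 lines is large, past 1,200 is giant. A minimal Python sketch, assuming a local checkout and a hypothetical list of source suffixes. It puts the biggest files first because that is where splitting pays off most. It does not do the split itself; choosing the seams is a judgment call.

```python
from pathlib import Path
from typing import Optional

LARGE_LINES = 600    # "large file" threshold from the study (~600 lines)
GIANT_LINES = 1200   # "giant file" threshold from the study

# Hypothetical: which file extensions count as source code.
SOURCE_SUFFIXES = (".py", ".js", ".jsx", ".ts", ".tsx")


def classify_size(line_count: int) -> Optional[str]:
    """Return 'giant', 'large', or None for a file of `line_count` lines."""
    if line_count > GIANT_LINES:
        return "giant"
    if line_count > LARGE_LINES:
        return "large"
    return None


def size_findings(root: str) -> list[tuple[str, int, str]]:
    """List (path, line_count, label) for every oversized source file under `root`."""
    findings = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            with path.open(encoding="utf-8", errors="replace") as handle:
                line_count = sum(1 for _ in handle)
            label = classify_size(line_count)
            if label is not None:
                findings.append((str(path), line_count, label))
    # Biggest first: these are the first files worth splitting.
    findings.sort(key=lambda finding: finding[1], reverse=True)
    return findings
```

Calling `size_findings(".")` from a repo root lists the candidates. In a JavaScript project you would want to skip vendored directories such as `node_modules` before trusting the output.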

Cross-file duplication hit 46.4% of repos: the same normalized block of code living in two or more files. This is the most expensive item on the list, because coupled copies mean every bug fix has to land in every copy — and eventually one gets missed. It happens because generating a fresh copy is cheaper for the model than finding the helper that already exists. The fix: extract the genuinely shared core into one function or module and import it everywhere it was pasted.
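To make "the same normalized block in two or more files" concrete, here is a minimal detection sketch. It is not the clean·vibes rules engine's implementation, and the six-line default window is a hypothetical choice. It normalizes each file (drops blank lines and `#` comment lines, collapses whitespace), hashes every run of `window` consecutive lines, and reports the hashes that turn up in more than one file.

```python
import hashlib
from collections import defaultdict

WINDOW = 6  # hypothetical minimum block size, in normalized lines


def normalize(source: str) -> list[str]:
    """Drop blank and '#' comment lines and collapse whitespace,
    so re-indented or re-spaced copies still match."""
    lines = []
    for raw in source.splitlines():
        line = " ".join(raw.split())
        if line and not line.startswith("#"):
            lines.append(line)
    return lines


def cross_file_duplicates(files: dict[str, str], window: int = WINDOW) -> dict[str, set[str]]:
    """Map the hash of each `window`-line block to the files containing it,
    keeping only blocks that appear in two or more files."""
    seen = defaultdict(set)
    for name, source in files.items():
        lines = normalize(source)
        for start in range(len(lines) - window + 1):
            block = "\n".join(lines[start:start + window])
            digest = hashlib.sha1(block.encode()).hexdigest()
            seen[digest].add(name)
    return {digest: names for digest, names in seen.items() if len(names) > 1}
```

A long duplicated block shows up as several overlapping windows. A real tool would merge those into one finding and keep the original line numbers, which this sketch throws away during normalization.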

#5–#6: in-file repetition and giant files — and the pattern behind all six

In-file repetition, the same block several times within one file, hit 41.2% of repos; it's usually a loop or a helper that never got written. Giant files past 1,200 lines appeared in 33.0% of repos: a third of the corpus has at least one file that's effectively unreviewable, where every change risks side effects nobody can see. Both fixes are the structural ones above, applied with more urgency.
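In-file repetition often looks like the first function below: one validate-and-clean block pasted once per field. A minimal sketch of folding it into a named helper plus a loop; the form fields and the `require` helper are hypothetical. After the change, a new field is one entry in `REQUIRED_FIELDS`, and a fix to the validation lands in one place instead of three.

```python
# Repeated: the same block pasted once per field.
def clean_form_repeated(form):
    name = form.get("name", "").strip()
    if not name:
        raise ValueError("name is required")
    email = form.get("email", "").strip()
    if not email:
        raise ValueError("email is required")
    city = form.get("city", "").strip()
    if not city:
        raise ValueError("city is required")
    return {"name": name, "email": email, "city": city}


# Folded: one helper, applied in a loop over the field names.
REQUIRED_FIELDS = ("name", "email", "city")


def require(form, field):
    """Return the stripped value of `field`, or raise if it is missing or blank."""
    value = form.get(field, "").strip()
    if not value:
        raise ValueError(f"{field} is required")
    return value


def clean_form(form):
    return {field: require(form, field) for field in REQUIRED_FIELDS}
```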

The pattern behind all six fits in one sentence: AI tools optimise for the next working answer, not for the shape of the codebase. Every finding here is invisible to the does-it-run feedback loop, so it ships. All six come from one study: 549 public repos self-described as AI- or vibe-coded, scanned by the clean·vibes rules engine, with data collected July 2026. The full numbers are in the state-of report, and every finding type here is one a clean·vibes scan flags in your own repo, with file, line, and a paste-ready Claude fix prompt.

The top 6 kinds of mess across 549 vibe-coded repos (clean·vibes rules engine, July 2026)

Finding	% of repos	The fix
Commented-out code	59.2%	Delete it — git history already keeps every version
Deep nesting (6+ levels)	53.6%	Early returns, guard clauses, extract the inner levels
Large file (past ~600 lines)	56.1%	Split along natural seams, one module per responsibility
Cross-file duplication	46.4%	Extract the shared core into one module; import it everywhere
In-file repetition	41.2%	Fold the repeats into a loop or a named helper
Giant file (past 1,200 lines)	33.0%	Same split, more urgently — these files are where bugs hide

frequently asked

Why are the top four so close together?

Because they share one cause: the most direct path to working code. Comment out instead of delete, nest instead of restructure, grow the file instead of splitting, paste instead of importing. A repo built that way collects all four at once — which is exactly what the flat top of the ranking shows.

Which of these should I fix first?

Structure first: split the large and giant files, because everything else gets easier in smaller files. Then duplication (it multiplies future bug fixes), then nesting, then the dead code sweep. That's the same order clean·vibes's severity ranking produces.
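If a scan hands back a flat list of findings, the order above is easy to apply mechanically. A minimal sketch with hypothetical finding-type keys. Ranking giant files ahead of large ones and grouping in-file repetition with duplication are assumptions drawn from this article's wording, not clean·vibes's actual severity model.

```python
# Hypothetical finding-type keys, ranked in the order suggested above:
# structure first, then duplication, then nesting, then dead code.
FIX_ORDER = {
    "giant_file": 0,
    "large_file": 1,
    "cross_file_duplication": 2,
    "in_file_repetition": 3,   # assumption: grouped with duplication
    "deep_nesting": 4,
    "commented_out_code": 5,
}


def fix_queue(findings: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Sort (finding_type, location) pairs into fix order.

    Unknown types go last. sorted() is stable, so findings of the same
    type keep the order the scan reported them in.
    """
    return sorted(findings, key=lambda finding: FIX_ORDER.get(finding[0], len(FIX_ORDER)))


scan = [
    ("commented_out_code", "api.py:12"),
    ("deep_nesting", "api.py:40"),
    ("giant_file", "routes.py"),
    ("cross_file_duplication", "utils.py:8"),
]
for kind, location in fix_queue(scan):
    print(kind, location)
```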

Where do these percentages come from?

From our corpus study: the clean·vibes rules engine run over 549 public GitHub repos that describe themselves as AI- or vibe-coded, collected July 2026, with the AI review pass switched off. Each percentage is the share of repos with at least one instance of that finding. The full method is on the methodology page.
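Both kinds of number in this article, the share of repos affected and the per-repo average, can be computed from per-repo counts. A minimal sketch with hypothetical input shapes: `repo_findings` maps each repo to a `Counter` of finding types, and each count is clipped at the per-rule reporting cap of 10, which is why an average over capped counts can only understate the true one.

```python
from collections import Counter

REPORT_CAP = 10  # assumption: at most 10 reported findings per rule per repo


def summarize(repo_findings: dict[str, Counter], total_repos: int) -> dict[str, tuple[float, float]]:
    """Return {finding_type: (percent of repos affected, mean capped count per affected repo)}.

    `total_repos` is passed separately because clean repos may not appear
    in `repo_findings` at all, yet they still belong in the denominator.
    """
    affected = Counter()
    capped_totals = Counter()
    for counts in repo_findings.values():
        for kind, count in counts.items():
            if count > 0:
                affected[kind] += 1
                capped_totals[kind] += min(count, REPORT_CAP)
    return {
        kind: (100 * affected[kind] / total_repos, capped_totals[kind] / affected[kind])
        for kind in affected
    }


repos = {
    "repo-a": Counter({"large_file": 14, "deep_nesting": 2}),
    "repo-b": Counter({"large_file": 3}),
    "repo-c": Counter(),
}
print(summarize(repos, total_repos=4))
```

Here a fourth, clean repo is absent from the dict but still counted in `total_repos`. `large_file` comes out as (50.0, 6.5): two of the four repos are affected, and repo-a's 14 large files are reported as 10, so the mean is 6.5 where the uncapped mean would be 8.5.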

How do I check my repo for all six at once?

Paste the repo link into clean·vibes. The same rules that produced these numbers run against your code — plus a Claude review the study deliberately switched off — and every finding comes back with file, line, and a paste-ready fix prompt.

Last updated June 11, 2026
