#1–#2: commented-out code and deep nesting
Commented-out code topped the list at 47.5% of repos — nearly half the corpus ships blocks of switched-off code. The cause is the vibe-coding loop itself: try an approach, comment it out, try another, ship the one that worked, never circle back. The fix is one rule with no exceptions: delete it. Git history keeps every version forever; your files don't have to be their own archive.
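Finding the candidates is the only part of the sweep worth automating. Below is a minimal sketch for Python sources. It rests on our own assumption that a comment whose text parses as a Python statement is probably switched-off code. This is a hypothetical heuristic for illustration, not the clean·vibes rule, and `looks_like_code` and `commented_out_lines` are names we made up.

```python
import ast
import io
import tokenize


def looks_like_code(comment_text: str) -> bool:
    """Heuristic: does the comment's text parse as a real Python statement?"""
    text = comment_text.strip()
    if not text:
        return False
    try:
        tree = ast.parse(text)
    except (SyntaxError, ValueError):
        return False  # prose such as "fix this later" is not valid Python
    for node in tree.body:
        # Lone words or literals ("TODO", "42") parse too, so ignore them.
        if isinstance(node, ast.Expr) and isinstance(node.value, (ast.Name, ast.Constant)):
            continue
        # "see: docs" parses as a bare annotation; treat it as prose.
        if isinstance(node, ast.AnnAssign) and node.value is None:
            continue
        return True
    return False


def commented_out_lines(source: str) -> list[int]:
    """Return 1-based line numbers of comments that look like switched-off code."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and looks_like_code(tok.string.lstrip("#")):
            hits.append(tok.start[0])
    return hits
```

The heuristic misses the header line of a commented-out block, because `# if ready:` does not parse on its own. It still catches the body lines underneath, which is usually enough to point you at the block you need to delete.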
Deep nesting came second at 46.5%, level with large files. These are conditionals six or more levels deep, where a reader has to hold every level's condition in their head at once, and that is precisely where logic bugs live. Generated code nests because each new requirement gets wrapped around the existing logic rather than restructured into it. The fix is mechanical: use early returns and guard clauses for the failure paths, and extract the inner levels into named functions.
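Here is what that refactor looks like on a small scale. The sketch uses a hypothetical `Order` type and shipping rules we invented for illustration, and it nests four levels rather than six to keep it short. The first version wraps each check around the next. The second handles every failure path with a guard clause and extracts the innermost level into a named function.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Order:  # hypothetical order shape, for illustration only
    paid: bool
    items: list[str] = field(default_factory=list)
    address: Optional[str] = None


# Before: every condition wraps the next, so the happy path sits at the bottom.
def ship_nested(order: Optional[Order]) -> str:
    if order is not None:
        if order.paid:
            if order.items:
                if order.address:
                    return f"shipping {len(order.items)} item(s) to {order.address}"
                else:
                    return "error: no address"
            else:
                return "error: empty order"
        else:
            return "error: unpaid"
    else:
        return "error: no order"


# After: guard clauses reject each failure case and return early,
# so the happy path runs flat at the end.
def ship(order: Optional[Order]) -> str:
    if order is None:
        return "error: no order"
    if not order.paid:
        return "error: unpaid"
    if not order.items:
        return "error: empty order"
    if not order.address:
        return "error: no address"
    return format_shipment(order)


def format_shipment(order: Order) -> str:
    """The former innermost level, extracted into a named function."""
    return f"shipping {len(order.items)} item(s) to {order.address}"
```

Both functions return the same result for every input. The difference is that in `ship`, each condition can be read and tested on its own line. Adding a fifth rule means adding one guard rather than one more level.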
#3–#4: large files and cross-file duplication
Files past the ~600-line threshold also appeared in 46.5% of repos, and the affected repos averaged 3.4 size-flagged files each. Because per-rule reporting caps at 10 findings, that average is a floor: a repo with 14 oversized files is reported as 10. AI tools grow files indefinitely because adding to the file they're already in always works; nobody prompts for a restructure. The fix: split along the natural seams, one module per responsibility, and re-export from an index file so existing imports keep working.
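A rough line count is enough to see how your own repo sits against those thresholds. The minimal sketch below counts physical lines, with blanks and comments included, and checks only a hand-picked set of file extensions. Both choices are our assumptions, the real rule may count differently, and `size_findings` is a hypothetical name. The sketch also applies a 10-file cap per rule, which shows why a capped average can only understate the true one.

```python
from pathlib import Path

LARGE = 600     # "large file" threshold from the study (approximate)
GIANT = 1_200   # "giant file" threshold
CAP = 10        # per-rule reporting cap

SOURCE_SUFFIXES = (".py", ".js", ".jsx", ".ts", ".tsx")  # assumption: adjust for your stack


def size_findings(root: str) -> dict:
    """Bucket source files under root by line count, reporting at most CAP per rule."""
    large, giant = [], []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        lines = len(path.read_text(encoding="utf-8", errors="replace").splitlines())
        if lines > LARGE:
            large.append((str(path), lines))
        if lines > GIANT:
            giant.append((str(path), lines))
    # Biggest files first, so the cap keeps the worst offenders.
    large.sort(key=lambda item: item[1], reverse=True)
    giant.sort(key=lambda item: item[1], reverse=True)
    return {
        "large": large[:CAP],
        "giant": giant[:CAP],
        "large_total": len(large),  # the uncapped count the report hides
        "giant_total": len(giant),
    }
```

Consider a repo with 13 oversized files. It reports only 10 under the cap, while `large_total` still says 13, so averaging the reported counts across repos gives a lower bound.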
Cross-file duplication hit 45.5% of repos: the same normalized block of code living in two or more files. This is the most expensive item on the list, because coupled copies mean every bug fix has to land in every copy — and eventually one gets missed. It happens because generating a fresh copy is cheaper for the model than finding the helper that already exists. The fix: extract the genuinely shared core into one function or module and import it everywhere it was pasted.
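"Normalized" is what makes the count honest, because pasted copies rarely match byte-for-byte once indentation and comments drift. The minimal sketch below shows one way to match them. It assumes a four-line window, whitespace collapsing, and naive `#` comment stripping. This is our illustration of the idea, not the clean·vibes algorithm, and `duplicate_blocks` is a hypothetical name.

```python
from collections import defaultdict

WINDOW = 4  # lines per compared block; an arbitrary choice for this sketch


def normalize(line: str) -> str:
    """Drop a trailing '#' comment and collapse whitespace."""
    code = line.split("#", 1)[0]  # naive: also cuts a '#' inside a string literal
    return " ".join(code.split())


def duplicate_blocks(files: dict[str, str]) -> dict[str, list[tuple[str, int]]]:
    """Map each normalized WINDOW-line block to its (file, first line) locations,
    keeping only blocks that occur in two or more different files."""
    seen = defaultdict(list)
    for name, source in files.items():
        numbered = [(no, normalize(text)) for no, text in enumerate(source.splitlines(), 1)]
        kept = [(no, text) for no, text in numbered if text]  # skip blank and comment-only lines
        for i in range(len(kept) - WINDOW + 1):
            block = "\n".join(text for _, text in kept[i:i + WINDOW])
            seen[block].append((name, kept[i][0]))
    return {block: locs for block, locs in seen.items() if len({f for f, _ in locs}) >= 2}
```

Each hit points at a block to extract into one shared function. Blocks repeated inside a single file are filtered out here, since they belong to the in-file repetition rule.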
#5–#6: in-file repetition and giant files — and the pattern behind all six
In-file repetition (the same block several times within one file) hit 38.4% of repos; it's usually a loop or a helper that never got written. Giant files past 1,200 lines appeared in 26.3% of repos: just over a quarter of the corpus has at least one file that's effectively unreviewable, where every change risks side effects nobody can see. Both fixes reuse the structural ones above. Fold the repeats into a loop or a named helper, as with cross-file copies, and split giant files along the same seams as large ones, only more urgently.
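In-file repetition is usually the easiest of the six to fix by hand. The sketch below uses a hypothetical stats report. The first function is the pasted-three-times shape. The second writes the block once as a named helper and drives it with a loop over the field names.

```python
# Before: one block, pasted once per metric.
def summary_repeated(stats: dict) -> list[str]:
    lines = []
    if "users" in stats:
        lines.append(f"users: {stats['users']:,}")
    else:
        lines.append("users: n/a")
    if "orders" in stats:
        lines.append(f"orders: {stats['orders']:,}")
    else:
        lines.append("orders: n/a")
    if "refunds" in stats:
        lines.append(f"refunds: {stats['refunds']:,}")
    else:
        lines.append("refunds: n/a")
    return lines


# After: the block is written once and the metric names drive a loop.
FIELDS = ("users", "orders", "refunds")


def format_stat(stats: dict, key: str) -> str:
    """The formerly pasted block, as a named helper."""
    if key in stats:
        return f"{key}: {stats[key]:,}"
    return f"{key}: n/a"


def summary(stats: dict) -> list[str]:
    return [format_stat(stats, key) for key in FIELDS]
```

Adding a fourth metric is now one entry in `FIELDS`, instead of another pasted block that can drift out of sync with the others.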
The pattern behind all six fits in one sentence: AI tools optimise for the next working answer, not for the shape of the codebase. Every finding here is invisible to the does-it-run feedback loop, so it ships. All six figures come from one study: 99 public repos self-described as AI- or vibe-coded, scanned by the clean·vibes rules engine, with data collected June 10, 2026. The full numbers are in the state-of report, and every finding type here is one a clean·vibes scan flags in your own repo, with file, line, and a paste-ready Claude fix prompt.
The top 6 kinds of mess across 99 vibe-coded repos (clean·vibes rules engine, June 2026)
| Finding | % of repos | The fix |
|---|---|---|
| Commented-out code | 47.5% | Delete it — git history already keeps every version |
| Deep nesting (6+ levels) | 46.5% | Early returns, guard clauses, extract the inner levels |
| Large file (past ~600 lines) | 46.5% | Split along natural seams, one module per responsibility |
| Cross-file duplication | 45.5% | Extract the shared core into one module; import it everywhere |
| In-file repetition | 38.4% | Fold the repeats into a loop or a named helper |
| Giant file (past 1,200 lines) | 26.3% | Same split, more urgently — these files are where bugs hide |
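Every percentage is a share of the same 99 repos, so the raw counts behind the table can be recovered exactly. Below is a short worked check in Python (`repo_count` is our own helper). It rounds each share back to a whole number of repos, then confirms that the count reproduces the published figure.

```python
CORPUS = 99  # repos in the study

# Published shares from the table, in percent (rounded to one decimal).
shares = {
    "Commented-out code": 47.5,
    "Deep nesting (6+ levels)": 46.5,
    "Large file (past ~600 lines)": 46.5,
    "Cross-file duplication": 45.5,
    "In-file repetition": 38.4,
    "Giant file (past 1,200 lines)": 26.3,
}


def repo_count(share_pct: float, corpus: int = CORPUS) -> int:
    """Recover the whole number of repos behind a rounded percentage."""
    count = round(share_pct * corpus / 100)
    # Sanity check: the recovered count must round back to the published share.
    assert round(100 * count / corpus, 1) == share_pct
    return count


for finding, pct in shares.items():
    print(f"{finding}: {repo_count(pct)} of {CORPUS} repos")
```

That gives 47, 46, 46, 45, 38 and 26 repos. The top four rows are separated by just two repos, which shows how flat the top of the ranking really is.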
frequently asked
- Why are the top four so close together?
- Because they share one cause: the most direct path to working code. Comment out instead of delete, nest instead of restructure, grow the file instead of splitting, paste instead of importing. A repo built that way collects all four at once — which is exactly what the flat top of the ranking shows.
- Which of these should I fix first?
- Structure first: split the large and giant files, because everything else gets easier in smaller files. Then duplication (it multiplies future bug fixes), then nesting, then the dead-code sweep. That's the same order clean·vibes's severity ranking produces.
- Where do these percentages come from?
- From our corpus study: the clean·vibes rules engine run over 99 public GitHub repos that describe themselves as AI- or vibe-coded, collected June 10, 2026, with the AI review pass switched off. Each percentage is the share of repos with at least one instance of that finding. The full method is on the methodology page.
- How do I check my repo for all six at once?
- Paste the repo link into clean·vibes. The same rules that produced these numbers run against your code — plus a Claude review the study deliberately switched off — and every finding comes back with file, line, and a paste-ready fix prompt.
Last updated June 11, 2026