The line: copies that must change together
Harmful duplication is coupled duplication. The same validation logic in three endpoints, the same fetch-and-error-handle block in four components, the same price calculation in the cart and the checkout — these copies encode one decision in several places, so when the decision changes, someone has to find every copy. The first time they miss one, you have a bug that exists in some code paths and not others, which is the most confusing kind to chase.
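To make the failure concrete, here is a minimal sketch of two copies of one pricing decision after a rule change reached only one of them. The function names, the 20% tax rate and the "10% off orders over 100" rule are all invented for illustration.

```python
from decimal import Decimal

TAX_RATE = Decimal("0.20")  # hypothetical rate, for illustration only


def cart_total(prices: list[Decimal]) -> Decimal:
    """Cart copy: someone added the new 'discount orders over 100' rule here."""
    subtotal = sum(prices, Decimal("0"))
    if subtotal > Decimal("100"):
        subtotal *= Decimal("0.90")
    return (subtotal * (1 + TAX_RATE)).quantize(Decimal("0.01"))


def checkout_total(prices: list[Decimal]) -> Decimal:
    """Checkout copy: same decision, but the rule change never reached it."""
    subtotal = sum(prices, Decimal("0"))
    return (subtotal * (1 + TAX_RATE)).quantize(Decimal("0.01"))
```

For small orders the two functions agree, so nothing looks wrong. They only diverge once an order crosses the discount threshold, which is exactly the "some code paths and not others" bug described above.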
Harmless duplication is incidental: import blocks, config boilerplate, test arrange-act-assert scaffolding, two functions that happen to look similar today but serve different masters and will drift apart for good reasons. Deduplicating those buys you nothing and costs you an abstraction. The test is never "do these lines match?" — it's "is this one piece of knowledge written twice?"
How detection actually works
You can't reliably grep for duplication, because the copies are rarely identical: a renamed variable, different whitespace, a reordered argument. Real detectors normalise first and compare windows of code rather than whole files. cleanvibes's approach: strip and normalise each line, slide an 8-line window through every code file, hash each window, and look for the same hashes appearing in more than one place.
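The steps above can be sketched in a few lines of Python. This illustrates the general technique and is not cleanvibes's actual code: the normalisation rule (collapse whitespace, drop blank lines), the choice of SHA-1, and every function name here are assumptions.

```python
import hashlib
from collections import defaultdict

WINDOW = 8  # window size named in the text


def normalise(line: str) -> str:
    """Assumed normalisation: trim and collapse whitespace. Real rules may do more."""
    return " ".join(line.split())


def window_hashes(source: str, window: int = WINDOW) -> list[tuple[int, str]]:
    """Return (start, digest) for each window of normalised, non-blank lines.

    `start` indexes the list of non-blank lines, not the original file lines.
    """
    lines = [n for n in map(normalise, source.splitlines()) if n]
    result = []
    for start in range(len(lines) - window + 1):
        chunk = "\n".join(lines[start:start + window])
        result.append((start, hashlib.sha1(chunk.encode("utf-8")).hexdigest()))
    return result


def repeated_windows(files: dict[str, str], window: int = WINDOW) -> dict[str, list[tuple[str, int]]]:
    """Map each digest to its (file, start) locations, keeping digests seen more than once."""
    locations = defaultdict(list)
    for name, source in files.items():
        for start, digest in window_hashes(source, window):
            locations[digest].append((name, start))
    return {d: locs for d, locs in locations.items() if len(locs) > 1}
```

Under these assumptions, two files that contain the same eight statements at different indentation produce one shared digest, while a plain text search for the indented version would miss the unindented one.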
File pairs sharing several windows get flagged as cross-file duplication, and the report names both files and estimates the duplicated line count. Heavy repetition inside a single file gets flagged separately, because a block repeated five times in one file is a loop or a helper that never got written. Both kinds of finding feed the duplication subscore, which carries weight 15 in the overall cleanliness score.
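One plausible way to turn shared windows into a per-pair finding is sketched below. It assumes each file has already been reduced to an ordered list of window hashes. The threshold of three shared windows and the "count the distinct lines the shared windows cover" estimate are guesses at what "several" and "estimates" might mean, not documented cleanvibes behaviour.

```python
from itertools import combinations

WINDOW = 8              # window size named in the text
MIN_SHARED_WINDOWS = 3  # hypothetical threshold for "several"


def covered_lines(starts: list[int], window: int = WINDOW) -> int:
    """Count distinct lines covered by windows that begin at the given indices."""
    covered = set()
    for start in starts:
        covered.update(range(start, start + window))
    return len(covered)


def flag_pairs(hashes_by_file: dict[str, list[str]],
               window: int = WINDOW,
               min_shared: int = MIN_SHARED_WINDOWS) -> list[dict]:
    """hashes_by_file maps a file name to its window hashes, ordered by start line."""
    findings = []
    for a, b in combinations(sorted(hashes_by_file), 2):
        common = set(hashes_by_file[a]) & set(hashes_by_file[b])
        if len(common) < min_shared:
            continue
        # Overlapping windows share lines, so merge them before counting.
        starts_a = [i for i, h in enumerate(hashes_by_file[a]) if h in common]
        findings.append({
            "files": (a, b),
            "shared_windows": len(common),
            "estimated_lines": covered_lines(starts_a, window),
        })
    return findings
```

Merging overlapping windows matters because consecutive windows differ by a single line: three adjacent shared windows describe roughly ten duplicated lines, not twenty-four.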
Fixing it without over-abstracting
The fix for coupled duplication is boring and correct: extract the shared block into one function or module and import it from every former copy. Resist the urge to build a configurable mega-helper that handles all the copies' slight differences with flags — if the copies genuinely differ, extract only the truly shared core and let the call sites keep their differences visibly.
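As a minimal sketch of "extract only the shared core", suppose a cart and a checkout both apply tax and rounding the same way but differ in what else they add. All names, the tax rate and the shipping rule below are hypothetical.

```python
from decimal import Decimal

TAX_RATE = Decimal("0.20")  # hypothetical rate, for illustration only


def price_with_tax(subtotal: Decimal) -> Decimal:
    """The single shared decision: how tax is applied and rounded."""
    return (subtotal * (1 + TAX_RATE)).quantize(Decimal("0.01"))


def cart_total(prices: list[Decimal]) -> Decimal:
    # Call-site difference stays visible: the cart shows items only.
    return price_with_tax(sum(prices, Decimal("0")))


def checkout_total(prices: list[Decimal], shipping: Decimal) -> Decimal:
    # Call-site difference stays visible: checkout adds untaxed shipping.
    return price_with_tax(sum(prices, Decimal("0"))) + shipping
```

The alternative would be a single `total(prices, include_shipping=..., tax_shipping=...)` helper whose flags hide those differences inside one function. Here a change to the tax rule lands in one place, and each caller still reads as what it is.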
This is mechanical work that coding agents do well with precise instructions, which is why every cleanvibes duplication finding ships a ready-to-paste Claude prompt naming both files and the extraction to perform, with behaviour-preserving constraints. Worth saying plainly: window-based detection is a heuristic — it finds textual near-copies, not every conceptual repeat — so treat the report as the high-confidence list, not the complete one.
Duplication that matters vs duplication that doesn't
| Kind | Example | Verdict |
|---|---|---|
| Coupled business logic | Same price calculation in cart and checkout | Fix now — copies must change together |
| Repeated handling blocks | Same fetch-and-error block in four components | Extract a shared helper |
| In-file repetition | Same block five times in one file | Fold into a loop or function |
| Test scaffolding | Similar setup across test files | Mostly fine — clarity beats DRY in tests |
| Boilerplate | Imports, config blocks, type declarations | Leave it — deduplicating buys nothing |
| Coincidental similarity | Two look-alike functions serving different features | Leave it — they'll drift apart for good reasons |
Frequently asked
- Is there an acceptable percentage of duplication?
- Percentages are the wrong lens — 5% duplicated boilerplate is fine and 2% duplicated business logic is a problem. Ask whether the copies encode one decision in several places. That's the duplication that bills you.
- Why do AI coding tools duplicate so much?
- Because generating a fresh copy is the path of least resistance: the tool doesn't reliably know a helper already exists elsewhere in your codebase, and writing new code always works. Unless you explicitly point at the existing function, you often get a second one.
- How does cleanvibes count duplicated lines?
- It slides an 8-line window over the normalised lines of every code file, hashes the windows, and counts windows shared between file pairs. Pairs sharing several windows are reported with an estimated duplicated-line count and both file names, which is enough to go straight to the extraction.
- Won't extracting everything make my code harder to read?
- Over-extraction is a real failure mode, which is why the right unit is the genuinely shared core, not everything that looks similar. The fix prompts cleanvibes writes are scoped to the flagged blocks — one extraction per finding, smallest reasonable diff, behaviour unchanged.
Published June 10, 2026 · Last updated June 11, 2026