Code duplication: how much is too much?

The line: copies that must change together

Harmful duplication is coupled duplication. The same validation logic in three endpoints, the same fetch-and-error-handle block in four components, the same price calculation in the cart and the checkout — these copies encode one decision in several places, so when the decision changes, someone has to find every copy. The first time they miss one, you have a bug that exists in some code paths and not others, which is the most confusing kind to chase.
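To make the failure concrete, here is a minimal sketch of one pricing decision written in two places. The function names, the 100.00 threshold, and the discount rates are all hypothetical and chosen only for illustration. One copy has been updated and the other was missed.

```python
# Hypothetical example: the same discount rule lives in two functions.
# Amounts are integer cents to keep the arithmetic exact.

def cart_total_cents(prices_cents: list[int]) -> int:
    subtotal = sum(prices_cents)
    # Copy 1 of the rule: someone updated this to 15% off above 100.00.
    if subtotal > 10_000:
        subtotal = subtotal * 85 // 100
    return subtotal


def checkout_total_cents(prices_cents: list[int]) -> int:
    subtotal = sum(prices_cents)
    # Copy 2 of the rule: still the old 10% off, because nobody found it.
    if subtotal > 10_000:
        subtotal = subtotal * 90 // 100
    return subtotal
```

Below the threshold the two functions agree, so the mismatch only appears for larger orders. That is the "some code paths and not others" pattern described above.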

Harmless duplication is incidental: import blocks, config boilerplate, test arrange-act-assert scaffolding, two functions that happen to look similar today but serve different masters and will drift apart for good reasons. Deduplicating those buys you nothing and costs you an abstraction. The test is never "do these lines match?" — it's "is this one piece of knowledge written twice?"

How detection actually works

You can't grep for duplication, because the copies are never quite identical — a renamed variable, different whitespace, a reordered argument. Real detectors normalise first and compare windows of code rather than whole files. clean·vibes's approach: strip and normalise each line, slide an 8-line window through every code file, hash each window, and look for the same hashes appearing in more than one place.
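The steps in that paragraph can be sketched in a few lines of Python. This is an illustration under stated assumptions, not clean·vibes's actual implementation: the normalisation here only strips and collapses whitespace and drops blank lines, SHA-1 is an arbitrary choice of hash, and the function names are made up. Catching renamed variables would need token-level normalisation, which this sketch does not attempt.

```python
import hashlib
import re
from collections import defaultdict

WINDOW = 8  # window size taken from the text


def normalise(line: str) -> str:
    """Strip the line and collapse runs of whitespace (assumed normalisation)."""
    return re.sub(r"\s+", " ", line.strip())


def window_hashes(source: str, window: int = WINDOW) -> list[tuple[int, str]]:
    """Return (start, digest) for every window of normalised, non-blank lines.

    `start` indexes the filtered line list, not the original file's line numbers.
    """
    lines = [n for n in (normalise(raw) for raw in source.splitlines()) if n]
    result = []
    for start in range(len(lines) - window + 1):
        chunk = "\n".join(lines[start:start + window])
        result.append((start, hashlib.sha1(chunk.encode("utf-8")).hexdigest()))
    return result


def find_repeats(files: dict[str, str]) -> dict[str, list[tuple[str, int]]]:
    """Map each window digest to its (file, start) locations, keeping only
    digests that occur in more than one place."""
    locations = defaultdict(list)
    for name, source in files.items():
        for start, digest in window_hashes(source):
            locations[digest].append((name, start))
    return {d: locs for d, locs in locations.items() if len(locs) > 1}
```

A file shorter than the window produces no hashes at all, so very small files can never be flagged by this approach.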

File pairs that share several windows are flagged as cross-file duplication, and the report names both files and estimates the duplicated line count. Heavy repetition inside a single file is flagged separately, because a block repeated five times in one file is a loop or a helper that never got written. Both kinds of finding feed the duplication subscore, which carries weight 15 in the overall cleanliness score.
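As a rough sketch of that grouping step, assume an index from window hash to the list of files it occurs in, with one entry per occurrence. The two cutoffs below are hypothetical, since the text only says "several" and "heavy", and the function names are invented for the example.

```python
from collections import Counter
from itertools import combinations

MIN_SHARED_WINDOWS = 3   # hypothetical cutoff for "several" shared windows
MIN_IN_FILE_REPEATS = 3  # hypothetical cutoff for "heavy" in-file repetition


def cross_file_pairs(hash_to_files: dict[str, list[str]]) -> dict[tuple[str, str], int]:
    """Count, for each pair of files, how many window hashes both contain."""
    counts = Counter()
    for files in hash_to_files.values():
        # set() so a window repeated inside one file counts once per pair
        for a, b in combinations(sorted(set(files)), 2):
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= MIN_SHARED_WINDOWS}


def in_file_repetition(hash_to_files: dict[str, list[str]]) -> dict[str, int]:
    """Count, per file, the extra copies of windows repeated inside that file."""
    counts = Counter()
    for files in hash_to_files.values():
        for name, occurrences in Counter(files).items():
            if occurrences > 1:
                counts[name] += occurrences - 1  # copies beyond the first
    return {name: n for name, n in counts.items() if n >= MIN_IN_FILE_REPEATS}
```

Keeping the two checks separate mirrors the distinction in the text: a cross-file finding points at a shared helper to extract, while an in-file finding points at a loop or local function.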

Fixing it without over-abstracting

The fix for coupled duplication is boring and correct: extract the shared block into one function or module and import it from every former copy. Resist the urge to build a configurable mega-helper that handles all the copies' slight differences with flags — if the copies genuinely differ, extract only the truly shared core and let the call sites keep their differences visibly.
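A minimal sketch of that extraction, using hypothetical names and a made-up discount rule: the shared core is one small function, and the checkout-only shipping charge stays at its call site instead of becoming a flag on the helper.

```python
# Hypothetical shared core: the single place the discount rule is written.
# Amounts are integer cents; the threshold and rate are illustrative only.

def discounted_subtotal_cents(prices_cents: list[int]) -> int:
    subtotal = sum(prices_cents)
    if subtotal > 10_000:
        subtotal = subtotal * 85 // 100
    return subtotal


def cart_total_cents(prices_cents: list[int]) -> int:
    # The cart shows the discounted subtotal and nothing else.
    return discounted_subtotal_cents(prices_cents)


def checkout_total_cents(prices_cents: list[int], shipping_cents: int) -> int:
    # Shipping is a checkout-only concern, so it stays visible here
    # rather than hiding behind an `include_shipping=True` flag.
    return discounted_subtotal_cents(prices_cents) + shipping_cents
```

Changing the discount now means editing one function, and a reader of `checkout_total_cents` can still see exactly how it differs from the cart.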

This is mechanical work that coding agents do well with precise instructions, which is why every clean·vibes duplication finding ships a ready-to-paste Claude prompt naming both files and the extraction to perform, with behaviour-preserving constraints. Worth saying plainly: window-based detection is a heuristic — it finds textual near-copies, not every conceptual repeat — so treat the report as the high-confidence list, not the complete one.

Duplication that matters vs duplication that doesn't

Kind	Example	Verdict
Coupled business logic	Same price calculation in cart and checkout	Fix now — copies must change together
Repeated handling blocks	Same fetch-and-error block in four components	Extract a shared helper
In-file repetition	Same block five times in one file	Fold into a loop or function
Test scaffolding	Similar setup across test files	Mostly fine — clarity beats DRY in tests
Boilerplate	Imports, config blocks, type declarations	Leave it — deduplicating buys nothing
Coincidental similarity	Two look-alike functions serving different features	Leave it — they'll drift apart for good reasons

frequently asked

Is there an acceptable percentage of duplication?

Percentages are the wrong lens — 5% duplicated boilerplate is fine and 2% duplicated business logic is a problem. Ask whether the copies encode one decision in several places. That's the duplication that bills you.

Why do AI coding tools duplicate so much?

Because generating a fresh copy is the path of least resistance: the tool doesn't reliably know a helper already exists elsewhere in your codebase, and writing new code always works. Unless you explicitly point at the existing function, you often get a second one.

How does clean·vibes count duplicated lines?

It normalises each line, slides an 8-line window through every code file, hashes the windows, and counts the windows shared between file pairs. Pairs sharing several windows are reported with an estimated duplicated-line count and both file names, which is enough to go straight to the extraction.
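The text does not say how the estimate is computed. One plausible way, shown here purely as an assumption, is to merge the overlapping matched windows and count the distinct lines they cover, so that three consecutive 8-line windows count as 10 lines and not 24. The function name is hypothetical.

```python
WINDOW = 8  # window size taken from the text


def estimate_duplicated_lines(window_starts: list[int], window: int = WINDOW) -> int:
    """Count distinct lines covered by matched windows (assumed estimate).

    `window_starts` holds the starting line index of each matched window
    in one file; overlapping windows are merged so no line is counted twice.
    """
    covered = 0
    last_counted = -1  # index of the last line already counted
    for start in sorted(set(window_starts)):
        end = start + window - 1
        first_new = max(start, last_counted + 1)
        if end >= first_new:
            covered += end - first_new + 1
        last_counted = max(last_counted, end)
    return covered
```

Counting covered lines rather than multiplying windows by 8 keeps the number close to what a reader would see when opening the two files side by side.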

Won't extracting everything make my code harder to read?

Over-extraction is a real failure mode, which is why the right unit is the genuinely shared core, not everything that looks similar. The fix prompts clean·vibes writes are scoped to the flagged blocks — one extraction per finding, smallest reasonable diff, behaviour unchanged.

Published June 10, 2026 · Last updated June 11, 2026
