A Calm Approach to Tech Debt

I have sat in plenty of postmortems where someone points at a three-year-old function and says “nobody knows why this works.” And the room just goes quiet. Not because the code is complex; it usually isn’t. But because the person who made that call left two teams ago, and whatever they were thinking is just gone. That’s the part that gets me. Not the mess itself, but that we keep losing the reasoning.

We call it tech debt. And the word “debt” already carries a verdict. Like we failed somewhere. Like the mess is evidence of carelessness.

Ward Cunningham had a better read on it:

“Shipping first time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite. The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt.”

That framing already removes the guilt. Debt isn’t immoral. Taking a loan to buy a house isn’t a failure. Not paying it back is.

The bet was usually reasonable

Most “debt” started as a trade-off someone made deliberately. Skip TypeScript to ship faster and learn from real users first. Inline the function because the right abstraction wasn’t obvious yet. Keep the dependency because swapping it would’ve delayed the release by two weeks.

These weren’t mistakes. They were bets made under real constraints with real reasons.

The problem is nobody recorded the terms. No expiry date. No note saying “we chose this because X, revisit when Y.” So the workaround hardens. Becomes folklore. Now everyone calls that function and nobody touches it because touching it feels like defusing something without the manual.

We don’t lose the code. We lose the context around it.

Writing it down is more powerful than it sounds

I know “document your decisions” sounds like the most boring advice in engineering. But there’s a difference between writing documentation and writing a decision record.

Architecture Decision Records (ADRs) are one way to do that, but it doesn’t have to be that formal. Even a comment that says “we did this because we needed to ship before the migration, revisit in Q2” is enough. The point is passing the reasoning forward. When team members change, when the codebase evolves, when someone new joins and asks “why is this like this?”, there should be an answer they can actually find.

Without it, every refactor becomes archaeology. You’re not cleaning up code, you’re trying to figure out if it’s even safe to clean up.
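
Here is one way the “we chose this because X, revisit when Y” note could live next to the code. It’s a minimal sketch, not a standard: `DecisionNote`, its fields, the example note, and the dates are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DecisionNote:
    """A lightweight decision record kept next to the code it explains."""

    chose: str        # what was decided
    because: str      # the constraint that made it reasonable at the time
    revisit_on: date  # when the bet should be looked at again

    def is_due(self, today: date) -> bool:
        """True once the revisit date has arrived."""
        return today >= self.revisit_on


# Hypothetical example: the wording and the date are illustrative only.
INLINE_PRICING_NOTE = DecisionNote(
    chose="Inline the discount calculation instead of extracting a module",
    because="We needed to ship before the migration",
    revisit_on=date(2025, 4, 1),
)


def overdue(notes: list[DecisionNote], today: date) -> list[DecisionNote]:
    """Return the notes whose revisit date has passed, oldest first."""
    due = [note for note in notes if note.is_due(today)]
    return sorted(due, key=lambda note: note.revisit_on)
```

A plain comment carries the same information. The only thing a structure like this adds is that a script can list what’s overdue, so the expiry date doesn’t depend on someone remembering it.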

If debt really slows you down, you should be able to show it

This is where I think a lot of teams get stuck. They know things feel slow. But they can’t point at why, so they can’t fix it, and they can’t get anyone to prioritise it.

Pick a few signals. Cycle time. Deploy frequency. Build and deploy times. Which modules or files keep showing up in incident postmortems and recovery work. Watch them over a few sprints. Not to build a dashboard nobody looks at, but to find the actual friction. Once you can see it, the conversation changes. You’re not arguing about code quality in the abstract anymore. You’re saying “this module adds 40 minutes to every release cycle” and that’s something a product manager can actually reason about.
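
To make that concrete, here is a rough sketch of two of those signals computed from hand-collected records. The data shapes, the file and module names, and the numbers are all hypothetical; in practice they would come from whatever your incident tracker and CI system export. The comparison shows correlation only, so treat it as a conversation starter, not proof.

```python
from collections import Counter
from statistics import median

# Hypothetical records: files touched during each incident recovery.
incidents = [
    {"id": "INC-1", "files": ["billing/invoice.py", "billing/tax.py"]},
    {"id": "INC-2", "files": ["billing/invoice.py"]},
    {"id": "INC-3", "files": ["auth/session.py", "billing/invoice.py"]},
]

# Hypothetical records: how long each release took and what it touched.
releases = [
    {"minutes": 60, "modules": {"auth"}},
    {"minutes": 100, "modules": {"billing"}},
    {"minutes": 62, "modules": {"search"}},
    {"minutes": 104, "modules": {"billing", "auth"}},
]


def hotspots(incidents, top=3):
    """Count how many incidents each file appeared in, most frequent first."""
    counts = Counter()
    for incident in incidents:
        counts.update(set(incident["files"]))  # count a file once per incident
    # Sort by count (descending), then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top]


def added_minutes(releases, module):
    """Median release time with the module minus median release time without it."""
    with_module = [r["minutes"] for r in releases if module in r["modules"]]
    without = [r["minutes"] for r in releases if module not in r["modules"]]
    if not with_module or not without:
        return None  # not enough data to compare
    return median(with_module) - median(without)


print(hotspots(incidents))
print(added_minutes(releases, "billing"))
```

With the sample data above, `billing/invoice.py` shows up in all three incidents, and releases that touch `billing` run about 41 minutes longer at the median. That is the kind of sentence you can bring to planning.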

Guilt doesn’t schedule work. Outcomes do.

Cleanup doesn’t need its own sprint

Teams either avoid refactoring entirely or treat it like a sacred event that needs full ceremony. Both feel off to me.

If a fix fits naturally in a sprint, write a story and ship it. If it touches multiple modules and needs careful sequencing, treat it as an epic with a clear thread. If it unblocks something strategic or crosses teams, that’s an initiative worth naming. The label isn’t the point. The clarity is. Connect the work to a reason someone else can follow, and it stops feeling like housekeeping and starts feeling like progress.

Rewrites are just one tool

And honestly, rewrites have more failure modes than people admit going in.

A safer pattern is incremental: add tests around critical paths, find seams you can extract using the Strangler Fig pattern, delete dead code in small passes, isolate the hotspots. Changes like that compound. And when a bigger move is genuinely necessary, design it like an experiment. Feature flags. Staged rollout. A clear way to roll back if something breaks. The goal is to make the move boring and safe, not heroic and stressful.
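
A sketch of what “design it like an experiment” can look like in code, assuming a simple percentage-based flag. Everything here is hypothetical: `Rollout`, `legacy_total`, and `new_total` stand in for your own old and new paths, and a real setup would likely read the flag from a config service rather than a constructor argument.

```python
import hashlib


def in_rollout(user_id: str, percent: int) -> bool:
    """Place a user in a stable bucket from 0 to 99 and compare it to percent."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent


def legacy_total(prices):
    """The existing path. It stays in place as the fallback."""
    total = 0
    for price in prices:
        total += price
    return total


def new_total(prices):
    """The replacement path being rolled out."""
    return sum(prices)


class Rollout:
    """Routes a share of users to the new path, with a kill switch."""

    def __init__(self, percent=0, enabled=True):
        self.percent = percent  # share of users on the new path, 0 to 100
        self.enabled = enabled  # set to False to roll everyone back

    def total(self, user_id, prices):
        if self.enabled and in_rollout(user_id, self.percent):
            try:
                return new_total(prices)
            except Exception:
                # In a real system, log the failure here before falling back.
                return legacy_total(prices)
        return legacy_total(prices)


rollout = Rollout(percent=10)
print(rollout.total("user-42", [5, 10, 20]))
```

Because the bucket is derived from a hash of the user ID, the same user stays on the same path between requests, and raising `percent` only ever adds users to the new path. Rolling back is a one-line change: set `enabled` to `False` or `percent` to 0.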

Legacy code isn’t the problem. Losing the context around it is. If you keep the reasoning visible and the changes small, the system can evolve without anyone needing to be a hero about it.