A stranger to your own codebase

engineering

collaboration

Agents hand you the code without the understanding that used to come with writing it, so you re-onboard onto your own project every iteration. In modelling you can ship what you can’t explain — but code isn’t a model, and someone has to keep understanding it.

Author

Matthew Gibbons

Published

5 June 2026

You describe the change, the agent makes it, the tests go green, and you merge. It’s a good loop, and a fast one. Then a week later you open the same file to extend it, and you find yourself reading it the way you’d read a stranger’s code — carefully, from the top, working out what it does and why it’s shaped like this. Which is fair enough, because a stranger did write it. You’re not really maintaining your project any more; you’re re-onboarding onto it, again, the way you would onto something you’d never seen.

The agent took the typing, not the understanding

The typing was never the expensive part. What went with it — quietly, as a side effect nobody itemised — was the understanding. When you build a system by hand, you don’t just produce it; you accumulate a model of it as you go: why this function is split the way it is, which awkward case that branch is guarding, what you tried first and threw away. That mental model is the residue of the effort. Take the effort away and the artefact still arrives. The model in your head doesn’t.

Vicki Boykis described the feeling precisely: you come away from an agentic session with all the outward signs of having written code and none of the internal ones. Her line for the remedy has stuck with me — we should be more tired than the model. The tiredness was never the cost of the work. It was the work: the trace left behind by having understood something. Outsource the tiredness and you outsource the understanding along with it.

Fine for a model, not for code

I recognise this trade, because I had to talk myself into it. I’m a software engineer by instinct, and I crossed into data science from that side — teaching data scientists how to engineer software, and picking up their discipline as I went. The instinct that felt most like negligence at first was precisely this one: in modelling you ship things you cannot fully explain all the time, and it isn’t carelessness, it’s the job. A model earns its place by performing, not by narrating its reasoning in terms a person can follow. Nobody opens it up six months later to reason through why it leaned on that feature the way it did; the warrant is the metric, and the explanation is, honestly, optional. It took me a while to accept that trust the output, skip the understanding is a perfectly well-calibrated instinct — for a model.

It is badly miscalibrated for code, and agents are importing it wholesale. The difference is who reads the reasoning afterwards. A model’s internal logic is consulted by no human, ever. A codebase’s reasoning is consulted constantly — every change, every incident at three in the morning, every new hire trying to find their feet. That is the entire reason engineering grew its apparently fussy habits: readable code, code review, a named owner who can reconstruct the why. They look like hygiene — the professional throat-clearing you do because you’re supposed to. They are nothing of the sort. They are the machinery by which understanding survives contact with time and staff turnover, and they work by keeping a human in the loop who can still answer for the system.

The debt that doesn’t show up

This is where it bites at the level above the keyboard. A team shipping with agents can move faster than it understands, and the debt it runs up isn’t the familiar kind. The code might be perfectly clean — agents write tidy code. What’s missing is comprehension: the system works, and nobody on the team could stand up and defend why it’s built the way it is. No dashboard shows that gap. Velocity looks healthy right up until the incident that needs someone who genuinely knows how the thing fits together — and the honest answer is that nobody does, because they all re-onboard from the source each time, exactly like you.

And the reassurance that the tests pass is worth precisely what those tests can discriminate, which — as a green tick can quietly prove nothing — may be less than it looks. A passing suite tells you the code does what the tests check. It says nothing about whether anyone understands what it does, or could change it safely tomorrow.

Put the friction back on purpose

None of this is an argument for switching the agents off and typing everything by hand as penance. It’s an argument for putting the friction back where it actually buys you something. Boykis’s own list is a good place to start: write the first cut yourself and let the agent review it; make the agent explain the parts you don’t follow, with the documentation and the prior changes to back it up; have it propose two approaches and argue against the one it didn’t choose; sit with a problem for twenty minutes before you reach for the tool at all.

The common thread isn’t effort for its own sake. It’s staying able to reconstruct the why — so that the owner of record is also someone who understands the thing they own. Friction in the right place isn’t waste; it’s the price of keeping the model in your head and the model in the repository pointing at each other.

More tired than the machine

So when the loop closes and the bar goes green, it’s worth asking who did the understanding this time — you, or the model. If it was the model, you’ve shipped something you don’t yet own, however much it looks like you wrote it. The work, in the end, is to come away from it a little more tired than the machine.

Part of an occasional series on the traffic between software engineering and data science, in both directions. The ideas here are developed properly in Building with Certainty and Thinking in Uncertainty.