Harnesses

Mar 21, 2026

I’ve had this domain for over 12 years and never felt the urge to write, but now, after all this time, I feel like I have something to say (humbly). After 2–3 years of witnessing and experiencing the evolution of AI coding models, and 10-plus years in the industry as a professional code reader and writer, it’s becoming clear to me what it’s all going to come down to. Harnesses.

Velocity isn’t the problem anymore. The problem is the responsibility and integrity of the code being delivered. To maximize the likelihood of high-integrity code, developers need to build a harness around their entire workflow: recursive code-quality loops, pre-commit hooks that check code coverage, or AI PR reviews (which, in my opinion, will never replace experienced human eyes, the last remaining bottleneck I foresee).
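To make the pre-commit idea concrete, here is a minimal sketch of a coverage gate. It assumes coverage.py has already written a JSON report (for example with `coverage json`, which stores the overall figure under `totals.percent_covered`); the script name, the `MIN_COVERAGE` threshold, and the helper functions are hypothetical, not a prescribed setup.

```python
"""Hypothetical pre-commit gate: block the commit if total coverage is too low.

Assumes coverage.py has already produced a JSON report (e.g. via `coverage json`),
which stores the overall percentage under totals.percent_covered.
"""
import json
import sys
from pathlib import Path

# Hypothetical threshold; tune it to your project's policy.
MIN_COVERAGE = 90.0


def total_coverage(report: dict) -> float:
    """Return the overall coverage percentage from a coverage.py JSON report."""
    return float(report["totals"]["percent_covered"])


def passes_gate(report: dict, threshold: float = MIN_COVERAGE) -> bool:
    """True if the report's total coverage meets or exceeds the threshold."""
    return total_coverage(report) >= threshold


def main(path: str = "coverage.json") -> int:
    report = json.loads(Path(path).read_text())
    percent = total_coverage(report)
    if passes_gate(report):
        print(f"coverage {percent:.1f}% >= {MIN_COVERAGE}%: ok")
        return 0
    # A non-zero exit code is what makes a pre-commit hook reject the commit.
    print(f"coverage {percent:.1f}% < {MIN_COVERAGE}%: commit blocked", file=sys.stderr)
    return 1


if __name__ == "__main__":
    sys.exit(main(*sys.argv[1:]))
```

One way to wire it up is a `.git/hooks/pre-commit` script that runs the test suite under coverage, then `coverage json`, then this file. coverage.py also has a built-in `coverage report --fail-under=90` that enforces a threshold with no extra script; a standalone gate like this only earns its keep if you want custom logic around the number.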

Building a startup during the evolution of coding models has been quite a ride. I have a particular trauma with my ex-tormentor, Claude Sonnet 4.5 (a.k.a. the weasel), and the verbally abusive shouting matches we had. One moment I remember in particular is when a colleague turned to me (this was about an 8GB codebase that I had hand-written and co-written with AI over the past 2–3 years) and said I needed to stop refactoring the project and just rewrite it from scratch.

What? What do you mean, rewrite it from scratch? Wait… we need to rewrite it from scratch.

I’m genuinely curious how many projects at this stage of the game have found themselves in a similar situation: wracked by technical debt from weaker coding models since GPT-4, and now, blessed by Opus 4.5, with an opportunity to literally wipe the slate clean and rewrite from scratch in 1/100th of the time. 100 percent test coverage. Best quality. We did exactly that, and it was beautifully easy.

The best engineers, and I think the best companies, should now be devoting themselves to building the highest-quality engineering experience, not only to maximize velocity but, most importantly, to maximize the integrity of the code being delivered. Code quality gates, multiple model reviewers, loops upon loops of self-improving feedback. At the end of the day, I am still astonished that, after years of struggling to put a semicolon in the right place or to find where one was missing in a file, simply being a good reader turned out to be the most important thing a good engineer could be.
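One way to picture "loops upon loops of self-improving feedback" is a generator-reviewer loop: something proposes a change, several reviewers critique it, and their objections are fed into the next attempt until everyone approves or the budget runs out. This is a minimal, hypothetical sketch; `review_loop`, `Generator`, `Reviewer`, and `LoopResult` are invented names, and the callables stand in for whatever model calls, linters, or test runners a real harness would plug in.

```python
"""Hypothetical generator-reviewer loop for a code-quality harness."""
from dataclasses import dataclass, field
from typing import Callable

# A reviewer takes a candidate and returns a list of problems (empty = approve).
Reviewer = Callable[[str], list[str]]
# A generator takes the task plus accumulated feedback and returns a candidate.
Generator = Callable[[str, list[str]], str]


@dataclass
class LoopResult:
    candidate: str
    approved: bool
    rounds: int
    feedback: list[str] = field(default_factory=list)


def review_loop(task: str, generate: Generator, reviewers: list[Reviewer],
                max_rounds: int = 5) -> LoopResult:
    """Regenerate until every reviewer approves or max_rounds is reached."""
    feedback: list[str] = []
    candidate = ""
    for round_no in range(1, max_rounds + 1):
        candidate = generate(task, feedback)
        # Collect every reviewer's objections to this round's candidate.
        issues = [issue for review in reviewers for issue in review(candidate)]
        if not issues:
            return LoopResult(candidate, True, round_no, feedback)
        # Feed the objections back so the next attempt can address them.
        feedback.extend(issues)
    return LoopResult(candidate, False, max_rounds, feedback)


if __name__ == "__main__":
    # Toy stand-ins: the "model" adds tests once a reviewer asks for them.
    def fake_generator(task: str, feedback: list[str]) -> str:
        return task + (" + tests" if "missing tests" in feedback else "")

    def needs_tests(candidate: str) -> list[str]:
        return [] if "tests" in candidate else ["missing tests"]

    result = review_loop("add login endpoint", fake_generator, [needs_tests])
    print(result.approved, result.rounds, result.candidate)
```

The hard cap on rounds is deliberate: a loop with model reviewers can disagree with itself indefinitely, and an unapproved result after the budget is a signal to pull in the human reader rather than keep spinning.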

Update (Mar 24): Three days after I published this, Anthropic’s engineering team posted “Harness design for long-running application development,” a deep dive into multi-agent harness architectures, generator-evaluator loops, and context management for building full applications autonomously. It’s validating to see the same core idea, that harnesses rather than raw model capability are the bottleneck, coming from the team building the models themselves.