Code review can't be fixed with more code review

2026-03-04


Nobody has ever enjoyed doing code review. The more files in a PR, the fewer people who actually review it. That was already the case in 2023. Now, in 2026, AI agents generate most of the code, and the review problem has gotten exponentially worse.


More AI means more to review

Many companies believe you can fight fire with more fire. The logic goes: if AI generates too much code to review, put another AI to review it. That doesn’t make sense. More AI generating code means more things for someone (or something) to validate. The volume doesn’t decrease; it just changes hands.

Back in 2014, Facebook learned that “Move Fast and Break Things” doesn’t scale and changed its motto to “Move Fast with Stable Infra.” It’s the same idea that industrial logic and Modern Agile had already advocated. Today, you hear it under the name “guardrails.” The concept is the same: speed without structure is just accelerated chaos.


The problem is that review is manual

AI agents have already taken the fun part of programming away from us. Don’t let them also turn us into glorified manual QA. What we need is to replace repetitive human review with automation through code.

Code is deterministic. Code is reliable. And, most importantly: the same things get corrected in code review over and over, PR after PR. If you’re fixing the same type of issue for the third time, it shouldn’t be a comment on the PR. It should be a lint rule, a test, an automated check.
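As a minimal sketch of what “turn the comment into a rule” can look like, here is a custom ESLint rule that flags stray console.log calls, the kind of nit that otherwise gets repeated in review forever. The rule name and the suggestion to use a structured logger are illustrative assumptions, not something prescribed by this post.

```typescript
// Hypothetical custom ESLint rule: automates a review comment that would
// otherwise be typed by hand on every other PR.
import type { Rule } from "eslint";

const noConsoleLog: Rule.RuleModule = {
  meta: {
    type: "problem",
    docs: { description: "disallow console.log; use the structured logger" },
  },
  create(context) {
    return {
      // Visit every call expression and flag console.log(...)
      CallExpression(node) {
        const { callee } = node;
        if (
          callee.type === "MemberExpression" &&
          callee.object.type === "Identifier" &&
          callee.object.name === "console" &&
          callee.property.type === "Identifier" &&
          callee.property.name === "log"
        ) {
          context.report({
            node,
            message: "Use the structured logger instead of console.log.",
          });
        }
      },
    };
  },
};

export default noConsoleLog;
```

Once a rule like this runs in CI, the conversation happens exactly once, when the rule is written, instead of on every PR.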


More tests (real ones) than ever

When humans wrote all the code, we were already afraid of breaking things. AI agents will break everything, all the time, forever. And now we need to test things we historically neglected, because testing them was too expensive or because there were always bigger problems to solve:

  • Robust and reliable test suites
  • CSS and layout tests
  • Observability tests
  • Infrastructure tests
  • Configuration tests
  • DevOps component tests: Terraform, Kubernetes, ArgoCD, and others (a sketch of one such test follows this list)
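Here is one hedged illustration of a configuration test in that spirit: it parses a Kubernetes Deployment manifest and fails if any container is missing resource limits. The file path, the js-yaml dependency, and the Vitest runner are assumptions made for the sketch, not part of any particular stack.

```typescript
// Configuration test sketch: assert that a Kubernetes Deployment manifest
// declares CPU and memory limits for every container.
import { readFileSync } from "node:fs";
import { load } from "js-yaml";
import { describe, it, expect } from "vitest";

describe("deploy/api.yaml", () => {
  // Parse the manifest once; the path is hypothetical.
  const manifest = load(readFileSync("deploy/api.yaml", "utf8")) as any;

  it("declares resource limits for every container", () => {
    const containers = manifest.spec.template.spec.containers;
    expect(containers.length).toBeGreaterThan(0);
    for (const container of containers) {
      // A missing limit is exactly the kind of thing a human reviewer
      // would flag by hand, if they noticed it at all.
      expect(container.resources?.limits?.cpu).toBeDefined();
      expect(container.resources?.limits?.memory).toBeDefined();
    }
  });
});
```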

But having tests isn’t enough. We need the critical thinking to tell good tests from bad ones. A test that always passes, no matter what changes, is worse than having no test at all, because it gives false confidence. We need to understand what makes a test useful so we can judge whether we’re actually improving our coverage or just inflating numbers.
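To make the “test that always passes” point concrete, here is a contrived contrast, assuming a hypothetical applyDiscount function and the Vitest runner: the first test stays green no matter what the implementation does; the second actually pins the behavior down.

```typescript
import { describe, it, expect } from "vitest";
// Hypothetical function under test: applyDiscount(price, rate) -> discounted price.
import { applyDiscount } from "./pricing";

describe("applyDiscount", () => {
  it("inflates coverage without testing anything", () => {
    const result = applyDiscount(100, 0.2);
    expect(result).toBeDefined(); // green even if the math is completely wrong
  });

  it("pins down the actual behavior", () => {
    expect(applyDiscount(100, 0.2)).toBe(80); // 20% off 100
    expect(applyDiscount(100, 0)).toBe(100); // no discount leaves the price unchanged
  });
});
```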

The age of AI agents doesn’t call for less engineering. It calls for more. Just a different kind.