
The AI Code Quality Study Nobody Wants to Talk About

By Ahmed essyad · January 2026 · 4 min read

José Mourinho once said about Messi: "Lionel Messi is like a good porn movie, everybody likes it but in public they deny it."

That's AI coding tools in 2025.

Let's be real about what this study actually shows: they analyzed 470 PRs (320 AI-labeled, 150 human-labeled) and found AI code had 1.7x more issues. But here's what they conveniently glossed over:

The Methodology is Completely Fucked

They're relying on PRs being "labeled" as AI vs human. You know what that means? It means they're trusting that people accurately disclosed when they used AI. In reality, the "human-only" PRs are probably full of Claude and GPT code where someone just hit "regenerate" until it stopped saying "Here's the updated code" at the top, then removed the telltale AI comments.

Half the developers I know use AI constantly but would never label their PR as "AI-generated" because of exactly this kind of stigma. They'll paste code into ChatGPT, get it working, clean up the obvious AI patterns, and commit it as their own. The "human baseline" in this study is almost certainly contaminated with tons of unlabeled AI code.

Which means the actual difference between "honest about using AI" and "lying about using AI" is probably way smaller than 1.7x. Maybe the real finding here is that people who are transparent about their AI usage are the same people who ship faster and care less about obsessive polish before the PR goes up.

They Didn't Normalize for Velocity

AI generates way more code, way faster. If I write 10 lines manually and AI writes 100 lines in the same time, yeah, the AI code might have more total issues. But per unit of developer time? The math might look completely different. They didn't normalize for velocity or output volume.
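The arithmetic is easy to sketch. The throughput numbers below are hypothetical, chosen only to illustrate the point (the study published neither line counts nor time spent) — but they show how the same 1.7x per-PR gap can invert once you divide by output volume:

```python
# Hypothetical numbers for illustration only; the study reported issues
# per PR, not lines of code or developer time.
human_issues_per_pr, human_lines_per_pr = 1.0, 10
ai_issues_per_pr, ai_lines_per_pr = 1.7, 100  # the study's 1.7x, at 10x the volume

per_pr_ratio = ai_issues_per_pr / human_issues_per_pr
per_line_ratio = (ai_issues_per_pr / ai_lines_per_pr) / (
    human_issues_per_pr / human_lines_per_pr
)

print(f"issues per PR:   AI is {per_pr_ratio:.1f}x the human rate")
print(f"issues per line: AI is {per_line_ratio:.2f}x the human rate")
```

With these made-up numbers the AI code has 1.7x the issues per PR and roughly a sixth of the issues per line. The real numbers could go either way — which is exactly the problem with not reporting them.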

Sample Size Theater

The 8x performance issues stat is wild. Performance problems appeared "nearly 8x more" in AI code—but they also admit performance issues were a tiny sample overall. When the raw counts are something like 5 human cases and 40 AI cases in a 470-PR study, that multiplier becomes meaningless noise.
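You can feel how noisy a multiplier built on counts that small is with a quick bootstrap. The counts here are the hypothetical ones from the paragraph above (5 performance issues among 150 human PRs, 40 among 320 AI PRs), not the study's actual breakdown, which it didn't publish:

```python
import random

random.seed(0)

# Hypothetical counts, NOT the study's published breakdown:
# 5 performance issues among 150 human PRs, 40 among 320 AI PRs.
human_n, human_hits = 150, 5
ai_n, ai_hits = 320, 40

def resample_rate(n, hits):
    """Resample n PRs with replacement and return the issue rate."""
    flags = [1] * hits + [0] * (n - hits)
    return sum(random.choice(flags) for _ in range(n)) / n

ratios = []
for _ in range(2000):
    h = resample_rate(human_n, human_hits)
    a = resample_rate(ai_n, ai_hits)
    if h > 0:  # drop resamples where the tiny human count hits zero
        ratios.append(a / h)

ratios.sort()
lo = ratios[int(0.025 * len(ratios))]
hi = ratios[int(0.975 * len(ratios))]
print(f"bootstrap 95% interval for the AI/human rate ratio: {lo:.1f}x to {hi:.1f}x")
```

The interval comes out enormous — spanning a low-single-digit gap to a double-digit one — because the human side of the ratio rests on a handful of cases. An "8x" headline drawn from counts like these is compatible with almost any underlying reality.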

Confounding Variables Everywhere

"More critical issues" doesn't mean worse outcomes. Maybe AI is just more willing to attempt complex shit that has higher failure modes, while humans play it safe with boring CRUD operations. The report doesn't account for what problems were being solved or how ambitious the changes were. Or maybe people use AI for the hard stuff they don't know how to do, and write trivial code themselves.

Code Review Exists

Nobody's shipping AI code without review. The whole point is that humans review it. If AI code has "3x more readability issues," okay, cool—that's what code review is for. The human catches it before merge. The system is working as designed.

The Circular Logic Problem

And here's the biggest mindfuck: the humans writing that "human-only" code already learned from AI. Everyone's been using Copilot, ChatGPT, and Claude for years now. The patterns you learned, the problems you know how to solve, the architectures you understand—a lot of that came from AI-assisted learning. You think that senior engineer who claims they write everything by hand didn't debug their shit with ChatGPT last week? Come on.

The study is basically comparing "people who admit they use AI" vs "people who use AI but lie about it or don't label it." And somehow we're supposed to believe that's a meaningful comparison.

The Real Agenda

Look, I'm not saying AI code is perfect. Obviously it needs review. Obviously it makes mistakes. But a company that sells AI code review tools publishing a report about how dangerous AI code is feels a lot like a fire extinguisher company funding studies about how flammable everything is.

The real question isn't "does AI code have more issues per PR" but "does AI make developers more productive overall?" And the answer to that is pretty clearly yes, which is why 90%+ of developers are using it—including the ones claiming they don't.
