
Trust but Verify
Andrew Stellman
AI & ML – Radar


We often say AIs “understand” code, but they don’t truly understand your problem or your codebase in the sense that humans understand things. They’re mimicking patterns from text and code they’ve seen before, either built into their model or provided by you, aiming to produce something that looks right: a plausible answer. That answer is very often correct, which is why vibe coding (repeatedly feeding the output from one prompt back to the AI without reading the code it generated) works as well as it does, but it isn’t guaranteed to be. And because of the limitations of how LLMs work and how we prompt them, the solutions rarely account for overall architecture, long-term strategy, or often even good code design principles.

The principle I’ve found most effective for managing these risks is borrowed from another domain entirely: trust but verify. While the phrase has been used in everything from international relations to systems administration, it perfectly captures the relationship we need with AI-generated code. We trust the AI enough to use its output as a starting point, but we verify everything before we commit it.

Trust but verify is the cornerstone of an effective approach: trust the AI for a starting point but verify that the design supports change, testability, and clarity. That means applying the same critical review patterns you’d use for any code: checking assumptions, understanding what the code is really doing, and making sure it fits your design and standards.

Verifying AI-generated code means reading it, running it, and sometimes even debugging it line by line. Ask yourself whether the code will still make sense to you—or anyone else—months from now. In practice, this can mean quick design reviews even for AI-generated code, refactoring when coupling or duplication starts to creep in, and taking a deliberate pass at naming so variables and functions read clearly. These extra steps help you stay engaged with critical thinking and keep you from locking early mistakes into the codebase, where they become difficult to fix.
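
In practice, the naming pass is often the quickest of those steps. Here’s a hypothetical before-and-after (the function and field names are invented for the example) showing the kind of small rewrite it usually produces:

```python
# Hypothetical example of a deliberate naming pass on AI-generated code.
# Before: the logic is correct, but the names describe nothing.
def proc(d, f):
    return [x for x in d if x["amt"] > f]

# After: the same logic, renamed so it still reads clearly months from now.
def orders_over_minimum(orders, minimum_amount):
    return [order for order in orders if order["amt"] > minimum_amount]

orders = [{"id": 1, "amt": 25.0}, {"id": 2, "amt": 75.0}]
assert proc(orders, 50.0) == orders_over_minimum(orders, 50.0) == [{"id": 2, "amt": 75.0}]
```

Nothing about the behavior changes; the point is that this verification pass is where names like the first version get fixed, before they spread through the codebase.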

Verifying also means taking specific steps to check both your assumptions and the AI’s output—like generating unit tests for the code, as we discussed earlier. The AI can be helpful, but it isn’t reliable by default. It doesn’t know your problem, your domain, or your team’s context unless you make that explicit in your prompts and review the output carefully to make sure that you communicated it well and the AI understood it.

AI can help with this verification too: It can suggest refactorings, point out duplicated logic, or help extract messy code into cleaner abstractions. But it’s up to you to direct it to make those changes, which means you have to spot them first—which is much easier for experienced developers who have seen these problems over the course of many projects.

Beyond reviewing the code directly, there are several techniques that can help with verification. They’re built on the idea that the AI generates code from the context it’s working with, but it can’t tell you why it made specific choices the way a human developer could. When code doesn’t work, it’s often because the AI filled in gaps with assumptions drawn from patterns in its training data that don’t actually match your problem. The following techniques are designed to surface those hidden assumptions, highlighting options so you can make the decisions about your code instead of leaving them to the AI.

  • Ask the AI to explain the code it just generated. Follow up with questions about why it made specific design choices. The explanation isn’t the same as a human author walking you through their intent; it’s the AI interpreting its own output. But that perspective can still be valuable, like having a second reviewer describe what they see in the code. If the AI made a mistake, its explanation will likely echo that mistake because it’s still working from the same context. But that consistency can actually help surface the assumptions or misunderstandings you might not catch by just reading the code.
  • Try generating multiple solutions. Asking the AI to produce two or three alternatives forces it to vary its approach, which often reveals different assumptions or trade-offs. One version may be more concise; another more idiomatic; a third more explicit. Even if none are perfect, putting the options side by side helps you compare patterns and decide what best fits your codebase. Comparing the alternatives is an effective way to keep your critical thinking engaged and stay in control of your codebase.
  • Use the AI as its own critic. After the AI generates code, ask it to review that code for problems or improvements. This can be effective because it forces the AI to approach the code as a new task; the context shift is more likely to surface edge cases or design issues the AI didn’t detect the first time. Because of that shift, you might get contradictory or nitpicky feedback, but that can be useful too—it reveals places where the AI is drawing on conflicting patterns from its training. Treat these critiques as prompts for your own judgment, not as fixes to apply blindly. Again, this technique helps keep your critical thinking engaged by highlighting issues you might otherwise skip over when skimming the generated code. (A short sketch of this generate-and-critique loop follows this list.)
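
To make the last two techniques concrete, here is a minimal sketch of that loop in code. It assumes a hypothetical ask_model() helper standing in for however you talk to your assistant (a chat window, an IDE plugin, or an API); nothing in it is a real library call:

```python
# Minimal sketch of the generate-alternatives-then-critique loop.
# ask_model() is a hypothetical stand-in for your assistant, not a real API.
def ask_model(prompt: str) -> str:
    raise NotImplementedError("wire this up to your own AI assistant")

def generate_and_critique(task: str, alternatives: int = 3) -> list[dict]:
    """Ask for several different solutions, then have the model review each
    one as a fresh task, so a human can compare the options and critiques."""
    results = []
    for i in range(alternatives):
        code = ask_model(
            f"Write solution #{i + 1} for the following task, using a "
            f"different approach than any earlier solution:\n{task}"
        )
        critique = ask_model(
            "Review the following code as if you had never seen it before. "
            "List concrete problems, risky assumptions, and design issues:\n"
            + code
        )
        results.append({"solution": code, "critique": critique})
    return results  # you still decide which solution, if any, to keep
```

The loop is deliberately trivial: the value isn’t in the automation but in forcing the comparison and the context shift before anything gets committed.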

These verification steps might feel like they slow you down, but they’re actually investments in velocity. Catching a design problem after five minutes of review is much faster than debugging it six months later when it’s woven throughout your codebase. The goal is to go beyond simple vibe coding by adding strategic checkpoints where you shift from generation mode to evaluation mode.

The ability of AI to generate a huge amount of code in a very short time is a double-edged sword. That speed is seductive, but if you aren’t careful with it, you can vibe code your way straight into classic antipatterns (see “Building AI-Resistant Technical Debt: When Speed Creates Long-term Pain”). In my own coding, I’ve seen the AI take clear steps down this path, creating overly structured solutions that, left unchecked, would lead directly to complex, highly coupled, layered designs. I spotted them because I’ve spent decades writing code and working on teams, so I recognized the patterns early and corrected them—just as I’ve done hundreds of times in code reviews with team members. Avoiding that path means slowing down enough to think about design; reviewing changes carefully, so you don’t build layered complexity you can’t unwind later, is a critical part of the “trust but verify” mindset.

How hard it is to write good unit tests for AI-generated code is itself a strong signal: if the tests are hard for the AI to generate, stop and think. Adding unit tests to your vibe-code cycle creates a checkpoint—a reason to pause, question the output, and shift back into critical thinking. This technique borrows from test-driven development: using tests not only to catch bugs later but to reveal when a design is too complex or unclear.

When you ask the AI to help write unit tests for generated code, first have it generate a plan for the tests it’s going to write. Watch for signs of trouble: lots of mocking, complex setup, too many dependencies—especially needing to modify other parts of the code. Those are signals that the design is too coupled or unclear. When you see those signs, stop vibe coding and read the code. Ask the AI to explain it. Run it in the debugger. Stay in critical thinking mode until you’re satisfied with the design.
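
Here is a minimal, self-contained sketch (every class and name is invented for the illustration) of what those warning signs tend to look like once the tests exist, next to the kind of test a less coupled design allows:

```python
import unittest
from unittest.mock import MagicMock

# Hypothetical example: a calculator that constructs its own collaborators,
# so every test has to reach inside it and mock them.
class TaxService:
    def rate_for(self, region):
        raise RuntimeError("would call an external tax service")

class AuditLog:
    def record(self, message):
        raise RuntimeError("would write to a database")

class CoupledPriceCalculator:
    def __init__(self):
        self.tax_service = TaxService()
        self.audit_log = AuditLog()

    def total(self, subtotal, region):
        rate = self.tax_service.rate_for(region)
        self.audit_log.record(f"priced an order for {region}")
        return subtotal * (1 + rate)

class CoupledPriceCalculatorTest(unittest.TestCase):
    def test_total_requires_mocking_every_collaborator(self):
        calc = CoupledPriceCalculator()
        # Warning sign: the test replaces the object's internals just to
        # exercise one line of arithmetic.
        calc.tax_service = MagicMock()
        calc.tax_service.rate_for.return_value = 0.07
        calc.audit_log = MagicMock()
        self.assertAlmostEqual(calc.total(100.0, "NY"), 107.0)

# A design that keeps the arithmetic separate from its collaborators
# tests cleanly, with no mocks at all.
def total_with_tax(subtotal, rate):
    return subtotal * (1 + rate)

class TotalWithTaxTest(unittest.TestCase):
    def test_total_is_plain_arithmetic(self):
        self.assertAlmostEqual(total_with_tax(100.0, 0.07), 107.0)

if __name__ == "__main__":
    unittest.main()
```

When the AI’s test plan starts to look like the first test, with mocks standing in for the object’s own internals, that’s the moment to stop vibe coding and rework the design rather than accept more mocking.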

There are also other clear signals that these risks are creeping in, which tell you when to stop trusting and start verifying:

  • Rehash loops: Developers cycling through slight variations of the same AI prompt without making meaningful progress because they’re avoiding stepping back to rethink the problem (see “Understanding the Rehash Loop: When AI Gets Stuck”).
  • AI-generated code that almost works: Code that feels close enough to trust but hides subtle, hard-to-diagnose bugs that show up later in production or maintenance.
  • Code changes that require “shotgun surgery”: Asking the AI to make a small change forces cascading edits across multiple unrelated parts of the codebase—this indicates a growing, increasingly unmanageable web of interdependencies, the shotgun surgery code smell (see the sketch after this list).
  • Fragile unit tests: Tests that are overly complex, tightly coupled, or rely on too much mocking just to get the AI-generated code to pass.
  • Debugging frustration: Small fixes that keep breaking somewhere else, revealing underlying design flaws.
  • Overconfidence in output: Skipping review and design steps because the AI delivered something that looks finished.
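
As an example, here is a hypothetical sketch of what the shotgun surgery signal looks like in code (all names invented for the illustration):

```python
# Hypothetical sketch of the shotgun surgery smell: the free-shipping rule is
# repeated in three unrelated classes, so a "small" change to the $50 threshold
# forces cascading edits in every one of them.

class CartBanner:
    def message(self, subtotal: float) -> str:
        if subtotal >= 50.0:
            return "You qualify for free shipping!"
        return f"Spend ${50.0 - subtotal:.2f} more for free shipping"

class CheckoutTotal:
    def shipping_fee(self, subtotal: float) -> float:
        return 0.0 if subtotal >= 50.0 else 5.99

class OrderConfirmationEmail:
    def shipping_line(self, subtotal: float) -> str:
        return "Shipping: free" if subtotal >= 50.0 else "Shipping: $5.99"
```

Centralizing that rule in one place, such as a single constant or a small shipping policy object, removes the cascade, and it’s exactly the kind of refactoring you can direct the AI to make once you’ve spotted the smell.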

All of these are signals to step out of the vibe-coding loop, apply critical thinking, and use the AI deliberately to refactor your code for simplicity.
