AI coding tools are agreeable to a fault
I use Copilot at work, typically with Claude Sonnet or Opus as the model. Both have made me faster. Both have also, on more than one occasion, enthusiastically helped me do the wrong thing.

Here are the two patterns I kept running into and what I did about them.

They won’t tell you your idea is bad

The first thing I noticed was that these models don’t push back. You propose a solution, they help you implement it. You propose a worse solution, they still help you implement it. Bug fixes, new features, refactoring. It didn’t matter. If I came in with a direction already in mind, the model would run with it.

This is a design choice, not a bug. These tools are optimised to be helpful, and pushing back on the user doesn’t feel helpful. But in practice it means you can confidently head in the wrong direction with an AI happily following along.

The fix I settled on was to give explicit instructions upfront: be brutally honest, assess my suggestions against industry standards, best practices, maintainability, and readability. Tell me if my proposed approach isn’t the right one. Once I made that explicit, the quality of the conversation changed significantly. The model would actually flag when my suggestion was suboptimal and explain why.

It shouldn’t require that instruction, but it does. Now it’s the first thing I set up in any serious working session.
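If you use Copilot, one way to make that instruction persistent rather than retyping it each session is a repository custom-instructions file. This is a sketch of the kind of wording I mean, not a recommended canonical file:

```markdown
<!-- .github/copilot-instructions.md (illustrative sketch) -->
Be brutally honest. Assess my suggestions against industry
standards, best practices, maintainability, and readability.
If my proposed approach is not the right one, say so and
explain why before helping me implement it.
```

Anything checked into the repository like this applies to every session, which also keeps teammates' sessions honest without each person remembering the preamble.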

They don’t investigate what they weren’t asked about

The second pattern was subtler and took longer to notice. Given a complex codebase and a problem that affects a small part of it, the models would focus on that small part and only that part. Their suggestions were technically correct in isolation but often missed how the affected code connected to everything else.

The solutions looked fine until you considered the broader system. Then they started to look like patches that would cause problems somewhere else.

Again, the fix was an explicit instruction: ask the model to investigate how the affected code interacts with the rest of the codebase before proposing anything, and then ask it to rate its confidence in the proposed solution and to highlight anything it's uncertain about.

That second part, the confidence check, turned out to be the more valuable of the two. When asked to assess its own certainty, the model would drop from “here is the solution” to “here is what I think, but I’m not confident about X and Y, and you should check Z.” That’s a much more useful output.
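Put together, the two asks read something like this as a reusable prompt. The exact wording is illustrative, not a magic incantation:

```markdown
Before proposing a fix, investigate how the affected code
interacts with the rest of the codebase, and list the call
sites or modules it touches. Then rate your confidence in
your proposed solution (high / medium / low) and highlight
anything you are uncertain about that I should verify myself.
```

The confidence scale matters less than forcing the model to separate what it knows from what it's guessing; any phrasing that does that seems to produce the same shift in output.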

The common thread

Both of these come down to the same thing: these tools do what you ask and nothing more. They won’t volunteer criticism, and they won’t go looking for context you didn’t point them at.

Once you know that, you can account for it. But it isn’t obvious from the start, and I don’t think it gets talked about enough.