Just asking "Explain what this service does?" turns into
[No response for three minutes...]
+729 -522
"NEVER REMOVE LOGGING OR DEBUGGING INFO. If unsure, bias towards introducing sensible logging."
Or just
"NEVER REMOVE LOGGING OR DEBUGGING INFO."
Because your coworkers definitely are, and we're stack ranked, so it's a race (literally) to the bottom. Just send it...
(All this actually seems to do is push the burden onto their coworkers as reviewers, for what it's worth.)
Edit: obviously running it inside something sandboxed so it doesn't have access to the rest of my system, but with enough access to be useful.
People that don't put out slop, mostly.
What I don't have time to do is debug obvious slop.
The built-in approval thing sounds like a good idea, but in practice it's unusable. A typical session for me went like:
About to run "sed -n '1,100p' example.cpp", approve?
About to run "sed -n '100,200p' example.cpp", approve?
About to run "sed -n '200,300p' example.cpp", approve?
Could very well be a skill issue, but that was mighty annoying, and there was no obvious fix (the "don't ask again for ...." options weren't helping).

Every one of these models is so good at propelling the ship forward that I care more and more about which models are the easiest to steer in the direction I actually want to go.
Codex is steerable to a fault, and will gladly "monkey's paw" your requests.
Claude Opus will ignore your instructions, do what it thinks is "right", and just barrel forward.
Both are bad, and both paper over the actual issue, which is that these models don't really have the ability to selectively choose their behavior per issue (i.e. ask for follow-up where needed, ignore users where needed, follow instructions where needed). Behavior is largely global.
Overall, I think it's probably better that it stay focused, and allow me to prompt it with "hey, go ahead and refactor these two functions" rather than the other way around. At the same time, really the ideal would be to have it proactively ask, or even pitch the refactor as a colleague would, like "based on what I see of this function, it would make most sense to XYZ, do you think that makes sense? <sure go ahead> <no just keep it a minimal change>"
Or perhaps even better, simply pursue both changes in parallel and present them as A/B options for the human reviewer to select between.
This has not been my experience. I work primarily in Elixir, and Gemini has helped me build some really cool products and pull off massive refactors. It would even pick up security issues and potential optimizations along the way.
What HAS been a constant issue, though, is that the model will randomly not respond at all and throw some random error, which is embarrassing for a company like Google, given the infrastructure they own.
Not like human programmers. I would never do this and have never struggled with it in the past, no...
That helped quite a bit, but it would still go off on its own from time to time.
You can make their responses fairly dry/brief.
There is a tradeoff though, as comments do consume context. But I tend to pretty liberally dispose of instances and start with a fresh window.
Yeah, that sounds worse than "trying to be helpful". Read the code instead; why add indirection that way, just to be able to understand what other models understand without comments?
Be a proactive research partner: challenge flawed or unproven ideas with evidence; identify inefficiencies and suggest better alternatives with reasoning; question assumptions to deepen inquiry.