I like biasing it towards the fact that there is a bug, so it can't just say "no bugs! all good!" without looking very hard.

Usually I ask something like this:

"This code has a bug. Can you find it?"

Sometimes I also tell it that "the bug is non-obvious", which I've anecdotally found to have a higher success rate than just asking for a spot check.

reply
Do you not run into too many false positives along the lines of "ah, this thing you used here is known to be tricky, the issue is..."?

I've seen that when prompting it to look for concurrency issues vs saying something more like "please inspect this rigorously to look for potential issues..."

reply
What's more useful is to have it not only find such bugs but prove them with a regression test. In Rust, for concurrency bugs, have it write e.g. Shuttle or Loom tests.
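The examples above are Rust-specific (Loom/Shuttle), but the pattern translates. A minimal, hypothetical Go sketch of the same idea (function names are illustrative, not from the thread): keep the racy variant around as the regression, which `go test -race` fails deterministically, while the fixed variant passes.

```go
package main

import (
	"fmt"
	"sync"
)

// racyCount increments a shared counter from n goroutines without
// synchronization -- the kind of bug an LLM might flag. A regression
// test hammering this under `go test -race` fails deterministically,
// which is the "prove it" step.
func racyCount(n int) int {
	var count int
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			count++ // data race: unsynchronized read-modify-write
		}()
	}
	wg.Wait()
	return count
}

// safeCount is the fixed version; the same regression test passes.
func safeCount(n int) int {
	var mu sync.Mutex
	var count int
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			count++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return count
}

func main() {
	fmt.Println(safeCount(1000)) // always 1000: mutex + WaitGroup make it deterministic
}
```

Without `-race`, `racyCount` may even return the right answer by luck; the race detector is what makes the test's verdict reliable rather than flaky.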
reply
It would be generally good if most code made setting up such tests as easy as possible, but in most corporate codebases this second step is gonna require a huge amount of refactoring or boilerplate crap to get the things interacting in the test env in an accurate, well-controlled way. You can quickly end up fighting to understand "is the bug not actually there, or is the attempt to repro it not working correctly?"

(Which isn't to say don't do it: I think this is a huge benefit you can gain from being able to refactor more quickly. Just to say that you're gonna short-term give yourself a lot more homework to make sure you don't fix things that aren't bugs, or break other things in your quest to make them more provable/testable.)

reply
That is an unfortunate case you described, but also, git gud and write tests in the first place so you don't need to refactor things down the road.
reply
Just in case you didn't read the full article, this is how they describe finding the bugs in the Linux kernel as well.

Since it's a large codebase, they go even more specific and hint that the bug is in file A, then try again with a hint that the bug is in file B, and so on.

reply
> so it can't just say "no bugs! all good!"

If anyone, or anything, ever answers a question like that, you should stop asking it questions.

reply
As a meta activity, I like to run different codebases through the same bug-hunt prompt and compare the number found as a barometer of quality.

I was very impressed when the top three AIs all failed to find anything other than minor stylistic nitpicks in a huge blob of what to me looked like “spaghetti code” in LLVM.

Meanwhile at $dayjob the AI reviews all start with “This looks like someone’s failed attempt at…”

reply
I usually do several passes of "review our work. Look for things to clean up, simplify, or refactor." It does usually improve the quality quite a lot; then I rewind the conversation history to before the review, keep the code changes, and submit the same prompt again, until it reaches the point of diminishing returns.
reply
> It spots threading & distributed system bugs that would have taken hours to uncover before, and where there isn't any other easy tooling.

Go has a built in race detector which may be useful for this too: https://go.dev/doc/articles/race_detector

Unsure if it's suitable for inclusion in CI, but seems like something worth looking into for people using Go.

reply
You just have to be careful because it will sometimes spot bugs you could never uncover because they're not real. You can really see the pattern matching at work with really twisted code. It tends to look at things like lock-free algorithms and declare them full of bugs regardless of whether they are or not.
reply
> Pasting a big batch of new code and asking Claude "what have I forgotten? Where are the bugs?"

It's actually the main way I use CC/codex.

reply
I find Codex sufficiently better for it that I’ve taught Claude how to shell out to it for code reviews
reply
Ditto, I made a "/codex-review" skill in Claude Code that reviews the last git commit and writes an analysis of it for Claude Code to then work from. I've had very good luck with it.

One particularly striking example: I had CC do some work and then kicked off a "/codex-review" and while it was running went to test the changes. I found a deadlock but when I switched back to CC the Codex review had found the deadlock and Claude Code was already working on a fix.

reply
I think OpenAI has actually released an official version of exactly this: https://community.openai.com/t/introducing-codex-plugin-for-...

https://github.com/openai/codex-plugin-cc

I actually work the other way around. I have Codex write "packets" to give to Claude to implement. I have Claude write the code. Then I have Codex review it and find all the problems (there are usually lots of them).

Only because this month I have the $100 Claude Code and the $20 Codex. I did not renew Anthropic though.

reply
Yeah and it comes with the blood of children included
reply
> "Codex wrote this, can you spot anything weird?"
reply