The joke I hear is Claude Code will double your PRs

One PR from Claude. The next PR from you fixing Claude’s mistakes.

reply
Ha, pretty accurate in my experience. Though I'd say it's more like 1.5x the PRs - Claude does the initial PR, then you do half a PR fixing the subtle stuff it got wrong, and then you spend the other half wondering if you missed something.

The security fixes are the worst because the code looks correct. It's not like a typo you'd catch immediately - it's an auth check that works for 95% of cases but fails on edge cases the model never considered.
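
To make that concrete, here is a minimal sketch of the kind of check I mean. The `User` and `Document` types and both function names are made up for illustration, not taken from any real codebase. The naive version passes every happy-path test, but it also grants access when an anonymous user (no ID) meets a record with no owner, because `None == None` is true.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    id: Optional[int]  # hypothetical: None means an anonymous session


@dataclass
class Document:
    owner_id: Optional[int]  # hypothetical: None means an orphaned/system record


def can_edit_naive(user: User, doc: Document) -> bool:
    # Looks correct and works for every logged-in user,
    # but None == None lets anonymous users edit ownerless records.
    return user.id == doc.owner_id


def can_edit(user: User, doc: Document) -> bool:
    # Reject missing identities explicitly before comparing IDs.
    if user.id is None or doc.owner_id is None:
        return False
    return user.id == doc.owner_id
```

A test suite written alongside the naive version tends to cover only "owner can edit" and "other user cannot", so it stays green.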

reply
I have seen the same AI hallucinations that you mentioned: auth, input validation, error handling, non-existent dependencies, etc. It's tricky to catch them all, as LLMs have mastered the art of being "confidently wrong". What tools are you using to catch those issues? I feel current tooling is ill-equipped for this new wave of AI-generated output.
reply
"Confidently wrong" is the perfect description. The code compiles, the tests pass (because the AI also wrote the tests to match), and the auth flow looks reasonable at first glance.

For catching these we layer a few things:

- Standard SAST (Semgrep, CodeQL) catches the obvious stuff but misses AI-specific patterns
- npm audit / pip-audit for dependency issues, especially non-existent packages the AI hallucinates
- Custom rules tuned for patterns we keep seeing: overly permissive CORS, missing rate limiting, auth checks that look correct but have subtle logic bugs
- Manual review with a specific checklist for AI-generated code (different from our normal review checklist)
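
As a rough illustration of what a custom rule for the permissive-CORS case could look like, here is a small regex-based sketch. This is not our actual rule set: the rule IDs, the patterns, and the `scan` function are assumptions made up for this example, and a real setup would more likely express these as Semgrep rules than as raw regexes.

```python
import re
from typing import List, Tuple

# Hypothetical rule IDs and patterns; tune these to your own codebase.
RULES = {
    # e.g. allow_origins=["*"] passed to a CORS middleware
    "cors-wildcard-allow-origins": re.compile(
        r"allow_origins\s*=\s*\[\s*['\"]\*['\"]\s*\]"
    ),
    # e.g. headers["Access-Control-Allow-Origin"] = "*"
    "cors-wildcard-header": re.compile(
        r"Access-Control-Allow-Origin.{0,20}['\"]\*['\"]"
    ),
}


def scan(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, rule_id) for every line that matches a rule."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule_id, pattern in RULES.items():
            if pattern.search(line):
                findings.append((lineno, rule_id))
    return findings
```

Line-level regexes like this are noisy and easy to evade, so treat them as a cheap first filter that feeds the manual checklist, not a replacement for it.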

You're right that current tooling has a gap. Traditional scanners assume human-written code patterns. AI code looks structurally different - it tends to be more verbose but miss edge cases in ways humans wouldn't. We've been experimenting with scanning approaches specifically tuned for AI output.

The biggest win has been simple: requiring all AI-generated auth and input validation code to go through a dedicated security reviewer, not just a regular code review.
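
If you want to automate the routing, something along these lines could work as a starting point. It is only a sketch: the path patterns, the `ai_generated` flag (which assumes you already label such PRs somehow), and the function name are all hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Hypothetical patterns for files that likely contain auth or validation logic.
SENSITIVE_PATTERNS = ["*auth*", "*login*", "*validat*", "*permission*"]


def needs_security_review(changed_files: Iterable[str], ai_generated: bool) -> List[str]:
    """Return the changed files that should go to a dedicated security reviewer."""
    if not ai_generated:
        return []
    return [
        path
        for path in changed_files
        # Lowercase the path so the match is case-insensitive on every OS.
        if any(fnmatchcase(path.lower(), pat) for pat in SENSITIVE_PATTERNS)
    ]
```

Path matching will miss auth logic buried in generically named files, so it supplements the reviewer's judgment and does not define the full scope of the review.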

reply