Ultimately, LLMs don’t really understand what the code does at runtime. Sure, parsing the codebase can help them make a good guess, but in some cases it’s hard to trust LLMs with changes because the consequences are unknown in complex codebases with weird warts nobody documented.

Maybe in a generation or two codebases will become more uniform and predictable if fewer humans write them by hand. Same with self-driving cars: if there were no human drivers out there, the problem would become trivial to solve.

by simonw, 325 days ago

That's a lot less true today than it was six weeks ago. The "reasoning" models are spookily good at answering questions about how code runs, and identifying the source of bugs.

They still make mistakes, and yeah, they're still (mostly) next-token-predicting machines under the hood, but if your mental model is "they can't actually predict how some code will execute" you may need to update that.

by 324 days ago

deleted

by LunaSea, 324 days ago

Gemini 2.5 Pro crashes with a 500 status code once every 5 requests. Not great for a model you're supposed to rely on.

by simonw, 324 days ago

Yeah, there's a reason it still has "preview" and "experimental" in the model names.