That's what I'm thinking too. There is a lot of noise and I know teams where the majority of the people writing Python just have no idea what they're doing.

I'm working with Clojure which is used mostly by senior engineers and it still blows my mind how well Claude writes software in it even though it's a fringe language. It's even able to pick up in-house DSLs written with macros.

reply
Having used Python on and off for 20 years, my experience with LLMs writing Python has been mixed. I don’t think that’s necessarily because of a low-quality dataset, but rather because Python’s applications are so broad and the language has gone through several paradigm shifts over time: sync vs. async, typed vs. untyped, scientific Python looking very different from web application code, some people really wishing it were an FP language, and others doing the clean-architecture OOP onion soup. It has gotten so fragmented.

Recently, I had a more pleasant experience using LLMs with Go. It reminds me a bit of Python 2.x, when the community seemed, in my view, more focused on embracing a stupid simple language, with everyone trying to write roughly similar "Pythonic" code.

reply
> Having used Python on and off for 20 years, my experience with LLMs writing Python has been mixed. I don’t think that’s necessarily because of a low-quality dataset, but rather because Python’s applications are so broad and the language has gone through several paradigm shifts over time

If there’s one language that is the prime example of this, it’s C++, and according to this benchmark it ranks incredibly high.

I’m also thoroughly confused why Kimi 2.6 scores 83% for C++ while Opus 4.7 scores 67%, and GPT5.5 isn’t even in the top 10.

Gemma 4 31B scores a 100% success rate for Python (!!) while Opus 4.6 scores only 65%.

This benchmark really seems to be all over the place and doesn’t make sense.

reply
That was the hardest part of learning PHP, all the code examples online were just awful.
reply
Worked on a PHP project once. Every time I asked why something was done a certain way, the answer was "dunno, we copy-pasted this code snippet."

Certain popular PHP codebases appear to use a similar methodology.

reply
I was (pleasantly) surprised by Claude Code doing Raku, also a language with a limited training set (~2,000 Stack Overflow questions, a bunch of Rosetta Code tasks, ~2,500 modules). I put this down to the quality of the code from the core community, who are all frankly uber-gremlins.
reply
Yeah Raku feels so expressive and lovely to me with the help of an AI assistant. I've only done toy programs and scripts with it but it is actually so nice.
reply
Reminds me of the time I asked Claude to write some Wordpress code for me. The results were…rough.
reply
All my vibe-coded (personal) projects are Go backend services with TypeScript/React frontends. My reasoning was based on similar observations, and it's also why I wouldn't use PHP for that.
reply
There's a broken idea that AIs know Python because they're written in Python.

That's not how any of it works.

reply
Not what anyone was talking about. Training corpus ≠ inference engine.
reply
While recent models are capable of generalizing to any language at this point, I do think biases from their pretraining corpus still leak through into how they generate responses. We observed similar language performance patterns across models from different providers, btw.
reply