I am not sure you set it up right. Did you have a runnable WolframLanguage file so it can compare results? Did you give it H100 / H200 access to compile and then iterate?

My experience is that once you have these two, it does amazing kernel work (Codex-5.4).

reply
> Did you have a runnable WolframLanguage file so it can compare results?

Yes.

> Did you give it H100 / H200 access to compile and then iterate?

Yes, via Lambda.ai. Also, FWIW, I run claude with --dangerously-skip-permissions and codex with the equivalent flag.

> it does amazing kernel work (Codex-5.4)

Specifically with WGMMA + TMA?

---

Once TMA gets involved both Claude and Codex spin endlessly until they dump TMA for a slower fallback.
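For anyone unfamiliar with what "TMA gets involved" means here: the host side alone already requires a descriptor encoded via the CUDA 12 driver API before the kernel can issue any bulk tensor copies. A minimal sketch of that setup, assuming a row-major fp16 matrix and an illustrative 64x64 tile (sizes and swizzle choice are my assumptions, not a working kernel):

```cuda
// Hedged sketch (CUDA 12.x driver API, Hopper/sm_90a): host-side creation of
// a TMA descriptor for a 64x64 fp16 tile of an MxK row-major matrix.
// Tile size, swizzle, and L2 promotion are illustrative assumptions.
#include <cuda.h>
#include <cuda_fp16.h>

CUtensorMap make_tma_desc(void* gmem_ptr, cuuint64_t M, cuuint64_t K) {
    CUtensorMap desc;
    // Global tensor shape, innermost dimension first, and the outer
    // dimension's stride in bytes (rank-1 stride entries for rank 2).
    cuuint64_t global_dim[2]    = {K, M};
    cuuint64_t global_stride[1] = {K * sizeof(__half)};
    // Shared-memory box (tile) shape; elements are contiguous.
    cuuint32_t box_dim[2]       = {64, 64};
    cuuint32_t elem_stride[2]   = {1, 1};

    cuTensorMapEncodeTiled(
        &desc,
        CU_TENSOR_MAP_DATA_TYPE_FLOAT16,
        /*tensorRank=*/2, gmem_ptr, global_dim, global_stride,
        box_dim, elem_stride,
        CU_TENSOR_MAP_INTERLEAVE_NONE,
        CU_TENSOR_MAP_SWIZZLE_128B,           // must match WGMMA's expected layout
        CU_TENSOR_MAP_L2_PROMOTION_L2_128B,
        CU_TENSOR_MAP_FLOAT_OOB_FILL_NONE);
    return desc;
}
// Device side (not shown): the kernel issues cp.async.bulk.tensor.2d in PTX
// against this descriptor and waits on an mbarrier before feeding WGMMA.
```

The descriptor is the easy half; coordinating the mbarrier waits and WGMMA pipelining in-kernel is where the models start spinning.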

I've observed this with Claude Code running Opus 4.6 with reasoning set to medium, high, and max; with "adaptive thinking" enabled and disabled; and with thinking tokens maxed out.

I've also observed this with Codex running GPT-5.4 as well as GPT-5.3-Codex, with reasoning effort from medium to xhigh.

---

I've also observed this on the web, as mentioned in my OP, with GPT-5.4pro (Extended Pro), Gemini3-DeepThink, and Opus 4.6.

reply
That is informative, thanks! Yes, I observe the same thing: the model tends to give up (as you said, "dump TMA for a slower fallback") and needs active steering to get good results. But it still gets much further than one-shotting from the chat interface, and it knows far more about profiling / kernel coding than that.
reply
It doesn't have to be anything as extreme as novel work. Frontier models still struggle when faced with moderately complex semantics. They've gotten quite good at gluing dependencies together, but it was a rather disappointing nothingburger watching Claude choke on a large xterm project I tried to give him. I spent a month getting absolutely nowhere: he'd just build stuff out until the codebase was so broken it had to be reset, then start over from square one. We've come a long way in certain aspects, but honestly we're just as far from the silver bullet as we were 3 years ago (for the shit I care about). I'm already bundling up for the next winter.
reply