- Implement a numerically stable backward pass for layer normalization from scratch in NumPy.
- Design and implement a high-performance fused softmax + top-k kernel in CUDA (or CUDA-like pseudocode).
- Implement an efficient KV-cache system for autoregressive transformer inference from scratch.
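For context, the first task (the layer norm backward) can be sketched roughly like this in NumPy. This is my own reference sketch of what the task asks for, not any model's output; the mean-subtraction form of the input gradient and the cached `inv_std` are choices I made for stability, and the variable names are mine:

```python
import numpy as np

def layernorm_forward(x, gamma, beta, eps=1e-5):
    # x: (N, D); normalize over the last axis.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    inv_std = 1.0 / np.sqrt(var + eps)
    xhat = (x - mu) * inv_std
    y = gamma * xhat + beta
    return y, (xhat, inv_std, gamma)

def layernorm_backward(dy, cache):
    xhat, inv_std, gamma = cache
    dgamma = (dy * xhat).sum(axis=0)
    dbeta = dy.sum(axis=0)
    dxhat = dy * gamma
    # Subtracting per-row means instead of expanding the full quotient
    # rule avoids large cancelling terms when the variance is small.
    dx = inv_std * (
        dxhat
        - dxhat.mean(axis=-1, keepdims=True)
        - xhat * (dxhat * xhat).mean(axis=-1, keepdims=True)
    )
    return dx, dgamma, dbeta
```

A quick sanity check is that `dx` sums to ~0 along the normalized axis (the normalization is invariant to a constant shift per row), plus a finite-difference comparison on a few elements.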
and tested Qwen3.6-27B (IQ4_NL on a 3090) against MiniMax-M2.7 and GLM-5, with kimi k2.6 as the judge (imperfect, I know, it was 2AM). Qwen surpassed MiniMax and won 2/3 of the implementations against GLM-5 according to kimi k2.6, which still sounds insane to me. The env was a pi-mono with basic tools + a websearch tool pointing to my searxng (I don't think any of the models used it), with a slightly customized shorter system prompt. TurboQuant was at 4bit during all Qwen tests. Full results: https://github.com/sleepyeldrazi/llm_programming_tests.
I am also periodically testing small models on a https://www.whichai.dev style task to see their designs, and Qwen3.6-27B also obliterated (imo) the other ones I tested: https://github.com/sleepyeldrazi/llm-design-showcase.
Needless to say, those tests are non-exhaustive and have flaws, but the trend from the official benchmarks looks like it's being confirmed in my testing. If only it were a little faster on my 3090; we'll see how it performs once a DFlash for it drops.