With other languages, whether it's TypeScript, Go, or Python, even if you explicitly ask agents to write/run tests, after a while they just forget to do it, unless their changes cause build failures. You have to keep reminding them as the session goes on. That never happens with Rust in my experience.
For many months now, though, Claude has been nearly consistent about running both test and check/clippy. Perhaps this is due to my global memory file; not sure, to be honest.
What I do know is that I never use those hooks; I have them disabled at the moment. Why? Because the benefit is almost nonexistent, as I mentioned, and the cost is at times quite high. It means I cannot work on a project piecemeal, as in "only focus on this file, it will not compile and that's okay", and it instead forces Claude to make complete edits, which may be harder to review. Worst of all, I have seen it get into a loop and be unable to exit: e.g. a test fails and Claude says "that failure is not due to my changes" or whatever, and it just does that forever, on loop. It burns 100% of the daily tokens pretty quickly if unmonitored.
FWIW I've not looked to see whether there's an alternative way to write hooks. It might be worth having the hook only suggest, rather than force, Claude. Alternatively, maybe I could spawn a subagent to review whether stopping Claude makes sense... hmm.
Maybe for your case you could create a /maybe-check command and run that in the hook? Then specify in it the conditions under which a check/test is needed.
I am trying out building a toy language hosted on Haskell, and it's been a nice combo: the toy language uses dependent typing for even more strictness, but with a simple, regular syntax that is nicer for LLMs to use; and under the hood, if you get into the interpreter, you can use the full richness of Haskell with fewer of the safety guardrails of dependent typing. A bit like safe/unsafe Rust.
I haven't had this problem with Opus 4.5+ and Haskell. In fact, I get the opposite problem and often wish it was more capable of using abstractions.
- I can build SPAs with TypeScript and offload expensive operations to a Rust implementation that targets WASM (see the sketch below)
- I can build a multi-platform bundled app with Tauri that uses TS for the frontend and Rust for the main parts of the backend, and it can load a Python sidecar for anything I need Python for (ML stuff, mainly)
- Haven't dived too much into games, but Bevy seems promising for making performant games without the overhead of using one of the big engines (first-class ECS is a big plus too)
It ended up solving the problem of wanting to use the best parts of all of these different languages without being stuck with the worst parts.
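To make the WASM offload from the first point concrete, here's a minimal sketch using wasm-bindgen; the function and its logic are illustrative, not from the original comment:

```rust
// Cargo.toml: wasm-bindgen = "0.2" and crate-type = ["cdylib"]
use wasm_bindgen::prelude::*;

// An expensive operation compiled to WASM and exported to JS/TS.
// Build with `wasm-pack build --target web`, then import the
// generated module from the SPA and call `checksum(bytes)`.
#[wasm_bindgen]
pub fn checksum(data: &[u8]) -> u32 {
    data.iter()
        .fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u32))
}
```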
The big takeaway is that you can "patch" LLMs and steer them toward correct answers in less-trained programming languages, allowing for superior performance. Might work here. Not a clue how to implement it, but stuff like llm-to-doc and the like makes me hopeful.
- Rust: nearly universally compiles and runs without fault.
- Python, JS: very often will run for some time and then crash
The reason, I think, is type safety and the richness of the compiler's errors and warnings. Rust is absolutely king here.
Not wanting to disagree; I am sure that with Rust it would be even more stable.
Does one get paid well to post these advertisements for Rust?
I hope there aren't many of your type on here.
Not to mention it has some of the slowest compile times of any recent language, if not the slowest (Kotlin is perhaps the other contender).
Everything is a trade-off.
A half-assed type system is helpful for people writing code by hand: you get things like the squiggly lines in your editor and automated refactoring tools, which are quite beneficial for productivity. When an LLM is writing code, though, none of that matters. It doesn't care one bit whether the failure report comes from the compiler or the test suite. It is all the same to it.
Not borne out by evidence. Rust is bottom-mid tier on autocoderbenchmark; TypeScript is marginally better than JS.
Shifting errors to compile time is not necessarily great, because the LLM has to vibe its way through code in situ. If you have to have a compiler check your code, it's already too late; the LLM does not have your codebase in its weights, and a fetch to read the types of your functions is context-expensive since it's nonlocal.
If you're running a good agentic AI, it can read the compile errors just like a human and work to fix them until the build goes through.
Any side effect has to be performed inside the `IO<T>` type, which means impure functions need to be marked as returning `IO<T>`. And any function that tries to "execute" an `IO<T>` side effect has to mark itself as returning `IO<T>` as well.
You basically compose a description of the side effects and pass this value representing them to the main handler, which is special in that it can actually execute the side effects.
For the rest of the codebase this is simply an ordinary value you can pass on/store etc.
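The same shape can be mimicked outside Haskell, too. Here's a loose Rust analogue (all names hypothetical) where effects are ordinary values and only a top-level handler runs them:

```rust
// A description of side effects as a plain value: nothing happens
// when you construct these or pass them around.
enum Effect {
    Print(String),
    WriteFile { path: String, contents: String },
}

// Pure code only composes descriptions; it never performs I/O itself.
fn greet(name: &str) -> Vec<Effect> {
    vec![
        Effect::Print(format!("hello, {name}")),
        Effect::WriteFile {
            path: "greeting.log".into(),
            contents: name.into(),
        },
    ]
}

// Only the top-level handler (think `main` in Haskell) actually
// executes the effects.
fn run(effects: Vec<Effect>) -> std::io::Result<()> {
    for e in effects {
        match e {
            Effect::Print(s) => println!("{s}"),
            Effect::WriteFile { path, contents } => std::fs::write(path, contents)?,
        }
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    run(greet("world"))
}
```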
I've been successful with each; I think there are positives and negatives to both. I just wanted to mention that particular one, since it stands out as making the language relatively more pleasant to work with.
Let's set aside, for now, the fact that Go is a garbage-collected language while Rust is not...
Do you prefer to let the LLM reason about lifetimes, or to debug subtle errors yourself at runtime, as happens with C++?
People who are familiar with the C++ safety discussion understand that lifetimes are like types: they are part of the code and just as important as the actual logic. You cannot be ambiguous about lifetimes yet crystal clear about the program's intended behavior.
Of course there are types for which this is not true (file handles, connections, etc.), and managed languages usually don't have features as good as C++/Rust's RAII for dealing with them.
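To illustrate the RAII point (a minimal sketch, not from the comment): in Rust the cleanup is tied to scope, so the handle's lifetime is visible in the code itself:

```rust
use std::fs::File;
use std::io::Write;

fn write_log(msg: &str) -> std::io::Result<()> {
    let mut file = File::create("app.log")?; // resource acquired here
    file.write_all(msg.as_bytes())?;
    Ok(())
} // `file` is dropped here: the handle closes deterministically,
  // with no finalizer or explicit close call required
```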
As a human I can just decide to write quality code (or not!), but LLMs don't understand when they're being lazy or stupid and so need to have that knowledge imposed on them by an external reviewer. Static analysis is cheap, and more importantly it's automatic. The alternative is to spend more time doing code review, but that's a bottleneck.
I suspect the providers started training specifically on it because it appeared proportionally much more in actual LLM usage (obviously much less than more mainstream languages like Python or JavaScript, but I wouldn't be surprised if there were more LLM queries about Rust than about C, for demographic reasons).
Nowadays even small Qwens are decent at it in one-shot prompts, or at least much better than GPT-4 was.
It's actually rare to have to borrow something and keep the borrow in another object (which is where lifetimes come into play); most of the time (95% at least, I'd say) you borrow something and then drop the borrow, or move the thing.
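A minimal sketch of that distinction (names illustrative):

```rust
// The common case: borrow, use, drop. No lifetime annotations needed,
// because the borrow never outlives the call that uses it.
fn total(items: &[u32]) -> u32 {
    items.iter().sum()
}

// The rarer case: keeping a borrow inside another object. Now the
// struct must declare a lifetime tying it to the borrowed data.
struct Report<'a> {
    items: &'a [u32],
}

impl<'a> Report<'a> {
    fn summary(&self) -> String {
        format!("{} items, total {}", self.items.len(), total(self.items))
    }
}
```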
Lifetimes are a global property and LLMs are not particularly good at reasoning about them compared to local ones.
Most applications don't need low-level memory control, so this complexity is better pushed to runtime.
There are lots of managed languages with type systems as good as, or even stronger than, Rust's, paired with a good modern GC.
Huh? Lifetime analysis is a local analysis, same as any other kind of type checking. The semantics may have global implications, but exposing them locally is the whole point of having dedicated syntax for it.
That's what the compiler is doing.
The developer (or LLM) is supposed to do the global reasoning so that what they end up writing down makes semantic sense.
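Concretely, the locally exposed contract looks like this (a small sketch, not from the thread):

```rust
// The signature alone states the contract: the returned reference
// borrows from `haystack`, not from `needle`. A caller (or an LLM)
// can check uses against this without reading the function body.
fn find<'a>(haystack: &'a str, needle: &str) -> Option<&'a str> {
    haystack.find(needle).map(|i| &haystack[i..i + needle.len()])
}
```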
Sure, throwing a bunch of variants at it and seeing what sticks is certainly an approach, but "the lifetimes check out" only proves that the resulting code will be memory safe, not that it actually makes sense.
I wouldn't use it for the galaxy-brain libraries or explorations I like to do for my blog, but for production Haskell, Opus 4.5+ is really good. No other models have been effective for me.
- Rust code generates absolutely perfectly in Claude Code.
- Rust code will run without GC. You get that for free.
- Rust code has a low defect rate per LOC, at least as measured by humans; Google gave a talk on this. Sum types plus match and destructuring make error handling ergonomic and more or less required by idiomatic code, which the LLM will generate (see the sketch below).
I'd certainly pick Rust or Go over Python or TypeScript. I've had LLMs emit buggy dynamic code with type and parameter mismatches, but almost never statically typed code that fails to compile.
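For the error-handling point above, the idiomatic shape is roughly this (a minimal sketch; the names are illustrative):

```rust
use std::num::ParseIntError;

// The error is a sum type; callers can't ignore it silently.
#[derive(Debug)]
enum ConfigError {
    Missing(&'static str),
    BadNumber(ParseIntError),
}

fn parse_port(raw: Option<&str>) -> Result<u16, ConfigError> {
    // `match` plus destructuring forces every case to be handled;
    // forgetting one is a compile error, not a runtime surprise.
    match raw {
        None => Err(ConfigError::Missing("port")),
        Some(s) => s.parse::<u16>().map_err(ConfigError::BadNumber),
    }
}
```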
In this benchmark, models correctly solve Rust problems 61% of the time on the first pass, a far cry from other languages such as C# (88%) or Elixir (a "buggy dynamic language") where they perform best (97%).
I wonder why that is; it's quite surprising. Obviously the details of their benchmark design matter, but this study doesn't support your claims.