> Rust is perfect for writing all of code using LLM. Its strict type system makes it less likely to make very dumb mistakes that other languages might allow.

I question this. Yes, strong enforcement of invariants at compile time helps the LLM generate functional code, since it gets rapid feedback and can retrace its steps, as opposed to generating buggy code that fails at runtime in edge cases.

On the other hand, Rust is a complex language prone to refactoring avalanches, where a small change in a component forces refactoring distant code. If the initial architecture is bad or lacking, growing the code base incrementally, as LLMs typically do, will tend towards spaghettification. So I fear a program that compiles and even runs OK, but is no longer human-readable or maintainable.

reply
> Rust is a complex language prone to refactoring avalanches

This may be so, but LLMs are great at slogging through such tedious repercussions.

I would say if the language prevents sloppy intermediate states, that actually makes it more amenable to AI; if you just half-ass a refactor into a conceptually inconsistent state, it’s possible for bad tests to fail to catch it in Python, say. But if many such incomplete states are just forbidden, then the compiler errors provide a clean objective function that the LLM can keep iterating on.
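
A minimal sketch of the kind of "forbidden incomplete state" meant here, assuming a hypothetical `PaymentState` enum invented for illustration. Because the matches have no wildcard arm, adding the `Refunded` variant without updating every match is a compile error (E0004, non-exhaustive patterns) rather than something a weak test suite might miss.

```rust
// Hypothetical domain type, invented for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PaymentState {
    Pending,
    Settled,
    Refunded, // the newly added variant
}

// No wildcard arm: if `Refunded` were missing below, the crate would not build.
fn is_final(state: PaymentState) -> bool {
    match state {
        PaymentState::Pending => false,
        PaymentState::Settled => true,
        PaymentState::Refunded => true,
    }
}

// A second, "distant" use site that the compiler also forces us to revisit.
fn label(state: PaymentState) -> &'static str {
    match state {
        PaymentState::Pending => "pending",
        PaymentState::Settled => "settled",
        PaymentState::Refunded => "refunded",
    }
}

fn main() {
    let states = [
        PaymentState::Pending,
        PaymentState::Settled,
        PaymentState::Refunded,
    ];
    for s in states {
        println!("{} final={}", label(s), is_final(s));
    }
}
```

The guarantee only holds as long as nobody adds a `_ =>` arm to make the build green, which is exactly the shortcut a reviewer would want to watch for.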

reply
This is true in my experience as well. I'd even say it's the most common failure mode of current AI! It "fixes" some problem locally and declares victory, but it doesn't fully address the consequences of the change everywhere, and then the codebase is inconsistent.
reply
I’ve seen Claude address the consequences of a change in a way that honestly was more comprehensive than I would be capable of. But I still agree that sometimes it misses the mark. I think that may be due to “adaptive effort”, which Claude now uses by default.
reply
> On the other hand, Rust is a complex language prone to refactoring avalanches, where a small change in a component forces refactoring distant code.

Are you saying this out of personal experience or just hypothesizing? I am working on a large, complex Rust project with Claude Code and do not experience this at all.

reply
It can happen like this:

- write sleek operator-overloading-based code for simple mathematical operations on your custom pet algebra

- decide that you want to turn it into an autograd library [0]

- realise that you now need either `RefCell` for interior mutability, or arenas to save the computation graph and local gradients

- realise that `RefCell` puts borrow checks on the runtime path and can panic if you get aliasing wrong

- realise that plain arenas cannot use your sleek operator-overloaded expressions, since `a + b` has no access to the arena, so you need to rewrite them as `tape.sum(node_a, node_b)`

- cry
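
The arena end state of the steps above can be sketched roughly as follows. This is a toy under stated assumptions, not micrograd's design: `Tape`, `NodeId`, `Op`, `leaf`, `mul` and `backward` are names invented here, and only `tape.sum(node_a, node_b)` mirrors the call shape from the list.

```rust
/// Index of a node on the tape; `Copy`, so it can be passed around freely.
#[derive(Debug, Clone, Copy, PartialEq)]
struct NodeId(usize);

/// How a node was produced, with the indices of its inputs.
#[derive(Debug, Clone, Copy)]
enum Op {
    Leaf,
    Sum(NodeId, NodeId),
    Mul(NodeId, NodeId),
}

/// The arena: all values and the computation graph live in one place.
#[derive(Default)]
struct Tape {
    values: Vec<f64>,
    ops: Vec<Op>,
}

impl Tape {
    fn push(&mut self, value: f64, op: Op) -> NodeId {
        self.values.push(value);
        self.ops.push(op);
        NodeId(self.values.len() - 1)
    }

    fn leaf(&mut self, value: f64) -> NodeId {
        self.push(value, Op::Leaf)
    }

    fn sum(&mut self, a: NodeId, b: NodeId) -> NodeId {
        let v = self.values[a.0] + self.values[b.0];
        self.push(v, Op::Sum(a, b))
    }

    fn mul(&mut self, a: NodeId, b: NodeId) -> NodeId {
        let v = self.values[a.0] * self.values[b.0];
        self.push(v, Op::Mul(a, b))
    }

    /// Reverse pass: nodes are appended in topological order, so walking the
    /// tape backwards visits every node after all of its consumers.
    fn backward(&self, output: NodeId) -> Vec<f64> {
        let mut grads = vec![0.0; self.values.len()];
        grads[output.0] = 1.0;
        for i in (0..=output.0).rev() {
            let g = grads[i];
            match self.ops[i] {
                Op::Leaf => {}
                Op::Sum(a, b) => {
                    grads[a.0] += g;
                    grads[b.0] += g;
                }
                Op::Mul(a, b) => {
                    grads[a.0] += g * self.values[b.0];
                    grads[b.0] += g * self.values[a.0];
                }
            }
        }
        grads
    }
}

fn main() {
    let mut tape = Tape::default();
    let a = tape.leaf(2.0);
    let b = tape.leaf(3.0);
    let ab = tape.mul(a, b);
    let y = tape.sum(ab, a); // y = a * b + a
    let grads = tape.backward(y);
    println!("dy/da = {}, dy/db = {}", grads[a.0], grads[b.0]);
}
```

Every operation needs `&mut self` on the tape, which is why a bare `a + b` on two `NodeId`s has nowhere to record the new node.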

This was my introduction to why you kinda need to know what you will end up building with Rust, or suffer the cascade refactors. In Python, for example, this issue mostly wouldn't happen, since objects are already reference-like, so the tape/graph can stay implicit and you just chug along.

I still prefer Rust; it's just that these refactor cascades will happen. But they are mechanically doable, because you just need to 'break' one type and let an LLM correct the fallout errors surfaced by the compiler until you reach a consistent new ownership model, and I suppose this is common enough that LLMs have seen it done hundreds of times, haha.

[0] https://github.com/karpathy/micrograd

reply
You can still use the fancy operators for readability; just use a macro to translate them into the actual code. This is a very common pattern in non-trivial Rust libraries.
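
A minimal sketch of that idea, assuming a hypothetical `on_tape!` macro and a stripped-down `Tape` (both invented here, not taken from any particular library). To keep the macro short it handles only fully parenthesised binary expressions and does no operator precedence.

```rust
/// Stripped-down tape: nodes are plain indices into `values`.
#[derive(Default)]
struct Tape {
    values: Vec<f64>,
}

impl Tape {
    fn leaf(&mut self, v: f64) -> usize {
        self.values.push(v);
        self.values.len() - 1
    }
    fn sum(&mut self, a: usize, b: usize) -> usize {
        let v = self.values[a] + self.values[b];
        self.leaf(v)
    }
    fn mul(&mut self, a: usize, b: usize) -> usize {
        let v = self.values[a] * self.values[b];
        self.leaf(v)
    }
}

/// Rewrites `lhs + rhs` / `lhs * rhs` into tape calls. Each operand must be a
/// single identifier or a parenthesised sub-expression.
macro_rules! on_tape {
    ($tape:ident; $lhs:tt + $rhs:tt) => {{
        // Operands are bound first so the two `&mut` uses of the tape
        // never overlap.
        let l = on_tape!($tape; $lhs);
        let r = on_tape!($tape; $rhs);
        $tape.sum(l, r)
    }};
    ($tape:ident; $lhs:tt * $rhs:tt) => {{
        let l = on_tape!($tape; $lhs);
        let r = on_tape!($tape; $rhs);
        $tape.mul(l, r)
    }};
    // Parenthesised sub-expression: strip the parentheses and recurse.
    ($tape:ident; ( $($inner:tt)+ )) => { on_tape!($tape; $($inner)+) };
    // A bare identifier is already a node index.
    ($tape:ident; $leaf:ident) => { $leaf };
}

fn main() {
    let mut tape = Tape::default();
    let a = tape.leaf(2.0);
    let b = tape.leaf(3.0);
    let y = on_tape!(tape; (a * b) + a);
    println!("y = {}", tape.values[y]);
}
```

A production version would more likely be a procedural macro that parses real expressions, but the shape is the same: the operators are only surface syntax, and the expansion is explicit `tape.sum(...)` / `tape.mul(...)` calls.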
reply
I also work on a large complex Rust project (>1M LOC) with extensive use of Claude Code. It is very consistent with my experience. Claude frequently subverts the obvious intent of the system - whether that's expressed in comments or types - in the pursuit of "making the build green", as it so often puts it. It, like many junior engineers, has completely failed to internalize the lesson that type errors are useful information and not a bad thing to make go away as soon as possible. It is remarkably capable, but you cannot trust it to have good taste.
reply
This post has some good examples of this sort of problem: https://loglog.games/blog/leaving-rust-gamedev/
reply
That link reads like an autobiography about his love affair with Rust and subsequent breaking up after pushing the relationship a step too far: into gaming. He has been using Rust much, much longer than me, but I reckon I already hit most of the pain points he mentions. (And I notice he left some things out, like async.)

I've come away feeling that most of it looks fixable - but it won't be fixed in Rust. Some of the language choices (like favouring monomorphization to the point of making DLLs near impossible) are near impossible to undo now, and in other cases where it might conceivably be fixed (like async) it won't be, because the community is too invested in their current solution.

So we are stuck with the Rust we have, warts and all. That blog post convinced me those warts mean the language should be avoided for game development. Similarly, the SQLite developers convinced me the current state of Rust tooling meant it wasn't a good fit for their style of high-reliability coding, so they are sticking with C. Which is a downright perverse outcome.

But for most of us C programmers who aren't willing to put in the huge effort SQLite does to get the reliability up, Rust is the only game in town right now. It's the first and currently only language to implement a usable formal proof checker that eliminates most of the serious footguns in C and C++. But I am now hoping it becomes a victim of the old engineering adage: plan to throw the first one away, because you will anyway.

reply
It's very easy to just instruct the LLM to build using isolated crates, to maintain boundaries, focus on "ports and adapters", etc, and not run into this - in my experience.

I haven't had any issues with this getting out of hand on >10KLOC vibed rust codebases.
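
For readers unfamiliar with the term, a minimal sketch of "ports and adapters" in Rust, assuming a hypothetical user store (`UserStore`, `InMemoryStore` and `rename_user` are invented for illustration). In a real workspace the trait and core logic would sit in one crate and each adapter in its own, so the crate boundary enforces the separation.

```rust
use std::collections::HashMap;

/// Port: the only thing the core logic knows about storage.
trait UserStore {
    fn get(&self, id: u32) -> Option<String>;
    fn put(&mut self, id: u32, name: String);
}

/// Core logic: depends on the port, never on a concrete backend.
fn rename_user(store: &mut dyn UserStore, id: u32, new_name: &str) -> Result<(), String> {
    match store.get(id) {
        Some(_) => {
            store.put(id, new_name.to_string());
            Ok(())
        }
        None => Err(format!("no user with id {id}")),
    }
}

/// Adapter: an in-memory backend, e.g. for tests.
#[derive(Default)]
struct InMemoryStore {
    users: HashMap<u32, String>,
}

impl UserStore for InMemoryStore {
    fn get(&self, id: u32) -> Option<String> {
        self.users.get(&id).cloned()
    }
    fn put(&mut self, id: u32, name: String) {
        self.users.insert(id, name);
    }
}

fn main() {
    let mut store = InMemoryStore::default();
    store.put(1, "ada".to_string());
    println!("{:?}", rename_user(&mut store, 1, "grace"));
    println!("{:?}", rename_user(&mut store, 2, "nobody"));
}
```

Swapping the backend then means adding another `impl UserStore`, and a change inside one adapter cannot ripple into the core beyond what the trait signature allows.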

reply
Of the languages I know, Rust is the only one where I can look at multi-threaded code and understand it. Having this stuff checked by the compiler is a huge advantage.
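
A small standard-library sketch of what "checked by the compiler" looks like in practice; `parallel_sum` is a name invented for this example. Replacing the `Mutex` with a plain `let mut total` that every thread mutates is rejected by the borrow checker, so the synchronization is visible in the types rather than in a comment.

```rust
use std::sync::Mutex;
use std::thread;

/// Sums `data` on up to `workers` scoped threads sharing one guarded total.
fn parallel_sum(data: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    // Ceiling division; at least 1 because `chunks(0)` panics.
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);
    let total = Mutex::new(0u64);
    let total_ref = &total;

    thread::scope(|s| {
        for chunk in data.chunks(chunk_len) {
            // `move` copies the two references into the thread. The scope
            // guarantees all threads are joined before `total` is dropped.
            s.spawn(move || {
                let partial: u64 = chunk.iter().sum();
                *total_ref.lock().unwrap() += partial;
            });
        }
    });

    total.into_inner().unwrap()
}

fn main() {
    let data: Vec<u64> = (1..=100).collect();
    println!("sum = {}", parallel_sum(&data, 4));
}
```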
reply
I have only used Rust for fun maths projects crunching billions of numbers (otherwise Python is easier for me), but I have to say rayon is the most amazing multi-processing experience I've ever had!
reply
> I haven't had any issues with this getting out of hand on >10KLOC vibed rust codebases.

This rewrite is >750k lines of Rust

reply
I don't see any reason why the approach wouldn't hold just fine, if not better, as the codebase scaled. Indeed, this appears to be exactly what the author has done: they mention that they made heavy use of crates.
reply
When Microsoft rewrote it in Go, there was a comment from one of the leads that they chose it over Rust because of the similarity in paradigms (garbage collection, etc), and that using Rust would've been more difficult, requiring a lot of "hoop jumping". Now that you've done it... Thoughts?
reply
Yes indeed. More than 1 million lines of code (including tests) means jumping through lots of hoops, but with LLMs it's not as painful, so you can just ask it to do the hard things.

Here is an example of a Claude Code session that came out without results after 2 hours of "Crunching": https://github.com/mohsen1/tsz/pull/4868 (Edit: I force-pushed to the PR to solve the problem; you can see the initial refusal message in the initial version of the PR description.)

Funny thing is, the last percent of the tests has been so hard to work on that Opus 4.7 routinely bails and says "it's too involved or complicated", so I had to add prompts specifically asking it not to bail.

reply
You should try GPT, I’d be really interested to hear if it works better. (I exclusively use GPT for systems work at $DAYJOB, but I compare it with Opus every couple of weeks and GPT consistently gives me better results.)
reply
I've been comparing Claude vs Codex using GPT, and Claude is consistently better than GPT at reasoning, at writing code, and at using the tools appropriately.

GPT for instance had a lot of issues using git worktrees, and didn't understand how to correctly use it to then merge stuff back into a main branch, vs Claude which seems to do this much more naturally.

GPT also left me with broken tests/code that I had to iterate on manually; Claude is much better at reasoning through code. This was primarily Python.

reply
> GPT for instance had a lot of issues using git worktrees, and didn't understand how to correctly use it to then merge stuff back into a main branch, vs Claude which seems to do this much more naturally.

I wonder how much of that is due to the model being somehow better, and how much to the harness having built-in instructions on how to use them.

I've used worktrees with Codex just fine, but I instructed it to use my scripts for setting them up and tearing them down. The scripts also reflinked existing compilation artifacts to speed up compiling and allocated a fresh db instance for it, and they applied a simple protocol for locking the master repository during merges, so multiple agents wouldn't try to merge at the same time. It has been following those instructions quite well.
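
The scripts themselves are not shown, so the following is only a guess at what a "simple protocol for locking during merges" could look like: a lock file created atomically with `create_new`, released when the guard is dropped. `MergeLock` and the lock path are hypothetical.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};

/// Held while one agent merges; the lock file is removed on drop.
struct MergeLock {
    path: PathBuf,
}

impl MergeLock {
    /// `create_new` fails if the file already exists, so only one caller wins.
    fn try_acquire(path: &Path) -> io::Result<MergeLock> {
        OpenOptions::new().write(true).create_new(true).open(path)?;
        Ok(MergeLock { path: path.to_path_buf() })
    }
}

impl Drop for MergeLock {
    fn drop(&mut self) {
        // Best effort: ignore errors during cleanup.
        let _ = fs::remove_file(&self.path);
    }
}

fn main() -> io::Result<()> {
    // Hypothetical location; a real setup would keep it next to the main repo.
    let path = std::env::temp_dir().join(format!("merge-{}.lock", std::process::id()));
    {
        let _lock = MergeLock::try_acquire(&path)?;
        // A second agent trying to merge now is turned away.
        let second = MergeLock::try_acquire(&path);
        println!("second attempt while locked succeeded: {}", second.is_ok());
        // ... the merge itself would run here ...
    }
    // The guard was dropped, so the next agent can take the lock.
    let _next = MergeLock::try_acquire(&path)?;
    println!("lock re-acquired after release");
    Ok(())
}
```

A lock file like this goes stale if the holder is killed before `drop` runs, so a real protocol would also need a timeout or an owner check.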

reply
OpenAI gave me that 10x boost and I have already used it all up for this week. I'm guessing the last 50 tests are only doable by GPT 5.5 xhigh.
reply
Do you have any write-ups on your workflow with Claude and GitHub dev?
reply
That might be Opus 4.7 behaviour, because I have also been getting that all the time in the past few weeks. Mine is also a complex code base, but likely an order of magnitude simpler than yours.
reply
They mentioned that they wanted to port their compiler over to retain existing behavior (vs a re-write) and Rust has a hard time with their cyclic data structures.
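
To make the "hard time with cyclic data structures" concrete, here is a minimal sketch of a parent/child cycle in safe Rust, assuming a hypothetical AST-like `Node` (not the TypeScript compiler's actual types). What a GC language expresses as two plain references needs `Rc`, `Weak` and `RefCell` here.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// Hypothetical AST-like node with a back-pointer to its parent.
struct Node {
    name: String,
    parent: RefCell<Weak<Node>>,      // weak: does not keep the parent alive
    children: RefCell<Vec<Rc<Node>>>, // strong: a parent owns its children
}

fn new_node(name: &str) -> Rc<Node> {
    Rc::new(Node {
        name: name.to_string(),
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    })
}

fn add_child(parent: &Rc<Node>, child: Rc<Node>) {
    *child.parent.borrow_mut() = Rc::downgrade(parent);
    parent.children.borrow_mut().push(child);
}

/// Follows the back-pointer; `None` for the root or if the parent was dropped.
fn parent_name(node: &Node) -> Option<String> {
    node.parent.borrow().upgrade().map(|p| p.name.clone())
}

fn main() {
    let root = new_node("SourceFile");
    let call = new_node("CallExpr");
    add_child(&root, Rc::clone(&call));
    println!("{:?}", parent_name(&call));
    println!("{:?}", parent_name(&root));
}
```

Making the back-pointer a strong `Rc` instead would compile but leak the whole tree, which is one reason a line-by-line port from a GC language tends to need a redesign (often towards index-based arenas) rather than a translation.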
reply
Is GC useful for a static type checker? Or did they make a new runtime?
reply
The point is that having a GC will affect your data structure and algorithm design, so it’s easier to automatically transform JS or TS to Go than to Rust because you’re mostly reducing things down to one problem (translation) rather than multiple intertwined problems.
reply
The TypeScript compiler is a CLI tool and is run for short periods of time. Garbage collection and memory leaks should be the least of the issues to look for.
reply
Same but for multi-threaded Postgres[0]. 96% of pg regression tests pass after 1 month and 823K LOC. 8 Codex accounts at $200/mo is what I could use up with no Mythos.

I've also seen the benefits of Rust for this. And I'm making the bet that my pg experience will help me make good design choices around many of the things people have been having trouble with in pg for a long time[1]. Excited to see AI make it more possible to improve complex pieces of software than has historically been practical.

[0] https://github.com/malisper/pgrust

[1] https://malisper.me/the-four-horsemen-behind-thousands-of-po...

reply
Very cool! If you have extra tokens lying around, ask the agent to try to break things and open GitHub issues. This is what I do for tsz, and beyond the conformance tests I can see it finding very good bugs.
reply
$1600/mo. There is now a token-rich class.
reply
96% tests passing sounds impressive, but I remember that C compiler that had similar (or better) stats yet was still hilariously broken because the test suite didn't cover many "obvious" things that a human wouldn't get wrong even without the tests.
reply
wow!

curious about your workflow for running all these accounts. different harnesses in parallel? manually switching in codex? 5.5pro only?

what works for you?

reply
I wrote up a bit about my workflow here[0][1]. I'm using conductor.build to manage multiple codex sessions at once. When I hit the rate limit, I'm using codex-auth[2] to switch codex accounts.

[0] https://malisper.me/pgrust-rebuilding-postgres-in-rust-with-...

[1] https://malisper.me/pgrust-update-at-67-postgres-compatibili...

[2] https://github.com/loongphy/codex-auth

reply
Rust is amazing, but the way I want to build Rust software breaks down on large projects with LLMs. Maintaining clean boundaries or even just establishing them stops being a flow state and turns into painful reviews that push me into procrastination mode.
reply
I’ve struggled to get Opus to not write the weirdest possible Rust, ignoring all idioms and so on. Any tips?
reply
Be absolutely ruthless with technical debt. Opus is perfectly capable of producing idiomatic code in any mainstream language you please, but will seize on any opportunity to justify writing basically-python instead because that's "consistent" with the "convention". Deprive it of that excuse.
reply
Give it coding guidelines. It'll largely try to do what you ask.

Left to itself, it often follows human developers who conceive of their goal as "get the program working, the end justifies the means." Which makes sense because there are a lot of systems like that in the training corpus.

reply
Wow, amazing work.

Pretty impressive that it is faster than the Go version already.

reply
Thank you!

It's much faster in single file benchmarks (3 to 5x)

https://tsz.dev/benchmarks/micro

I have optimizations planned for large projects that I'm still fleshing out.

reply
Regarding the architecture documentation you have up on tsz.dev, one thing that jumped out to me was the use of the per-node typed side pools. A semi-recent talk[0] benchmarked this and found it to be a deoptimisation: the speaker couldn't explain it, but an audience member suggested it is likely because an AST is not generally very type-homogeneous in its visit order. After a CallExpr node, the next node to visit is probably not a CallExpr but more likely an Identifier etc, so storing the node "extra data" in separate pools makes it more likely to be cold in cache rather than hot.

In Nova JavaScript engine[1] I've done exactly as you've done and split objects into typed side pools (I call them "(typed) heap vectors") but in a JavaScript engine my _hypothesis_ is that the visitation patterns are much more amenable to this: an Array, Set, or Map is more likely to be homogeneous than heterogeneous, and therefore a loop over the contents is likely going to hit the same side pool for each entry.
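
For readers who have not seen the layout, a rough sketch of "typed side pools", under the assumption that each node is a small header whose `extra` field indexes into a per-kind vector. All names are invented here; this is neither tsz's nor Nova's actual representation.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Identifier,
    CallExpr,
}

/// Compact node header; `extra` indexes into the pool selected by `kind`.
#[derive(Debug, Clone, Copy)]
struct Node {
    kind: Kind,
    extra: u32,
}

struct IdentData {
    name: String,
}

struct CallData {
    callee: u32,    // index into `Ast::nodes`
    args: Vec<u32>, // indices into `Ast::nodes`
}

#[derive(Default)]
struct Ast {
    nodes: Vec<Node>,
    idents: Vec<IdentData>, // side pool for Identifier nodes
    calls: Vec<CallData>,   // side pool for CallExpr nodes
}

impl Ast {
    fn push_node(&mut self, kind: Kind, extra: u32) -> u32 {
        self.nodes.push(Node { kind, extra });
        (self.nodes.len() - 1) as u32
    }

    fn add_ident(&mut self, name: &str) -> u32 {
        let extra = self.idents.len() as u32;
        self.idents.push(IdentData { name: name.to_string() });
        self.push_node(Kind::Identifier, extra)
    }

    fn add_call(&mut self, callee: u32, args: Vec<u32>) -> u32 {
        let extra = self.calls.len() as u32;
        self.calls.push(CallData { callee, args });
        self.push_node(Kind::CallExpr, extra)
    }

    /// Renders a node as source text. Visiting a call touches `calls` once
    /// and then `idents` for each operand, i.e. it alternates between pools.
    fn render(&self, id: u32) -> String {
        let node = self.nodes[id as usize];
        match node.kind {
            Kind::Identifier => self.idents[node.extra as usize].name.clone(),
            Kind::CallExpr => {
                let call = &self.calls[node.extra as usize];
                let args: Vec<String> = call.args.iter().map(|&a| self.render(a)).collect();
                format!("{}({})", self.render(call.callee), args.join(", "))
            }
        }
    }
}

fn main() {
    let mut ast = Ast::default();
    let f = ast.add_ident("f");
    let x = ast.add_ident("x");
    let y = ast.add_ident("y");
    let call = ast.add_call(f, vec![x, y]);
    println!("{}", ast.render(call));
}
```

The `render` walk shows the access pattern the talk's audience member was pointing at: a tree traversal jumps between pools at almost every step, whereas a loop over one homogeneous collection would stay inside a single pool.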

[0]: https://www.youtube.com/watch?v=s_1OG9GwyOw

[1]: https://trynova.dev/

reply
Zig is much more type-aligned to Bun than TypeScript is. And there’s a common interface of C FFI, so you could imagine porting it modularly and keeping the test suite in Zig.
reply
Shouldn't typed code that uses a functional style be kinda the perfect end game for LLMs? You can parallelize generation at any granularity, easily ring-fence changes, and reproduce everything, and the types give clues to the LLM.
reply
> Rust is perfect for writing all of code using LLM.

Rust is a terrible language for using LLMs to write code if Rust's low latency isn't needed, because of its extreme compile times. LLMs code faster than humans so a far bigger fraction of the time is spent waiting for the compiler, and a reasonably sized project will take literally 10x longer to compile in Rust than in e.g. Zig or Go.

reply
[flagged]
reply
> How do we know it is true?

The branch is open.

You can check it out and run the tests if you don’t believe it.

reply
Zig isn’t so much on the blacklist because of the culture it carries from its maintainers, but because the ecosystem is no longer easily composed with other GitHub projects/GitHub Actions.
reply
> We are dealing with a company of habitual liars and promoters.

Any sources to back this up?

reply