The port had been done in a weekend just to see if we could use Python in production. The C++ code had taken a few months to write. The port was pretty direct, function for function. It was even line for line where language and library differences didn't offer an easier way.
A couple of us worked together for a day to find the reason for the speedup. Just looking at the code didn't give us any clues, so we started profiling both versions. We found out that the port had accidentally fixed a previously unknown bug in some code that built and compared cache keys. After identifying the small misbehaving function, we had to study the C++ code pretty hard to even understand what the problem was. I don't remember the exact nature of the bug, but I do remember thinking that particular type of bug would be hard to express in Python, and that's exactly why it was accidentally fixed.
We immediately started moving the rest of our back end to Python. Most things were slower, but not by much, because most of our back end was I/O bound. We soon found out that we could make algorithmic improvements much more quickly, so a lot of the slowest things got a lot faster than they had ever been. And, most importantly, we (the software developers) got quite a bit faster.
This was particularly true for one of the projects I worked on in the past, where Python was chosen as the main language for a monitoring service.
In short, it proved to be a disaster: just the Python process collecting and parsing the metrics of all the programs consumed 30-40% of the processing power of the lower-end boxes.
In the end, the project carried on for a while longer, and we had to apply all sorts of mitigations to make the performance impact less of an issue.
We did consider replacing it all with a few open-source tools written in C plus some glue code; the initial prototype used a few MB of memory instead of dozens (or even hundreds) of MB, while barely registering any CPU load, but in the end it was deemed a waste of time when the whole project was terminated.
The main lesson of the story: just pick Python and move fast, kids. It doesn’t matter how fast your software is if nobody uses it.
Pure speculation, but I would guess this has something to do with a copy constructor getting invoked in a place you wouldn't guess, one that ends up on a critical path.
Not because they are brilliant, but because they are pretty good at throwing pretty much all known techniques at a problem. And they also don't tire of profiling and running experiments.
Recently I tried Codex/GPT-5 on updating a Bluetooth library for batteries, and it was able to start capturing Bluetooth packets and comparing them against the library's other models. It was indefatigable. I didn't even know it was so easy to capture BLE packets.
Crazy how many stories like this I’ve heard, where doing performance work helped people uncover bugs and/or hidden assumptions about their systems.
They found that they had fewer bugs in Python, so they continued with it.
Meanwhile my experience has been that whenever there has been a performance issue severe enough to actually matter, it's often been the result of some kind of performance bug, not so much language, runtime, or even algorithm choices for that matter.
Hence whenever the topic of how to improve performance comes up, I always, always insist that we profile first.
But, of course, profiling is always step one.
I suspect it’s more likely to be something like passing std::string by value not realising that would copy the string every time, especially with the statement that the mistake would be hard to express in Python.
Would be kind of cool if, e.g., Python or Ruby could be as fast as C or C++.
I wonder if this could be possible, assuming we could modify both to achieve that outcome, but without ending up with a language that is just like C or C++. Right now there is a strange divide between "scripting" languages and compiled ones.
I hit the flag button on the comment and suggest others do too.
I was not actually sure this one was a bot, despite LLM-isms and, sadly, being new. But you can look at the comment history and see.
It looks like neither is the "real win": both the language and the algorithm made a big difference, as you can see in the first column of the last table - going to wasm was a big speedup, and improving the algorithm on top of that was another big speedup.
Edit: it wasn't Astral, but here's the blog post I was thinking of: https://nesbitt.io/2025/12/26/how-uv-got-so-fast.html
That said, your point is very much correct. If you watch or read the Jane Street tech talk Astral gave, you can see how they really leveraged Rust for performance, like turning Python version identifiers into u64s.
It's also worth noting that unsafe Rust != C, and you are still battling these rules. With enough experience you gain an understanding of these patterns and it goes away, and you also have really solid tools like Miri for finding undefined behavior, but it can be a bit of a hassle.
Anyway, dubious claim since a Python interpreter will take 10s of milliseconds just to print out its version.
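That's easy to sanity-check from Node, for what it's worth. A rough sketch (not from the thread; the numbers depend on the machine and on whether the interpreter is already in the page cache):

    // Rough sketch: time `python3 --version` from Node/TypeScript.
    // This includes process-spawn overhead as well as interpreter startup.
    import { execSync } from "node:child_process";

    const t = performance.now();
    execSync("python3 --version");
    console.log(`python3 --version took ${(performance.now() - t).toFixed(1)} ms`);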
Do you have any evidence? I can point at TechEmpower benchmarks showing I/O-bound tasks are still 10-100x faster in native languages vs Python/JS.
That is assuming Rust is 100x faster than Python btw, 49ms of I/O, 1ms of Rust, 100ms of Python.
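Spelled out, under exactly those assumptions:

    Rust:   49 ms I/O + 1 ms compute   = 50 ms total
    Python: 49 ms I/O + 100 ms compute = 149 ms total

That is roughly a 3x end-to-end difference, not 100x, even granting the 100x compute gap.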
> uv is fast because of what it doesn’t do, not because of what language it’s written in. The standards work of PEP 518, 517, 621, and 658 made fast package management possible. Dropping eggs, pip.conf, and permissive parsing made it achievable. Rust makes it a bit faster still.
So the claim is not well supported at all by the article, as you stated; in fact, the claim is literally disproven by the article.
> uv is fast because of what it doesn’t do, not because of what language it’s written in.
The fact that the language had a small effect ("a bit") does not invalidate the statement that algorithmic improvements are the reason for the relative speed. In fact, there's no reason to believe that Rust without the algorithmic improvements would be notably faster at all. Sure, "all" is an exaggeration, but the point still stands in the form that most readers would understand it: algorithmic improvements are the important difference between the systems.
The specific claim I was responding to was that all of uv’s performance improvements come from algorithms rather than the language. My point was just that this is a stronger claim than what the article supports: the article itself says Rust contributes “a bit” to the speed, so it’s not purely algorithmic.
I do agree with the broader point that algorithmic and architectural choices are the main reason uv is fast, and I tried to acknowledge that, apparently unsuccessfully, in my very first comment (“I don't doubt that a lot of uv's benefits are algo. But everything?”).
Thanks for cutting through the clickbait. The post is interesting, but I'm so tired of being unnecessarily clickbaited into reading articles.
One thing I noticed was that they time each call and then use a median. Sigh. In a browser. :/ With timing-attack defenses built into the JS engine.
You still do get some latency from the event loop, because postMessage gets queued as a macrotask, which is probably on the order of 10μs. But this is the price you have to pay if you want to run some code in a non-blocking way.
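A minimal sketch of what measuring that looks like (it assumes a hypothetical ./echo-worker.js that just posts every message straight back, and the numbers will be coarsened by the timer defenses mentioned above):

    // Measure postMessage round-trip latency to a worker and report the median.
    const worker = new Worker("./echo-worker.js");

    function roundTrip(): Promise<number> {
      return new Promise((resolve) => {
        const start = performance.now();
        worker.onmessage = () => resolve(performance.now() - start);
        worker.postMessage("ping");
      });
    }

    async function measure(iterations = 1000): Promise<number> {
      const samples: number[] = [];
      for (let i = 0; i < iterations; i++) {
        samples.push(await roundTrip());
      }
      samples.sort((a, b) => a - b);
      return samples[Math.floor(samples.length / 2)]; // median
    }

    measure().then((ms) => console.log(`median round trip: ${ms.toFixed(3)} ms`));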
So this holds even for L = M. The speedup is not in the language, but in the rewriting and rethinking.
They say they measured that cost, and it was most of the runtime in the old version (though they don't give exact numbers). That cost does not exist at all in the new version, simply because of the language.
If they used raw byte structures and implemented the caching improvements on the wasm side, the copies might not be as bad.
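For instance (a hedged sketch, not the article's code), transferring the buffer moves ownership across the boundary instead of copying it:

    // In the worker: post raw bytes without a structured-clone copy by
    // listing the buffer as a transferable. Assumes the wasm side already
    // wrote its output into `bytes`; decoding on the main thread is omitted.
    const bytes = new Uint8Array(1024 * 1024);
    self.postMessage(bytes.buffer, [bytes.buffer]);
    // After the transfer, bytes.buffer is detached here (byteLength === 0).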
But they still have an issue with a multi-language stack: complexity also has a cost.
The Python/C combo does not have this issue, because you can work with Python types natively in C; otherwise, this is a cross-language conversion issue, and not a Rust issue at all.
Edit: fixed phone typos
This new company chose a very confusing name that has been used by the Open UI W3C Community Group for over 5 years.
Open UI is the standards group responsible for HTML having popovers, customizable select, invoker commands, and accordions. They're doing great work.
Anyway, JavaScript is no stranger to breaking changes. Compare Chromium 47 to today. Just add actual integers as another breaking change, and WASM becomes almost unnecessary.
I don't mind reading articles that are not about how Rust is great in theory (and maybe in practice).
That said, Rust does have real problems. Manual memory management sucks. People think GC is expensive? Well, keep in mind that malloc() and free() take locks! People just have totally bogus mental models of what drives performance. These models lead them to technical nonsense.
The most obvious approach would be to let LLMs generate code and render it, but that introduces problems like safety, UI consistency, and speed. OpenUI solves those problems and provides a safe, consistent, and token-optimized runtime for LLMs to render live UI.
That final summary benchmark means nothing. It mentions a 'baseline' value for the 'Full-stream total' of the Rust implementation, and then says `serde-wasm-bindgen` is '+9-29% slower', but it never gives us the baseline value, because clearly the only benchmark it ran against the Rust codebase was the per-call one.
Then it mentions: "End result: 2.2-4.6x faster per call and 2.6-3.3x lower total streaming cost."
But the "2.6-3.3x" is by their own definition a comparison against the naive TS implementation.
I really think the guy just prompted Claude to "get this shit fast and then publish a blog post".
I understand your frustration with AI writing though. We are a small team, and given our roadmap it was either use LLMs to help collate all the internal benchmark result files into a blog post or never write it, so we chose the former. This was a genuinely surprising and counterintuitive result for us, which is why we wanted to share it. Happy to clarify any of the numbers if helpful.
> converts internal AST into the public OutputNode format consumed by the React renderer
Why not just have the LLM emit the JSON for OutputNode? Why is a custom "language" and parser needed at all? And yes, there is a cost to marshaling data, so you should avoid doing it where possible, and do it in large chunks when it's not possible to avoid. This is not an unknown phenomenon.
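A tiny sketch of the chunking idea (the names are made up, not the article's actual API): buffer nodes on the worker side and cross the boundary once per task instead of once per node:

    // Hypothetical OutputNode shape; the real one is whatever the renderer consumes.
    type OutputNode = { type: string; text?: string; children?: OutputNode[] };

    const pending: OutputNode[] = [];
    let flushScheduled = false;

    function enqueue(node: OutputNode): void {
      pending.push(node);
      if (!flushScheduled) {
        flushScheduled = true;
        // Everything enqueued during the current task is batched into one message.
        queueMicrotask(flush);
      }
    }

    function flush(): void {
      flushScheduled = false;
      self.postMessage(pending.splice(0, pending.length));
    }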
Claude tells me this is https://www.fumadocs.dev/
Rust.
WASM.
TypeScript.
I am slowly beginning to understand why WASM did not really succeed.
So you're reinventing JSON, but binary? V8's JSON nowadays is highly optimized [1] and can process gigabytes per second [2]; I doubt it is a bottleneck here.
[1] https://v8.dev/blog/json-stringify [2] https://github.com/simdjson/simdjson
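For reference, a back-of-envelope sketch (not from the article) of what JSON costs on a payload of ~100k small nodes:

    // Round-trip ~100k small objects through JSON and time both directions.
    const payload = Array.from({ length: 100_000 }, (_, i) => ({
      type: "text",
      id: i,
      text: "hello world hello world",
    }));

    let t = performance.now();
    const json = JSON.stringify(payload);
    console.log(`stringify: ${(performance.now() - t).toFixed(1)} ms, ${(json.length / 1e6).toFixed(1)} MB`);

    t = performance.now();
    JSON.parse(json);
    console.log(`parse: ${(performance.now() - t).toFixed(1)} ms`);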
I don't think that's actually out yet, and more importantly, it doesn't change anything at runtime -- your code still runs in a JS engine (V8, JSC etc).
You can use it today.