False.
0 test files were deleted. 0 pre-existing tests were skipped, todo’d, or had assertions removed. 5 new tests were added in test.skip/test.todo state to track known not-yet-fixed bugs in the port that lacked test coverage before.
The merge changed 28 test files in total.
+1,312 lines
−141 lines
Most of that +1,312 is new tests.
The depth-of-recursion tests for TOML/JSONC parsers went from 25_000 -> 200_000 because Rust’s smaller stack frames (LLVM lifetime annotations let the optimizer reuse stack slots) mean 25k levels no longer reaches the 18 MB stack on Windows.
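As a rough illustration of what such a depth-of-recursion test exercises, here is a minimal sketch. It is a toy bracket parser, not Bun's TOML/JSONC code, and `nestedArrays`, `parseDepth`, and `overflowsAt` are hypothetical names. Each level of nesting costs one stack frame, so the smaller the frame, the deeper the input must be before the stack runs out.

```typescript
// Hypothetical helper: builds `depth` nested arrays, e.g. depth 3 -> "[[[]]]".
function nestedArrays(depth: number): string {
  return "[".repeat(depth) + "]".repeat(depth);
}

// Toy recursive-descent parser (not Bun's): returns the maximum nesting depth.
// Every "[" triggers one recursive call, i.e. one extra stack frame.
function parseDepth(src: string, pos = 0): { depth: number; next: number } {
  if (src[pos] !== "[") throw new SyntaxError("expected '[' at " + pos);
  let next = pos + 1;
  let depth = 1;
  while (src[next] === "[") {
    const inner = parseDepth(src, next);
    depth = Math.max(depth, inner.depth + 1);
    next = inner.next;
  }
  if (src[next] !== "]") throw new SyntaxError("expected ']' at " + next);
  return { depth, next: next + 1 };
}

// Reports whether parsing input of the given depth exhausts the stack.
// V8 and JavaScriptCore signal stack exhaustion with a RangeError.
function overflowsAt(depth: number): boolean {
  try {
    parseDepth(nestedArrays(depth));
    return false;
  } catch (err) {
    if (err instanceof RangeError) return true;
    throw err;
  }
}
```

Whether a given depth overflows depends on the runtime, the stack size, and the frame size, which is why a threshold tuned for one implementation can stop triggering the limit in another.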
What is "most of that"?
Why did you feel the need to produce so much detail about a single category of tests?
It's too bad you haven't structured the commits and pull requests a bit differently so that it's easier to review the exact changes, but I hope it goes well.
For example, doing the test refactorings in a first pull request, and using something like test.xfail so that a test first fails and then succeeds after the merge (while the test code itself doesn't change).
Also, I have seen some tests getting stricter, which again is not a problem, but separating those into a different pull request would have improved the reviewability significantly for a runtime that many people and companies depend on.
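The expected-failure workflow suggested above can be sketched as follows. This is a minimal sketch with a hand-rolled, hypothetical `xfail` wrapper and an invented `parseCount` bug; it makes no claim about what Bun's test runner provides. The point is that the test body stays byte-for-byte identical across the merge, and only the marker is removed once the fix lands.

```typescript
type TestBody = () => void;

// Runs a test body and reports the outcome.
function runTest(body: TestBody): "pass" | "fail" {
  try {
    body();
    return "pass";
  } catch {
    return "fail";
  }
}

// Hypothetical xfail wrapper: the wrapped test passes only while the body throws.
function xfail(body: TestBody): TestBody {
  return () => {
    if (runTest(body) === "pass") {
      throw new Error("unexpected pass: remove the xfail marker");
    }
  };
}

// Stand-in for the code under test (invented example).
// Number("1_000") is NaN, so this version has a bug.
let parseCount = (s: string): number => Number(s);

// The test body itself: never edited, before or after the fix.
const parsesSeparators: TestBody = () => {
  if (parseCount("1_000") !== 1000) throw new Error("expected 1000");
};

// First pull request: the test is registered as an expected failure.
const beforeFix = runTest(xfail(parsesSeparators));

// Later pull request: the fix lands and the marker is dropped.
parseCount = (s: string): number => Number(s.replace(/_/g, ""));
const afterFix = runTest(parsesSeparators);
const staleMarker = runTest(xfail(parsesSeparators));
```

A reviewer then only has to check that the marker was removed, rather than re-reading assertions that changed in the same diff as the implementation.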
I'm sorry you were downvoted by HN and your comment got "dead"; that's not the way to review things.
https://github.com/oven-sh/bun/pull/30412/changes/68a34bf8ed...
This is great! Just add a random sleep(1) to a test, don't worry about it, it's going to be fine!
Strange test either way, though.
Not sure if these decisions were made by the LLM, but I've always felt that Claude is more prone to doing "shady stuff" like modifying tests than finding correct solutions to problems.
GPT/Codex is more honest in this regard.
Having said that, after looking at some of the test changes, they seem to be minor things, like changing timeouts, not changing the actual intended semantics of the tests. But it's too much code to review everything, so I might be completely wrong about that, and in real-world usage, even minor changes like these will cause issues.
Wow, this is definitely quite something.
Can Jarred comment on whether he has read the commits, or respond to your comment? If this turns out to be correct, it has basically made me lose the small faith I had in what Bun is doing.
I'm happy it's not a project I'm depending on, but a large enough project had to try this at some point so that we all can learn from how it goes.
I think this is why Anthropic bought Bun, so that they can sell large-scale code translation as a feature to all the banks with COBOL code that they have wanted to get rid of for a long time.
Still, those banks / enterprises won't appreciate the number of unit test changes.
And I agree with another comment that Codex xhigh is much better for these kinds of tasks, but it's still hard at this kind of scale.
The MR is right there, linked at the top of this page. You can check who is telling the truth.
That said, I don't know how anyone can actually claim to have done that. All day, the size of the MR has made the diff take too long to load, and GitHub dies. I'll have to pull it later to check myself.
I'm convinced the future of writing code is heavily LLM-assisted.
[0] https://tsz.dev