upvote
I don't know if you're aware, but there is a demonstration of wget (a fellow "gnu utility", right?) being auto-translated to a memory-safe subset of C++ [1]. Because the translation essentially does a one-for-one substitution of potentially unsafe C elements with safe C++ counterparts that mirror the behavior, the translation should be much less susceptible to the introduction of new bugs and behaviors in the way a rewrite would be.

With a little cleaning-up of the original code, the code translation ends up being fully automatic and so can be used as a build step to produce (slightly slower) memory-safe executables from the original C source.

[1] https://duneroadrunner.github.io/scpp_articles/PoC_autotrans...

reply
First of all, thank you for presenting a succinct take on this viewpoint from the other side of the fence from where I am at.

So how can I learn from this? (Asking very aggressively, especially for Internet writing, to make the contrast unmistakable. And contrast helps with perceiving differences and mistakes.) (You also don’t owe me any of your time or mental bandwidth, whatsoever.)

So here goes:

Question 1:

How come "speed", "performance", race conditions and st_ino keep getting brought up?

Speed (latency), physically writing things out to storage (sequentially, atomically (ACID), all of HDD NVME SSD ODD FDD tape, "haskell monad", event horizons, finite speed of light and information, whatever) as well as race conditions all seem to boil down to the same thing. For reliable systems like accounting the path seems to be ACID or the highway. And "unreliable" systems forget fast enough that computers don’t seem to really make a difference there.

Question 2:

Does throughput really matter more than latency in everyday application?

Question 3 (explanation first, this time):

The focus on inode numbers is at least understandable with regards to the history of C and unix-like operating systems and GNU coreutils.

What about this basic example? Just make a USB thumb drive "work" for storing files (ignoring nand flash decay and USB). Without getting tripped up in libc IO buffering, fflush, kernel buffering (Hurd if you prefer it over Linux or FreeBSD), more than one application running on a multi-core and/or time-sliced system (to really weed out single-core CPUs running only a single user-land binary with blocking IO).

reply
> Does throughput really matter more than latency in everyday application?

In my experience latency and throughput are intrinsically linked unless you have the buffer-space to handle the throughput you want. Which you can't guarantee on all the systems where GNU Coreutils run.

reply
> Question 2:

> Does throughput really matter more than latency in everyday application?

IME as a user, hell yes

Getting a video I don't mind if it buffers a moment, but once it starts I need all of that data moving to my player as quickly as possible

OTOH if there's no wait, but the data is restricted (the amount coming to my player is less than the player needs to fully render the images), the video is "unwatchable"

reply
I don't mean to nitpick, but absolute values for both of these matter much less than how much it is compared to "enough". As long as the throughput is enough to prevent the video from stuttering, it doesn't matter if the data is moved to your video player program at 1 GB/s or 1 TB/s. Conversely, you say you don't mind if a video buffers for a moment but I'm willing to bet there's some value of "a moment" where it becomes "too long". Nobody is willing to wait an hour buffering before their video starts.

The perception of speed in using a computer is almost entirely latency driven these days. Compare using `rg` or `git` vs loading up your banking website.

reply
Hell no.

Linux desktop (and the kernel) felt awful for such a long time because everyone was optimizing for server and workstation workloads. Its the reason CachyOS (and before that Linux Zen and.. Licorix?) are a thing.

For good UX, you heavily prioritize latency over throughput. No one cares if copying a file stalls for a moment or takes 2 seconds longer if that ensures no hitches in alt tabbing, scrolling or mouse movement.

reply
Sorry, complete noob here. Why didn't you just cd into $(yes a/ | head -n $((32 * 1024)) | tr -d '\n')? Why do you need to use the while loop for cd?

EDIT: got it. -bash: cd: a/a/a/....../a/a/: File name too long

reply
No need to apologize at all. Doing it in one cd invocation would fail since the file name is longer than PATH_MAX. In that case passing it to a system call would fail with errno set to ENAMETOOLONG.

You could probably make the loop more efficient, but it works good enough. Also, some shells don't allow you to enter directories that deep entirely. It doesn't work on mksh, for example.

reply
Facetious reply:

> However, GNU software tends to work very hard to avoid arbitrary limits [1].

reply
Yes? The quote says "tends to", and you still can cd into that directory, albeit not in a single invocation. Windows has similar limitations [0], it's just that their MAX_PATH is just 260 so it's somewhat more noticeable... and IIRC the hard limit of 32 K for paths in non-negotiable.

[0] https://learn.microsoft.com/en-us/windows/win32/fileio/maxim...

reply
I see even the coreutils maintainers find themselves needing -n (no newlines) and -c (count) options to "yes".
reply
Probably a dumb question, but is GNU Core utils interested in / planning on doing its own rust rewrite?
reply
The rewrite in Rust is mostly vanity and marketing but not based on a real technical need...

So I don't see why they would want to do that.

reply
Canonical's usage of uutils is likely for marketing. But the codebase itself was developed for fun, as an excuse for people to have a hands-on way to learn Rust back before Rust was even released, with a minor justification as being cross-platform. From the original README in 2013:

Why?

----

Many GNU, linux and other utils are pretty awesome, and obviously some effort has been spent in the past to port them to windows. However those projects are either old, abandonned, hosted on CVS, written in platform-specific C, etc.

Rust provides a good platform-agnostic way of writing systems utils that are easy to compile anywhere, and this is as good a way as any to try and learn it.

https://github.com/uutils/coreutils/blob/9653ed81a2fbf393f42...

reply
>Canonical's usage of uutils is likely for marketing

Currently their usage is actively worsening the security of their distro

reply
These things were caught and basically all of them weren't covered by any test suite (not even GNU coreutils'). It's a bit bold to claim that it's actively worsening it when it's not an LTS.
reply
[dead]
reply
[dead]
reply
To be fair, Vec::set_len bug in Rust was in 2021. And even then it had to be annotated as `unsafe`. It was then deprecated and a linter check was added: https://github.com/rust-lang/rust-clippy/issues/7681
reply
To be even fair-er, it wasn't actually memory unsafety, it was "just" unsoundness, there was a type, that IF you gave it an io reader implementation that was weird, that implementation could see uninit data, or expose uninit data elsewhere, but the only readers actually used were well behaved readers.
reply
Vec::set_len is by no means deprecated. The lint you linked only covers a very specific unsound pattern using set_len.
reply
Indeed, and it doesn't need to be deprecated, because it's an API explicitly designed to give you low-level control where you need it, and because it is appropriately defined as an `unsafe` function with documented safety invariants that must be manually upheld in order for usage to be memory-safe. The documentation also suggests several other (safe) functions that should be used instead when possible, and provides correct usage examples: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.set... .
reply
> and because it is appropriately defined as an `unsafe` function with documented safety invariants that must be manually upheld in order for usage to be memory-safe.

Didn't we learn from c, and the entire raison detre for rust, is that coders cannot be trusted to follow rules like this?

If coders could "(document) safety invariants that must be manually upheld in order for usage to be memory-safe." there's be no need for Rust.

This is the tautology underlying rust as I see it

reply
No, this is mistaken. Rust provides `unsafe` functions for operations where memory-safety invariants must be manually upheld, and then forces callers to use `unsafe` blocks in order to call those functions, and then provides tooling for auditing unsafe blocks. Want to keep unsafe code out of your codebase? Then add `#![forbid(unsafe_code)]` to your crate root, and all unsafe code becomes a compiler error. Or you could add a check in your CI that prevents anyone from merging code that touches an unsafe block without sign-off from a senior maintainer. And/or you can add unit tests for any code that uses unsafe blocks and then run those tests under Miri, which will loudly complain if you perform any memory-unsafe operations. And you can add the `undocumented_unsafe_comment` lint in Clippy so that you'll never forget to document an unsafe block. Rust's culture is that unsafe blocks should be reserved for leaf nodes in the call graph, wrapped in safe APIs whose usage does not impose manual invariant management to downstream callers. Internally, those APIs represent a relatively miniscule portion of the codebase upon which all your verification can be focused. So you don't need to "trust" that coders will remember not to call unsafe functions needlessly, because the tooling is there to have your back.
reply
The issue with C is that every single use of a pointer needs to come with safety invariants (at its most basic: when you a pass a pointer to my function, do I. take ownership of your pointer or not?). You cannot legitimately expect people to be that alert 100% of the time.

Inversely, you can write whole applications in rust without ever touching `unsafe` directly, so that keyword by itself signals the need for attention (both to the programmer and the reviewer or auditor). An unsafe block without a safety comment next to it is a very easy red flag to catch.

reply