Moving beyond fork() + exec()


(lwn.net)

164 points

by jwilk 3 hours ago

by rom1v 2 hours ago

Related to the discussion: "A fork() in the road": https://www.microsoft.com/en-us/research/wp-content/uploads/...

> ABSTRACT

> The received wisdom suggests that Unix’s unusual combination of fork() and exec() for process creation was an inspired design. In this paper, we argue that fork was a clever hack for machines and programs of the 1970s that has long outlived its usefulness and is now a liability. We catalog the ways in which fork is a terrible abstraction for the modern programmer to use, describe how it compromises OS implementations, and propose alternatives.

> As the designers and implementers of operating systems, we should acknowledge that fork’s continued existence as a first-class OS primitive holds back systems research, and deprecate it. As educators, we should teach fork as a historical artifact, and not the first process creation mechanism students encounter.

by Animats 18 minutes ago

> The received wisdom suggests that Unix’s unusual combination of fork() and exec() for process creation was an inspired design.

No, it was done that way so that you could launch a program that was too big to fit in memory with the parent program. The original implementation worked by swapping out the forking program to disk on a fork() call. Then, at the moment the program was swapped out but control had not returned, the process table entry was duplicated and adjusted so that there were now two processes, one in memory and one swapped out. The one in memory then got control, and could do an exec() call.

This allowed large programs to run on small PDP-11 machines. It was needed back in the era of really expensive memory. That's why.

QNX had an interesting approach. Program loading isn't in the OS at all. There's "fork", but program loading is in a library. It links to a .so file which reads the executable header, allocates memory, loads the program, gets it ready to run, and starts it. The program loader runs in user space and is unprivileged. This is probably the right way to do it.

by lukan 9 minutes ago

It is almost as if you agree with the authors ..

"In this paper, we argue that fork was a clever hack for machines and programs of the 1970s that has long outlived its usefulness and is now a liability"

(But thanks for the good explanation)

by anarazel 1 hour ago

It is somewhat interesting that the most widely used "big" OS that doesn't use fork, i.e. Windows, has dog slow process creation...

I agree that there should be non-fork primitives, I'm just not that sure that performance is the best argument.

by mort96 1 hour ago

The problem with fork isn't really that it's slow. The problem is that if you want it to be not-slow, it locks you into a bunch of OS design decisions: you more or less need a memory subsystem where all writable pages are refcounted and copy-on-write when the refcount is bigger than 1, and you need overcommit.

Now these decisions aren't objectively bad, but they have significant trade-offs and it's probably not a good idea that they're forced simply because we use fork()+exec() for process creation.

by marcosdumay 36 minutes ago

CoW is probably a good idea whether you use fork or not. Or rather, fork is probably a better option than just exec exactly because it can benefit from CoW.

At least on systems with virtual addressing. If you want to go into physical addressing, then yes, maybe it's a problem. But Linux will never touch anything with physical addressing, so I don't see what people are complaining about.

by dapperdrake 10 minutes ago

How else does consistency work, then?

Only being half facetious here. Maybe you or someone else really has a better take.

by theK 59 minutes ago

Didn't he just say that fork turns out to be comparatively faster than the non-fork samples we get? I.e., Linux spawns processes faster than Microsoft's kernels?

by mort96 34 minutes ago

Didn't I just say that "the problem with fork isn't really that it's slow"? It's all the other OS design choices it forces on you if you want it to be fast.

by nvme0n1p1 43 minutes ago

We don't have any broadly used non-fork samples. Windows, macOS, and Linux all have fork. So the presence of fork can't be the reason for the performance difference.

(Windows's fork is called ZwCreateProcess)

by dcrazy 28 minutes ago

NtCreateProcess does not implement a forking model. It is analogous to posix_spawn.

by pjmlp 1 hour ago

Because that OS's best practice is to use threads.

Traditionally Windows applications that create processes all the time come from UNIX heritage.

Contrary to UNIX, Windows NT was designed with threads first mentality, from the get go.

While on UNIX they were added after the fact, and to this day there are gotchas mixing POSIX threads with signals, fork and exec.

by PaulDavisThe1st 24 minutes ago

A more accurate way to describe this is that Windows' (NT onward) core execution context model is a bunch of threads that by default share memory, whereas Unixen have a core task context model of a bunch of threads that by default do not share memory.

Both systems are implemented using threads as the execution context, but in Unix, the history means that you fork+exec most of the time, resulting in two tasks that do not share memory any more. By contrast, on Windows (NT onward) the common case when creating a new execution context is to create a thread that shares memory with others in its process.

Both systems allow the easy use of the other's core abstraction. On Unix, you can either code like it's 1986 and use fork without exec, or use clone(2) or any of its higher level abstractions like pthreads.

You're right that POSIX semantics get tangled when using threads.

by zozbot234 1 hour ago

Windows was designed with threads-first mentality because on pre-386 machines you don't have viable process memory protection, so your tasks share memory by necessity. This is not a great argument.

by JdeBP 1 hour ago

Windows NT was never designed with pre-386 machines in mind. That was the territory of the old DOS+Windows. Windows NT from the get-go was for machines with page-based virtual memory.

* https://computernewb.com/~lily/files/Documents/NTDesignWorkb...

by pstuart 25 minutes ago

WinNT 3.5 was a solid offering.

by epcoa 1 hour ago

This is not true. NT never had fork, was always based on the assumption of an MMU and Dave Cutler was a well known fork hater in the 80s long before this paper came out and made it cool to be so. By the time Windows 95 was out, the baseline was 386 with an MMU. CreateThread was initially designed for NT in 1993 though (which didn’t support pre-386 CPUs).

by keitmo 8 minutes ago

NT performed unnatural acts to implement fork semantics for the POSIX subsystem.

by JdeBP 36 minutes ago

As mentioned elsewhere on this page, Windows NT had fork from the start. Vide NtCreateProcess and what happens if an image file is not explicitly supplied.

* https://computernewb.com/~lily/files/Documents/NTDesignWorkb...

by dcrazy 22 minutes ago

NtCreateProcess doesn’t accept an image file parameter.

by pjmlp 38 minutes ago

Windows NT!

Misread on purpose to make a point?

by aseipp 1 hour ago

I suspect it's a long tail sort of thing; it mostly doesn't matter except when it really matters. It's interesting that the stated motivation for the patch is in the context of agentic tools spawning subcommands. There's some related prior art in this area where the payoffs could be much greater, like fuzzing: https://gts3.org/assets/papers/2017/xu:os-fuzz.pdf is an example. It would be very interesting to see this patch applied to e.g. AFL++

by nvme0n1p1 1 hour ago

That's not the reason for the performance difference. Windows does have a fork primitive (ZwCreateProcess) and it's still slower than Linux's equivalent.

by dcrazy 15 minutes ago

Again, NtCreateProcess does not implement fork(). The fundamental characteristic of fork is that the child is an exact replica of the parent, down to the instruction pointer. Windows does not have a way to create a process object with such a configuration.

Also, using the Zw prefix doesn’t make you look more knowledgeable, it makes you look like you’re trying way too hard to borrow credibility.
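The "exact replica, down to the instruction pointer" property described above can be seen from user space. The following is a minimal Python sketch, assuming a Unix-like system where `os.fork` is available; the function name `fork_demo` and the values 41 and 7 are made up for illustration. Both processes continue from the same call site and differ only in the value `fork` returns.

```python
import os


def fork_demo():
    """Fork once and return (child_exit_code, value_computed_by_child)."""
    marker = 41                      # state that exists before the fork
    read_end, write_end = os.pipe()
    pid = os.fork()                  # parent and child both resume right here
    if pid == 0:
        # Child: it has its own copy of `marker`, inherited from the parent.
        os.close(read_end)
        os.write(write_end, str(marker + 1).encode())
        os._exit(7)                  # exit without running the parent's code path
    # Parent: collect what the child computed from the inherited state.
    os.close(write_end)
    data = os.read(read_end, 16)
    os.close(read_end)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status), int(data)
```

The child never receives `marker` as an argument. It simply finds it in its copy of the address space, which is the behavior a spawn-style API does not provide.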

by aseipp 1 hour ago

This paper is great and I also really like one of its references [29] as it goes into some more subtle parts of scalable interfaces, including fork. It's a gem IMO: The Scalable Commutativity Rule: Designing Scalable Software for Multicore Processors https://people.csail.mit.edu/nickolai/papers/clements-sc.pdf

by omoikane 1 hour ago

Discussion at the time:

https://news.ycombinator.com/item?id=19621799 - A fork() in the road (2019-04-10, 178 comments)

by jwilk 13 minutes ago

Discussed also in 2021: https://news.ycombinator.com/item?id=29709802 (16 comments)

by pizlonator 1 hour ago

Fork is marvelous for the zygote pattern

Hard to come up with an optimization that is equally efficient and elegant

by toast0 1 hour ago

The zygote pattern[1] is a great optimization to deal with the cost of forking, but IMHO, being able to inexpensively spawn a carefully tailored process regardless of the size and scope of the current process would be better.

I would guess it would be a small difference in measurable performance between zygote and a direct clean spawn, but it's one less trick an application needs to do, and it would be very helpful for libraries that spawn things. Spawning inside a library isn't always a great thing to do, but some things would really benefit from process level isolation.

[1] In case one isn't aware, the zygote pattern involves forking a 'zygote' process during application startup, and having that process do any forks that need to happen during application runtime. This reduces the cost of forking in large applications, because the zygote will have few fds open and use little memory. This lets your large application spawn new processes without delaying the application or the startup of the new processes. Some applications will spawn many zygotes to allow parallelism for spawning at runtime.

by pizlonator 1 hour ago

You're referring to something else, and maybe I'm using the term "zygote" incorrectly.

In all uses of zygotes that I have seen, here's what's really happening:

- `fork` is being used to reduce the cost of starting a process that has a high start-up cost. So, you start one process, run it through the expensive initialization, and then fork it from there to start new processes.

- To make this even faster, you have a pool of pre-forked processes sit around.

- Having pre-forked processes sitting around ready to be used is not expensive because of the CoW property and the fact that a process that forks and then immediately pauses will not have triggered any significant CoW yet.

So, the zygote optimization you speak of is in practice only meaningful on top of systems that are using an optimization uniquely enabled by `fork` (avoiding process initialization costs by cloning a process), and that zygote optimization is further optimized by another property of `fork` (memory sharing of forked processes that haven't done anything else yet).
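The "initialize once, then fork" variant described in the list above could look roughly like the following Python sketch. It assumes a Unix-like system; `expensive_init` and `prefork_answers` are hypothetical names, and the lookup table merely stands in for whatever costly start-up work a real program does.

```python
import os


def expensive_init():
    """Stand-in for costly start-up work (hypothetical)."""
    return {n: n * n for n in range(100_000)}


def prefork_answers(keys):
    """Build the table once, then fork one child per key to answer from it."""
    table = expensive_init()          # paid once, in the parent
    results = []
    for key in keys:
        read_end, write_end = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: the table is already there, shared copy-on-write.
            os.close(read_end)
            os.write(write_end, str(table[key]).encode())
            os._exit(0)
        os.close(write_end)
        results.append(int(os.read(read_end, 64)))
        os.close(read_end)
        os.waitpid(pid, 0)
    return results
```

Each child starts with the initialized table without rebuilding it, which is the part that a plain spawn of a fresh executable cannot reproduce without some other snapshot mechanism.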

by toast0 1 hour ago

Oh I see. I guess your zygotes have developed more than mine. I think Google may have coined or at least popularized the term zygote for this in Chrome and Android, Chrome documentation [1] says:

> A zygote process is one that listens for spawn requests from a main process and forks itself in response. Generally they are used because forking a process after some expensive setup has been performed can save time and share extra memory pages.

I think reading the first sentence and stopping covers my zygote, but adding the second sentence covers yours. So I think we're both right!

I think both paths are useful. If your children need time to startup and become ready, spawn one that does start up work, and then it (pre)forks at the ready state to have processes ready to handle requests (your zygote). This does require a traditional fork() to avoid duplication of work.

But if forking is expensive at runtime because you have a million FDs open and a whole lot of memory allocations, spawn spawners before you start doing work (my zygote). This could be unnecessary with an inexpensive way to spawn a new process from a process that has lots of resources in use.

Of course, you can also use my zygotes to spawn your zygotes. Zygoteception.

[1] https://chromium.googlesource.com/chromium/src/+/HEAD/docs/l...
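The "spawn spawners before you start doing work" variant can be sketched as a small helper that is forked early and then launches commands on request. This is an illustrative Python sketch under stated assumptions, not Chrome's or Android's implementation: the `Spawner` class, the JSON-lines protocol over a socket pair, and `demo` are all invented for the example.

```python
import json
import os
import socket
import subprocess
import sys


def _serve(sock):
    """Helper loop: read one JSON argv per line, run it, reply with the result."""
    reader = sock.makefile("r")
    for line in reader:               # ends when the parent closes its side
        argv = json.loads(line)
        done = subprocess.run(argv, capture_output=True, text=True)
        reply = json.dumps({"code": done.returncode, "out": done.stdout})
        sock.sendall((reply + "\n").encode())


class Spawner:
    """Fork a small helper early; later spawns are done by the helper."""

    def __init__(self):
        parent_sock, child_sock = socket.socketpair()
        self.pid = os.fork()
        if self.pid == 0:
            parent_sock.close()
            try:
                _serve(child_sock)
            finally:
                os._exit(0)           # never fall back into the parent's code
        child_sock.close()
        self._sock = parent_sock
        self._reader = parent_sock.makefile("r")

    def run(self, argv):
        self._sock.sendall((json.dumps(argv) + "\n").encode())
        return json.loads(self._reader.readline())

    def close(self):
        self._reader.close()
        self._sock.close()            # EOF tells the helper to exit
        os.waitpid(self.pid, 0)


def demo():
    spawner = Spawner()               # create this before the process grows large
    try:
        return spawner.run([sys.executable, "-c", "print('hi')"])
    finally:
        spawner.close()
```

The large parent never forks after start-up. It only writes a request to the helper, so the cost of duplicating its page tables and descriptors is paid once, while the helper is still small.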

by skydhash 14 minutes ago

I quite like the idea. I’m using OpenBSD on an oldish laptop, and fork-exec is expensive enough that it conflicts with the usb subsystem. Isochronous transfers have a 1ms realtime requirement and it seems that the fork-exec system calls hold the giant lock long enough to mess with it (audio stutters).

I’ve not bothered to profile it, but it seems that processes that have a lot of mapped pages are the issue (firefox, emacs,…). In the emacs case, the issue is when the main process tries to fork-exec; if I start a shell session (with shell-mode or term-mode), it works fine.

by PaulDavisThe1st 23 minutes ago

> being able to inexpensively spawn a carefully tailored process regardless of the size and scope of the current process would be better.

It's called clone(2)

by vlovich123 1 hour ago

The paper explicitly covers this: various memory COW/snapshot mechanisms are probably faster and safer than the zygote pattern. As it stands, getting the zygote pattern correct and safe is something you have to plan for upfront. You can’t retrofit it, which is why the paper mentions it has poor composability. Also, the advantages of the zygote pattern can be overstated: the memory sharing benefit is minimal, since it has to happen so early and modern OSes already transparently CoW duplicate pages in the background.

by mrkeen 2 hours ago

> fork() is a relatively expensive system call; it must copy the entire process state (including memory) for the child process. Many optimizations have been made over the years, but a fork is still a fundamentally costly operation. To make things worse, a fork() call is often immediately followed by an exec(), which will discard all of that memory that was so carefully copied for the child.

It's weird to leave out a mention of copy-on-write - the optimisation that means that you don't copy over all the memory.

by tux3 2 hours ago

This was left implicit in the article, but what they mean by copying the process state here is the memory management structures. That's mainly the page tables and the VMAs.

That means you have to allocate new pages to hold a copy of all these structures, even if the actual memory pointed by the pages is shared. And walking all those structures to make a copy is still costly.

by cls59 1 hour ago

Even with copy-on-write, fork() still has to pay the setup cost for COW. If the parent process has a lot of busy threads (e.g. Java), you can end up doing a lot of unnecessary COW before exec() fires.

by epcoa 1 hour ago

> It's weird to leave out a mention of copy-on-write

For the intended audience of such a paper this is base knowledge.

by FooBarWidget 2 hours ago

It says state. Copy on write still means it's O(number of page table entries) even if you don't copy the contents. It's a well known issue that forking a program with large virtual memory size is slow.
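One way to check the claim that fork cost grows with the number of page table entries, even under copy-on-write, is to time `fork` while varying how much memory the process has touched. The following Python sketch assumes a Unix-like system; `time_fork` is a hypothetical helper, and the numbers it returns depend entirely on the machine and kernel.

```python
import os
import time

PAGE = os.sysconf("SC_PAGE_SIZE")     # typically 4096 on x86-64 Linux


def time_fork(resident_bytes):
    """Return seconds taken by one fork() with `resident_bytes` touched."""
    buf = bytearray(resident_bytes)
    for offset in range(0, resident_bytes, PAGE):
        buf[offset] = 1               # touch every page so it is really mapped
    start = time.perf_counter()
    pid = os.fork()
    if pid == 0:
        os._exit(0)                   # child does nothing; only fork is timed
    elapsed = time.perf_counter() - start
    os.waitpid(pid, 0)
    return elapsed
```

Comparing, say, `time_fork(16 << 20)` with `time_fork(1 << 30)` on the same machine gives a rough feel for how the cost scales. No page contents are copied in either case, since the child exits immediately.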

by m00x 4 minutes ago

On modern hardware a cow page copy should only take 1-5ms. Redis forks to save the db to disk and it's been a solid design choice.

I guess it depends on how sensitive your application is to main thread pauses.

by mort96 1 hour ago

It says "(including memory)". It's pretty natural to read this as "(including the contents of allocated pages)".

by sanderjd 2 hours ago

I just ran into this recently, where I had an obscure bug caused by needing to close more file descriptors in the forked process. "I want a clone of the current process" is just way less common in my experience than "I want a completely new process". It feels crazy that we don't have a way to directly express the latter thing, and can only approximate it by cloning and then fixing things up in post.

by 1718627440 2 hours ago

But you generally want to communicate with that process, so you do need to setup e.g. file descriptors and stuff, which needs information from the parent process to be passed.

by yxhuvud 52 minutes ago

Yes, you do want to pass in some stuff. But by default you get every single open file descriptor and a copy of every single stack that any threads use for execution.

It shares way too much, and has huge use cases where it is really, really bad.

by jonhohle 2 hours ago

Most programming languages abstract this out to be able to connect or drop the 3 standard pipes. Typically this is the only thing that can be shared anyway unless the other program is specifically shared and expects other file handles to be available, in which case fork might be the right system call anyway.
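As an illustration of the kind of language-level abstraction mentioned above, Python's standard `subprocess` module lets the caller connect or drop each of the three standard streams and closes other descriptors in the child by default. The wrapper name `run_isolated` is hypothetical; the `subprocess.run` parameters are the standard ones.

```python
import subprocess
import sys


def run_isolated(argv, text_in):
    """Run argv with stdin fed from a string, stdout captured, stderr dropped."""
    done = subprocess.run(
        argv,
        input=text_in,                # connect stdin to a pipe we write to
        stdout=subprocess.PIPE,       # connect stdout to a pipe we read from
        stderr=subprocess.DEVNULL,    # drop stderr entirely
        close_fds=True,               # do not leak other descriptors (the default)
        text=True,
    )
    return done.returncode, done.stdout


def demo():
    child = "import sys; print(sys.stdin.read().upper())"
    return run_isolated([sys.executable, "-c", child], "abc")
```

Anything beyond the three standard streams has to be passed explicitly (for example with `pass_fds`), which matches the observation that other handles are only useful when the child program expects them.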

by stefan_ 55 minutes ago

Keep in mind that this is the only way to start any process. Even if you just want to launch some throwaway utility program.

by dnw 2 hours ago

What do you mean by "a completely new process"?

by wongarsu 2 hours ago

The equivalent of CreateProcessW https://learn.microsoft.com/en-us/windows/win32/api/processt...

by sanderjd 2 hours ago

A process that shares nothing with the process that spawned it.

by jerf 2 hours ago

A thing that makes that complicated is that while you want that conceptually, you don't want that in reality. For instance, if the spawning process is in a container of some sort and it spawned a process that "shares nothing with the process that spawned it", the spawned process would no longer be in that container, because the state of "being in the container" is one of the things it shares with the parent process.

This is just an example of I don't even know how many things a modern-day process will share from its parent.

By "complicated" I do not even remotely mean "unsolvable". I just mean that if you really dig down into what it means to "share nothing" in a modern operating system, it's a lot richer than it was back when fork+exec was a practical solution. There's a lot of fuzzy things that could go either way when you say "shares nothing".

by JoBrad 2 hours ago

That’s how you get zombie processes and memory leaks.

by stabbles 2 hours ago

Isn't that covered by O_CLOEXEC?

by anarazel 1 hour ago

There's a bunch of nastiness around that too. If you have e.g. library state that assumes the fd still works you can get very confusing bugs once another file is opened into that fd number...

by JdeBP 59 minutes ago

You may be mixing up fork and exec. Library data state isn't retained over execve(), and O_CLOEXEC does not take effect at fork().
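The distinction above, that close-on-exec acts at exec and not at fork, can be checked with a short experiment. This Python sketch assumes a Unix-like system; `fd_survives` and `PROBE` are hypothetical names. It relies on the fact that descriptors created by `os.pipe()` in Python are non-inheritable (close-on-exec) by default.

```python
import os
import sys

# Code run by the exec'd child: exit 0 if the descriptor is open, 1 otherwise.
PROBE = (
    "import os, sys\n"
    "try:\n"
    "    os.fstat(int(sys.argv[1]))\n"
    "except OSError:\n"
    "    sys.exit(1)\n"
)


def fd_survives(fd, do_exec):
    """Return True if `fd` is still open in a child, after fork or fork+exec."""
    pid = os.fork()
    if pid == 0:
        try:
            if do_exec:
                os.execv(sys.executable, [sys.executable, "-c", PROBE, str(fd)])
            os.fstat(fd)              # fork only: check in the forked copy
            os._exit(0)
        except OSError:
            os._exit(1)
        finally:
            os._exit(2)               # unexpected failure; never return to caller
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0
```

With a fresh pipe, `fd_survives(w, do_exec=False)` reports the descriptor as open, while `fd_survives(w, do_exec=True)` reports it closed. Calling `os.set_inheritable(w, True)` first clears the flag and lets it survive the exec.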

by anarazel 24 minutes ago

Indeed. Not enough coffee, apparently.

by 7jjjjjjj 1 hour ago

>It feels crazy that we don't have a way to directly express the latter thing

Isn't that what posix_spawn is for?

by toast0 50 minutes ago

posix_spawn addresses the need from userspace. Under the hood, it's still doing more or less a fork/exec, with the baggage that comes with it. A syscall would be nicer.
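For readers who have not used it, this is roughly what the userspace side of posix_spawn looks like through Python's `os.posix_spawn` wrapper (available since Python 3.8). The function name `spawn_and_capture` is hypothetical, and the sketch says nothing about how the C library implements the call underneath.

```python
import os
import sys


def spawn_and_capture(argv):
    """Spawn argv with its stdout redirected into a pipe; return (code, output)."""
    read_end, write_end = os.pipe()
    # File actions are a declarative list applied in the child before exec.
    actions = [(os.POSIX_SPAWN_DUP2, write_end, 1)]   # child's stdout -> pipe
    pid = os.posix_spawn(argv[0], argv, os.environ, file_actions=actions)
    os.close(write_end)               # parent keeps only the read side
    chunks = []
    while True:
        chunk = os.read(read_end, 4096)
        if not chunk:                 # EOF: the child closed its stdout
            break
        chunks.append(chunk)
    os.close(read_end)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status), b"".join(chunks)


def demo():
    return spawn_and_capture([sys.executable, "-c", "print('spawned')"])
```

The caller describes the child's descriptor setup as data instead of running code between fork and exec, which is the trade-off debated elsewhere in this thread.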

by yxhuvud 50 minutes ago

And how do you think posix_spawn is implemented?

by uecker 2 hours ago

The elegance of the fork() + exec() model is that every kind of configuration can be done after the fork using all the usual APIs. Every attempt to replace it with a combined call that I have seen so far seemed fundamentally poorer because it needs to add all configuration options as parameters to the call and then do this in a way that you can extend it later and does not become a mess.

by amluto 2 hours ago

I have the entirely opposite opinion. IMO a big mistake of the UNIXy model is that so much state is preserved across the creation of a process. For example, there are APIs to have a specific thing be fd number 4 so you can run a program and have it find that thing at fd 4. This is weird.

Windows, for all its many, many faults, did not use fork+exec and instead mostly has options for how one creates a process. It wasn’t done elegantly, but it was the right decision.
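The "have a specific thing be fd number 4" convention mentioned above is usually arranged between fork and exec with `dup2`. A minimal Python sketch, assuming a Unix-like system; `run_with_fd4` and `CHILD_CODE` are hypothetical, and the child reports how many bytes it found on descriptor 4 through its exit status purely to keep the example short.

```python
import os
import sys

# The child program simply assumes that descriptor 4 holds its input.
CHILD_CODE = "import os, sys; sys.exit(len(os.read(4, 100)))"


def run_with_fd4(payload):
    """Hand `payload` to a child on fd 4; return the child's exit status."""
    read_end, write_end = os.pipe()
    os.write(write_end, payload)      # small payload, fits in the pipe buffer
    os.close(write_end)
    pid = os.fork()
    if pid == 0:
        try:
            os.dup2(read_end, 4)      # place the pipe at the agreed number
            os.set_inheritable(4, True)   # needed if read_end was already 4
            os.execv(sys.executable, [sys.executable, "-c", CHILD_CODE])
        finally:
            os._exit(127)             # only reached if the setup or exec fails
    os.close(read_end)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

The only contract between the two programs is the number 4. Nothing names or types the handle, which is the aspect the comment above calls weird and replies below defend as no different from stdin, stdout and stderr.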

by uecker 34 minutes ago

Well, a lot of the power of the UNIX shell comes from this and I see this as a major advantage over Windows. So no, I do not think Windows got it right.

Any kind of replacement should aim for the same conceptual simplicity and power. Sadly, I fear that people driving development nowadays are more interested in building unbreakable walled gardens for advertisement or app stores, or trying to squeeze out some small gain when used in the cloud. I am more interested in general computing on the user side.

by __david__ 1 hour ago

Having fd 4 mean something specific is no weirder than having fds 0,1, and 2 mean something specific, which is probably never going to change. At some point you just gotta embrace the Unix.

by JdeBP 53 minutes ago

Heh! The Unix didn't embrace the idea of file descriptor 3 meaning something specific. (-:

* https://jdebp.uk/FGA/bernstein-on-ttys/cttys.html

Interestingly, on MS/PC/DR-DOS file descriptor 3 was stdaux, and file descriptor 4 was stdprn.

by 1718627440 2 hours ago

Is it weirder that you can pass a variable precisely into argument 4? You do need to pass information to a subprocess and there needs to be some agreement on what means what. Sure, maybe you could use names instead of fds, but that sounds needlessly complicated.

by amluto 1 hour ago

A way to pass a defined list of handles to a subprocess (or a friendly other process) makes sense. Having that mechanism be direct inheritance of those handles with the same numbering as the source is obnoxious.

by jonhohle 2 hours ago

That’s like saying you could use positions to specify function argument access (as in assembly) instead of variable names. File descriptors being numbers that are likely array indexes in a file handle table seems like a leaky abstraction. Having a namespace that a parent process shares with its children seems like a much cleaner design.

by chasil 1 hour ago

Well, Cygwin and Busybox have shown me that fork-heavy activities are about 100x slower on Windows than Linux.

The Windows approach may be correct, but it suffers in performance from the POSIX perspective.

I have heard that WSL1 improves this.

by amluto 54 minutes ago

Linux has worked pretty hard to optimize fork(). This doesn’t mean that fork() is a good idea.

Windows does not historically depend on fork(), so there was no native fork(), so Cygwin kludged it up.

by JdeBP 50 minutes ago

Actually, there is a native fork. There had to be, as POSIX personality support was a part of the Windows NT 3.1 design. What there wasn't was a Win32 form of fork. The Native API for Windows NT allowed it quite straightforwardly.

by burnt-resistor 2 hours ago

You're simply failing to grasp the value of the simplicity, compatibility, and portability of POSIX/*nix. Inventing yet another way to create a process would be complex and break things. It's a-la-carte to enable configuration after fork of the new CoW or non-CoW process but before exec (unless vfork or similar were used). This is the model.

If you want to greenfield re-engineer the world with all new system calls and a totally different execution model, feel free to go right ahead.

by wvenable 1 hour ago

"The reasonable man adapts himself to POSIX: the unreasonable one persists in trying to adapt the POSIX to himself. Therefore all progress depends on the unreasonable man."

― George Bernard Shaw, probably.

by matheusmoreira 20 minutes ago

The new system calls described in the article have an extensible declarative command interface built into them to do things like close or duplicate file descriptors. Not opposed to it but it definitely stood out to me.

by __david__ 1 hour ago

I agree. I think the current way is very nice to use (in C). I think the best way would be to have something similar to vfork() but not bound by POSIX rules. Then make the normal POSIX APIs (close, setuid, etc.) act like the Rust “builder” pattern, possibly giving them a prefix for explicitness. That way the “fill out a giant structure” people could have their wish and the people that just want a faster POSIX experience don’t have to learn an entirely new concept and API surface. It would be future extensible that way, too (just add more prefixed calls to the builder).

by PaulDavisThe1st 21 minutes ago

Whatever elegance fork(2) has (or doesn't have), clone(2) has more.

by fanf2 2 hours ago

Yeah. The right way to eliminate fork() is to make the usual APIs that modify process state take an explicit process handle, so the same APIs can be used to set up an empty process. They can also be composed in other ways, eg for IPC or debugging.

by garaetjjte 1 hour ago

That's mostly papering over the design mistake that most syscalls don't accept a target pid. Otherwise you could just create a suspended process, configure it with syscalls that explicitly take a target pid, and start it.

by uecker 53 minutes ago

Maybe, I am not saying fork() + exec() model couldn't be improved, but most people saying it is "terrible" and it needs to die seem to go on to propose something substantially worse.

by ggm 24 minutes ago

Aesthetically I have no intention of moving beyond. I'm content with my kernel's scheduler and how it maps "heavyweight" processes to cores.

I do use threaded code. It's significantly harder to write and reason about. (45 years in to a CS career, ageing out)

You have to be clever to do better than clever people. Clever people bootstrapped me into fork()/exec() and I know my limits.

by redleader55 14 minutes ago

When cores start needing more than 9 bits to be represented and RAM is in terabytes, many of the old assumptions need to change. Schedulers need to be implemented in userspace, RAM needs to be allocated in GB, not in 4k, io needs to require less round-trips between kernel and user space and NICs need to do a lot more work before the data reaches the CPU.

by skydhash 2 minutes ago

I’m using Emacs and various cli tools and while threads are nice to have, they can easily ramp up the complexity of a program beyond what is necessary. I much prefer the boilerplate of setting up a thread pool and tasks queue, rather than dealing with all the await/async syntactic sugar.

by ajkjk 1 hour ago

Fork always seemed conceptually terrible, even when I first learned about it. If you want to do one thing (start a process) you should not have to use a mysterious incantation that does a different unrelated thing (forks your process) in order to do it.

I am curious about what the best way to handle the example in the article of one process spawning many git subprocesses is. Surely it just doesn't make sense to repeatedly start git from scratch in the course of a long-running parent operation. What's the low cost abstraction for the same result, though?

by wmf 56 minutes ago

libgit2 exists. You could imagine communicating with some gitd over a pipe/socket but I don't know why that would be a good idea. Short of that you have to spawn processes.

by jcalvinowens 1 hour ago

It is a weirdly common misconception that fork() is cheap... it is O(N) in the size of the process, and it always has been.

Yes, it's copy on write... but there is a linear relationship between the size of the process and the number of page table entries required to represent it.

by ComputerGuru 2 hours ago

I'm not surprised Chen's patch was rejected; that's an extremely niche usecase not worth supporting. With my shell developer hat on, I agree with the closing "developers would likely welcome a native implementation that isn't (unlike the current implementation) hiding fork() and exec() under the covers".

by smj-edison 2 hours ago

It sounds like they're interested in the concept though, just not that specific implementation.

by sanderjd 2 hours ago

Yeah this seems like a promising discussion.

by ktpsns 2 hours ago

There is lots of discussion on this old API here on Hacker News, for instance https://news.ycombinator.com/item?id=31739794

by a-dub 17 minutes ago

i thought this was all fixed with special modes of clone that are optimized and don't actually copy anything (ie, it creates a new deficient process that can pretty much only exec)?

by Panzerschrek 1 hour ago

The whole approach of using fork seems unnatural to me. In many cases (even in the majority of them) what's needed is not to inherit the whole structure of the parent process, but to start a given executable. Windows does this better with its CreateProcessW interface.

by debatem1 2 hours ago

There are a lot of slightly different fork-exec-like things in the concept space and it's hard to imagine one approach satisfying them all. IMO it would be interesting to take an approach analogous-ish to sched_ext_ops where you built the rough flow chart of a combined fork-exec, but with hooks built to enable ebpf to change behavior or skip the bits these sophisticated users don't want/need.

by MBCook 1 hour ago

Fork/exec is great if you actually want the traditional copy of your process for some reason.

For launching something totally new, like the example in the article of some tool calling git, I think it does make a ton of sense to make something new.

Especially since I suspect that is by far the more common case. I suspect “I want a clone of me“ is relatively rarely used at this point.

by mike_hock 1 hour ago

The most astonishing part is that this is dated June 5th, 2026.

I.e. a year that starts with 20, not 19.

by JdeBP 34 minutes ago

These discussions were definitely had back in the 20th century too. The spawn model versus the fork+execve model has been an on-going debate since the time of MS/PC/DR-DOS.

by Sophira 2 hours ago

I'm guessing that a big part of the problem with moving away from fork() in general is that each new process needs a copy of the parent process' environment anyway, right?

reply

upvote

by zerobees2 hours ago|

[-]

The LWN article is incorrect in saying that it "must copy the entire process state (including memory) for the child process". There are some kernel structures and page tables that need to be initialized, plus you need a new stack, but it's not nearly as dramatic as implied. Most of the parent's memory is "incorporated by reference", so to speak.

In fact, if you profile it, in the fork() + execve() model, execve() is far more expensive, because not only does it replace the old process with a new one, but it also involves running the dynamic linker, which opens, parses, and mmaps library files.

It still makes sense to get rid of the fork() overhead if you're going to throw away the cloned process state soon thereafter, but if you wanted to make process execution radically faster, rethinking the exec architecture would probably offer more significant gains.

reply

upvote

by corbet1 hours ago|

[-]

The kernel does not copy every page, but it does have to copy all of the VMAs. Setting memory to COW (which can involve changing a lot of page-table-entries) is not free either. I guess I could have mentioned copy-on-write explicitly, but I do not believe that what I wrote was incorrect.

reply

upvote

by nasretdinov1 hours ago|

[-]

Fork becomes more and more expensive the higher the RSS of the process: roughly 1 ms per 1 GB of process size with 4 KB pages. Given that modern servers can easily support 1-2 TB of RAM, the fork() part can easily take several hundred milliseconds, blocking everything in the meantime. So for larger programs you kinda have to have a "fork helper" process if you need to execute external programs for some reason.
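
To put numbers on that, here is a back-of-the-envelope sketch. The 1 ms per GiB rate is just the rough figure quoted above, not a measurement, and the function names are made up for illustration.

```python
PAGE_SIZE = 4 * 1024   # 4 KiB pages
GIB = 1024 ** 3
MS_PER_GIB = 1.0       # rough figure from the comment above; an assumption, not measured


def pages_for(rss_bytes, page_size=PAGE_SIZE):
    """Number of pages (and so page-table entries) needed to cover rss_bytes."""
    return -(-rss_bytes // page_size)  # ceiling division


def estimated_fork_ms(rss_bytes, ms_per_gib=MS_PER_GIB):
    """Estimated fork() time if the cost scales linearly with resident size."""
    return rss_bytes / GIB * ms_per_gib


for gib in (1, 64, 512):
    rss = gib * GIB
    print(f"{gib:>4} GiB RSS: {pages_for(rss):>12,} pages, ~{estimated_fork_ms(rss):.0f} ms")
```

At 4 KiB per page, every GiB of resident memory is 262,144 page-table entries that fork() has to walk, which is where the linear cost comes from.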

reply

upvote

by dijit2 hours ago|

[-]

I'm a bit naive, but I don't think that's necessarily a requirement.

It might be a commonly held convention, and thus an assumption, in Linux (and, broadly, UNIX), but I don't think it's true inside VAX or even Windows, so I don't think it's a requirement.

Unless I've missed something (which is totally possible; this is not an area of OS design I've spent much time on).

reply

upvote

by lanstin2 hours ago|

[-]

But also UID, groups, controlling TTY, process group, capabilities, pipes, shared memory, etc. And the file descriptors, while maybe not inherently needed, are how a lot of Unix plumbing works.

reply

upvote

by sjmulder2 hours ago|

[-]

Even DOS has environment inheritance!

reply

upvote

by sanderjd2 hours ago|

[-]

A lot of times you actively don't want the parent environment or any of the memory or file descriptors. And then you have to actively do work to fix all that stuff up after the fork.
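
A rough sketch of what "not inheriting" looks like when the launch API lets you say so up front, here with Python's subprocess module. The allowlist, the run_clean helper, and the PARENT_SECRET variable are all made up for illustration.

```python
import os
import subprocess
import sys

# Hypothetical allowlist: the only variables the child is allowed to see.
ALLOWED = ("PATH", "HOME", "LANG", "LD_LIBRARY_PATH")


def run_clean(argv, allowed=ALLOWED):
    """Run argv with an allowlisted environment and no extra inherited descriptors."""
    clean_env = {k: os.environ[k] for k in allowed if k in os.environ}
    return subprocess.run(
        argv,
        env=clean_env,             # replace the environment instead of inheriting it
        close_fds=True,            # already the default on POSIX; spelled out here
        stdin=subprocess.DEVNULL,  # don't hand over the parent's stdin either
        capture_output=True,
        text=True,
        check=False,
    )


os.environ["PARENT_SECRET"] = "do-not-leak"
result = run_clean([sys.executable, "-c", "import os; print('PARENT_SECRET' in os.environ)"])
print(result.stdout.strip())
```

With a bare fork() the child starts with all of this and has to undo it by hand before calling exec.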

reply

upvote

by lokar2 hours ago|

[-]

the environment is not that big

reply

upvote

by lokar2 hours ago|

[-]

This seems unnecessary to me. In the example, the core of git should be a library you can link so you don't need to run the binary. That would be better in every way.

reply

upvote

by omoikane1 hours ago|

[-]

Launching git repeatedly was probably not the best example. But it's hard to think of good examples where launching processes repeatedly is the most performant thing to do, probably because launching processes has been expensive and everyone has learned to do something else (libraries, zygotes, etc). Maybe a different question is: if launching processes were cheap, is there something we would implement as processes instead of libraries?

I can recall just one program that's intentionally not implemented as a library, but I think people have since built a library on top of it:

https://dechifro.org/dcraw/#:~:text=Why%20don%27t%20you%20im...

reply

upvote

by 17186274402 hours ago|

[-]

But when you use a process, you get tons of things for free, the subtask is invoked in parallel, you get isolation and you can control execution for free. Unless you are already writing a multithreaded program or already accept passing objects in memory, using a process is actually easier to write than using a library.

If I use a library, I also need to start using threads and need to invent some core synchronization mechanism. I am essentially reinventing a small scheduler, when I already get this from the OS for free. Also, now any crash in the third-party code will crash the whole program, and the third-party code has access to the whole address space. With invoking a process you also have a standardized API implemented by the OS.
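
A small sketch of the isolation point, assuming a POSIX system. The run_isolated helper is made up for illustration, and os.abort() stands in for a crash inside third-party code.

```python
import signal
import subprocess
import sys


def run_isolated(code):
    """Run a Python snippet in a child process and return the child's return code."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return proc.returncode


# The first child kills itself with SIGABRT; the parent only sees a status code.
crashed = run_isolated("import os; os.abort()")
healthy = run_isolated("print('still fine')")

print("crashing child killed by SIGABRT:", crashed == -signal.SIGABRT)
print("healthy child return code:", healthy)
```

The same crash inside a linked library would take the calling process down with it.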

reply

upvote

by sanderjd2 hours ago|

[-]

There are lots of reasons to want to spawn fresh processes, which aren't solved by linking a library.

reply

upvote

by lokar2 hours ago|

[-]

Sure, but not many times a second

reply

upvote

by kllrnohj1 hours ago|

[-]

Every build system ever says hello.

reply

upvote

by aerzen2 hours ago|

[-]

Spawning processes should not be on the hot path of any program.

reply

upvote

by 17186274402 hours ago|

[-]

Why? That's a very useful processing primitive.

reply

upvote

by lokar2 hours ago|

[-]

It’s a hack with many disadvantages. Sometimes a hack is the right answer, but the kernel shouldn't add a primitive for it.

reply

upvote

by MBCook1 hours ago|

[-]

Should bash link in every program the user might want? Load them up as dynamic libraries?

reply

upvote

by pizlonator1 hours ago|

[-]

It ends up on the hot path of programs that use process isolation aggressively

reply

upvote

by hparadiz2 hours ago|

[-]

Maybe tangentially related, but I always think it's silly that every Linux process has the same libgcc_s.so.1 loaded into memory, even though the raw binary for the library is exactly the same, so you end up with like 800 copies of libgcc_s.so.1 in memory.

I mean maybe this has been optimized for already and I don't know what I'm talking about but maybe someone with more knowledge about the kernel knows? Is this something we simply can't optimize for because of security implications?

reply

upvote

by 2019842 hours ago|

[-]

Shared libraries (and mmapped files in general) are deduplicated; it's nowhere near as bad as you think. The kernel loads a .so into memory once and then maps that memory into every process that mmaps it.

Editing to add: this deduplication is one of the greatest upsides to dynamic linking. Common libs like libgcc and libc only have to exist in memory once and can stay in CPU caches, whereas if they were statically linked into every binary, each binary would have a copy of that library that wouldn't be shared with anything else and you'd waste a lot of memory.

reply

upvote

by sjmulder2 hours ago|

[-]

Doesn't the loaded code have to be patched for relocations?

reply

upvote

by ptspts2 hours ago|

[-]

It does, so not 100% is reused. The patched parts are in different sections though, so the entire .text (code) section ends up being reused.

reply

upvote

by monocasa2 hours ago|

[-]

Not on modern archs that provide decent support for PIE (position independent executables).

reply

upvote

by 2019841 hours ago|

[-]

How do you think position independent code can call functions from other .so's without being patched with their addresses?

They can't, so even PIC code still has to have a relocation table that gets patched. It's in a different page than the code though, so code does still get reused.

reply

upvote

by monocasa54 minutes ago|

[-]

That's not really patching though, any more than any use of function pointers is patching.

reply

upvote

by t-32 hours ago|

[-]

Not if it's position-independent.

reply

upvote

by saidinesh52 hours ago|

[-]

Typically libgcc_s.so is loaded by the dynamic linker, which uses an mmap call to map the binary into the address space.

> The kernel keeps track of which file is mapped where, and can detect when a request is made to map an already mapped file again, avoiding physical memory allocation if possible.

Relevant stack overflow answer: https://stackoverflow.com/questions/61950951/linux-shared-li...

reply

upvote

by mlaretallack2 hours ago|

[-]

In Linux, when a shared lib is loaded by multiple processes, it's loaded once and not duplicated in RAM. Only if a memory page is modified by the process will the memory be duplicated. (Hope I have explained that correctly)
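
The copy-on-write part can be seen from user space with a private file mapping. This is only a single-process analogy using Python's mmap module (ACCESS_COPY requests a private, copy-on-write mapping); the file contents and the function name are made up, and it cannot show that the physical pages are shared, only that a write stays local to one mapping.

```python
import mmap
import os
import tempfile


def private_mapping_demo():
    """Map one file twice privately, write through one mapping, compare all views."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"shared library bytes")
        path = f.name
    try:
        with open(path, "r+b") as fa, open(path, "r+b") as fb:
            a = mmap.mmap(fa.fileno(), 0, access=mmap.ACCESS_COPY)
            b = mmap.mmap(fb.fileno(), 0, access=mmap.ACCESS_COPY)
            a[0:6] = b"SHARED"  # this write lands in a private copy of the page
            seen = (bytes(a[:6]), bytes(b[:6]))
            a.close()
            b.close()
        with open(path, "rb") as f:
            on_disk = f.read(6)  # the file itself is never modified
        return seen + (on_disk,)
    finally:
        os.unlink(path)


print(private_mapping_demo())
```

Only the mapping that was written to sees the change; the second mapping and the file on disk keep the original bytes.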

reply

upvote

by monocasa2 hours ago|

[-]

Those mappings by default all go to the same shared memory.

Unices have been sharing executable memory between processes longer than there's been mmap for user space to do the same thing themselves. I remember seeing it in the 2BSD kernel for instance.

reply

upvote

by BoingBoomTschak2 hours ago|

[-]

Eh? Aren't shared libraries actually shared in memory?

reply

upvote

by 17186274402 hours ago|

[-]

Yeah, that's kind of the point.

reply

upvote

by sirsinsalot2 hours ago|

[-]

I have a rule for myself. If I think something is silly or stupid, I assume I don't understand it. I usually find I do not understand it, and it no longer seems silly when I do understand it.

In this case too, you think it is silly because you don't understand it. Your assumptions are wrong, making it seem silly.

reply

upvote

by burnt-resistor2 hours ago|

[-]

> "If you are repeatedly creating large processes, you are already doing it wrong. The fix is in user space, not the kernel."

Every couple of years, someone claims they have "the solution" implying everyone else who came before them didn't know what they were doing.

reply

upvote

by yxhuvud34 minutes ago|

[-]

It can also mean that neither the hardware side nor the software side is static; both change over time. That means that their demands and what they allow also change over time. This leads to the insight that what was perhaps a good idea on 70s hardware/software is not necessarily a good, or even OK, idea 50 years later on modern hardware executing OSes and programs that have been kept up to date.

reply