At least on systems with virtual addressing. If you want to go into physical addressing, then yes, maybe it's a problem. But Linux will never touch anything with physical addressing, so I don't see what people are complaining about.
It may not be slow, but in the common case where fork is almost immediately followed by exec in the process where fork returns zero, fork increases those refcounts and exec almost immediately decreases them again (and typically does unnecessary checks of whether the refcounts became zero). A combined fork/exec syscall can avoid that work.
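To make the two shapes concrete, here is a rough sketch in Python (3.8+, on a POSIX system) that runs the same child once through a separate fork and exec, and once through os.posix_spawn as a stand-in for a combined call. The helper names run_with_fork_exec and run_with_spawn are made up for illustration, and whether posix_spawn actually skips the refcount work depends on the libc and kernel underneath, so treat it as an illustration of the interface, not a benchmark.

```python
import os
import sys


def run_with_fork_exec(argv):
    """Run argv using separate fork and exec calls; return the exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: fork returned zero, so replace this process image right away.
        try:
            os.execv(argv[0], argv)
        finally:
            # Only reached if exec failed; never fall back into the parent's code.
            os._exit(127)
    # Parent: wait for the child and decode its exit code.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


def run_with_spawn(argv):
    """Run argv using one combined spawn call; return the exit code."""
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    child = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(run_with_fork_exec(child), run_with_spawn(child))
```

Both helpers give the caller the same result; the difference is only in how much of the parent's state gets duplicated and then thrown away on the way there.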
On the other hand, a sufficiently powerful combined fork/exec call has to take a lot of parameters and check each of them (whether to inherit open pipes and open files, what working directory to set, etc.), and that slows it down.
That can be avoided by having multiple variants of combined fork/exec calls, but you would need lots of them to cover all combinations of flags.
I expect either approach to be faster than having fork, then exec as separate calls, especially when the process calling fork has many resources allocated.
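For what it's worth, POSIX's posix_spawn is one existing answer to the "lots of parameters" problem: instead of many flags or many call variants, the caller passes a list of file actions that are applied in the child. A minimal sketch using Python's os.posix_spawn (3.8+), where spawn_capture_stdout is a hypothetical helper name and the pipe handling is just one way to set it up:

```python
import os
import sys


def spawn_capture_stdout(argv):
    """Spawn argv with its stdout redirected into a pipe; return the bytes written."""
    read_fd, write_fd = os.pipe()
    # Each tuple is one "parameter" the combined call has to process in the child.
    file_actions = [
        (os.POSIX_SPAWN_DUP2, write_fd, 1),   # child's stdout becomes the pipe
        (os.POSIX_SPAWN_CLOSE, read_fd),      # child does not need the read end
        (os.POSIX_SPAWN_CLOSE, write_fd),     # original write end is now redundant
    ]
    pid = os.posix_spawn(argv[0], argv, os.environ, file_actions=file_actions)
    # Parent closes its copy of the write end so read() sees EOF when the child exits.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        output = reader.read()
    os.waitpid(pid, 0)
    return output


if __name__ == "__main__":
    print(spawn_capture_stdout([sys.executable, "-c", "print('hi')"]))
```

This says nothing about speed by itself; it only shows that the per-child options can be passed as data to one call rather than encoded in many call variants.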
Did someone suggest that it was?
Only being half facetious here. Maybe you or someone else really has a better take.
(Windows's fork is called ZwCreateProcess)
I don’t know how they implemented it, though. Under the hood, it could do the equivalent of a fork/exec pair.