A stackful coroutine implementation has to save exactly the same registers as a stackless one: those that are live at the suspension point.
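
To make "live at the suspension point" concrete, here is a stackless generator lowered by hand in C, roughly the kind of state machine a compiler produces for a stackless coroutine. The names (`squares_frame`, `squares_init`, `squares_next`) are hypothetical. The frame holds only the resume point, the loop counter and the argument, because those are the only values live across the yield; the squared value is a temporary that is handed out and never saved.

```c
#include <assert.h>
#include <stdbool.h>

/* Frame of a hand-lowered stackless generator yielding 0, 1, 4, 9, ...
 * It holds only what is live across the suspension point. */
typedef struct {
    int resume_point; /* 0 = start, 1 = after the yield, 2 = finished */
    int i;            /* loop counter: live across the yield */
    int limit;        /* argument: live across the yield */
} squares_frame;

static void squares_init(squares_frame *f, int limit) {
    assert(limit >= 0);
    f->resume_point = 0;
    f->i = 0;
    f->limit = limit;
}

/* Stores the next square in *out; returns false once the loop is done. */
static bool squares_next(squares_frame *f, int *out) {
    switch (f->resume_point) {
    case 0:
        for (f->i = 0; f->i < f->limit; f->i++) {
            *out = f->i * f->i;  /* temporary: handed out, not kept in the frame */
            f->resume_point = 1;
            return true;         /* suspend */
    case 1:;                     /* resume here and continue the loop */
        }
    }
    f->resume_point = 2;
    return false;
}
```

Calling `squares_next` repeatedly on a frame initialised with a limit of 4 produces 0, 1, 4, 9 and then returns false.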

A pure library implementation that relies on normal function-call semantics obviously has to conservatively save at least all callee-saved registers, since it cannot know which of them are live, but that's not the only possible implementation. An implementation with compiler help should be able to do significantly better.
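
For comparison, a minimal sketch of the conservative library approach, assuming x86-64 System V on Linux (ELF), GCC, and no CET shadow stack. `lib_ctx_switch`, `lib_coro`, `lib_create`, `lib_resume` and `lib_yield` are hypothetical names. Because the switch is an ordinary function call, the caller already treats the caller-saved registers as clobbered, so the switch pushes all six callee-saved registers (rbx, rbp, r12 to r15) every time, whether or not any of them holds a live value. A production switch would usually also preserve the MXCSR and x87 control words, which are callee-saved in the System V ABI; the sketch omits them.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical switch: push the callee-saved registers on the current stack,
 * store rsp in *save_sp, load new_sp, pop the registers saved there, ret. */
void lib_ctx_switch(void **save_sp, void *new_sp);

__asm__(
    ".pushsection .text\n"
    ".globl lib_ctx_switch\n"
    ".type lib_ctx_switch, @function\n"
    "lib_ctx_switch:\n"
    "    pushq %rbp\n"      /* all six callee-saved registers, live or not */
    "    pushq %rbx\n"
    "    pushq %r12\n"
    "    pushq %r13\n"
    "    pushq %r14\n"
    "    pushq %r15\n"
    "    movq %rsp, (%rdi)\n"
    "    movq %rsi, %rsp\n"
    "    popq %r15\n"
    "    popq %r14\n"
    "    popq %r13\n"
    "    popq %r12\n"
    "    popq %rbx\n"
    "    popq %rbp\n"
    "    ret\n"
    ".size lib_ctx_switch, .-lib_ctx_switch\n"
    ".popsection\n");

typedef struct {
    void *caller_sp, *coro_sp, *stack;
    int value, done;
} lib_coro;

static lib_coro *lib_current; /* hypothetical: the coroutine being run */

static void lib_yield(int v) {
    lib_current->value = v;
    lib_ctx_switch(&lib_current->coro_sp, lib_current->caller_sp);
}

static void lib_entry(void) {
    for (int i = 0; i < 3; i++)
        lib_yield(i);
    lib_current->done = 1;
    lib_ctx_switch(&lib_current->coro_sp, lib_current->caller_sp);
    abort(); /* a finished coroutine is never resumed */
}

static lib_coro *lib_create(size_t size) {
    lib_coro *c = calloc(1, sizeof *c);
    if (!c || !(c->stack = malloc(size))) abort();
    uint64_t *sp = (uint64_t *)(((uintptr_t)c->stack + size) & ~(uintptr_t)15);
    *--sp = 0;                              /* fake return address for lib_entry */
    *--sp = (uint64_t)(uintptr_t)lib_entry; /* popped by the first ret */
    for (int k = 0; k < 6; k++)
        *--sp = 0;                          /* initial rbp, rbx, r12-r15 */
    c->coro_sp = sp;
    return c;
}

static int lib_resume(lib_coro *c) {
    assert(!c->done);
    lib_current = c;
    lib_ctx_switch(&c->caller_sp, c->coro_sp);
    return !c->done;
}
```

Each `lib_resume` call runs the coroutine up to its next `lib_yield`; with the loop above, the yielded values are 0, 1 and 2 before `done` is set.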

Ideally the compiler would provide a built-in, but even an implementation using GCC inline asm with a proper clobber list can do significantly better: the compiler then preserves only the values that are actually live across the switch.
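
A minimal sketch of the inline-asm variant, under the same assumptions (x86-64 System V, GCC, Linux, no CET shadow stack); `asm_ctx_switch`, `asm_coro` and the other helpers are hypothetical. The asm statement stores only the stack pointer, the frame pointer and a resume address; every other general-purpose register and xmm0 to xmm15 are listed as clobbers, so GCC itself decides what to preserve at each call site. rbp is saved by hand because GCC refuses it as a clobber when it is the frame pointer. GCC's documentation does not sanction jumping from one asm statement into another, so treat this as an illustration of the idea rather than production code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical saved context; the asm below relies on offsets 0, 8, 16. */
typedef struct { void *sp, *bp, *ip; } asm_ctx;

/* Save rsp, rbp and a resume address into *from, then continue at *to.
 * All other GPRs and xmm0-xmm15 are declared clobbered, so GCC keeps only
 * live values (in rbp or on the stack). Assumes no AVX-512 (xmm16-31) and
 * leaves MXCSR/x87 state alone. */
static inline __attribute__((always_inline))
void asm_ctx_switch(asm_ctx *from, asm_ctx *to) {
    __asm__ volatile(
        "leaq 1f(%%rip), %%rax\n\t"
        "movq %%rax, 16(%0)\n\t"  /* resume address = label 1 */
        "movq %%rsp, 0(%0)\n\t"
        "movq %%rbp, 8(%0)\n\t"
        "movq 0(%1), %%rsp\n\t"
        "movq 8(%1), %%rbp\n\t"
        "jmp *16(%1)\n\t"
        "1:\n\t"
        : "+D"(from), "+S"(to)    /* rdi/rsi hold the resumer's values afterwards */
        :
        : "rax", "rbx", "rcx", "rdx", "r8", "r9", "r10", "r11",
          "r12", "r13", "r14", "r15",
          "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
          "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14",
          "xmm15", "memory", "cc");
}

typedef struct {
    asm_ctx caller, self;
    void *stack;
    int value, done;
} asm_coro;

static asm_coro *asm_current; /* hypothetical: the coroutine being run */

static void asm_yield(int v) {
    asm_current->value = v;
    asm_ctx_switch(&asm_current->self, &asm_current->caller);
}

static void asm_entry(void) {
    for (int i = 0; i < 3; i++) /* i is live across each switch */
        asm_yield(i * 10);
    asm_current->done = 1;
    asm_ctx_switch(&asm_current->self, &asm_current->caller);
    abort(); /* a finished coroutine is never resumed */
}

static asm_coro *asm_create(size_t size) {
    asm_coro *c = calloc(1, sizeof *c);
    if (!c || !(c->stack = malloc(size))) abort();
    uintptr_t top = ((uintptr_t)c->stack + size) & ~(uintptr_t)15;
    top -= 8;
    *(uintptr_t *)top = 0;    /* fake return address; asm_entry never returns */
    c->self.sp = (void *)top; /* rsp % 16 == 8, as right after a call */
    c->self.bp = NULL;
    c->self.ip = (void *)(uintptr_t)asm_entry;
    return c;
}

static int asm_resume(asm_coro *c) {
    assert(!c->done);
    asm_current = c;
    asm_ctx_switch(&c->caller, &c->self);
    return !c->done;
}
```

Because `asm_ctx_switch` is inlined, a value that is live across a switch, such as `i` in `asm_entry`, is either kept in rbp or spilled to the stack, and registers holding dead values are not saved at all. The enclosing functions still save the callee-saved registers they use once in their prologues, as any function would.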
