I'm not normally keen to "well actually" people with the C standard, but .. if you're writing in assembly, you're not writing in C. And the obvious consequence is that it stops being portable. Minicoro only supports three architectures. Granted, those are the three most popular ones, but other architectures exist.
(just double checked and it doesn't do Windows/ARM, for example. Not that I'm expecting Microsoft to ship full conformance for C++23 any time soon, but they have at least some of it)
They are actively working on it for their VS2026 C++ compiler. I think since 2017 or so they've kept up with C++ standards reasonably? I'm not a heavy C++ guy, so maybe I'm wrong, but my understanding is they match the standards.
These days on Linux/BSD/Solaris/macOS you can use makecontext()/swapcontext() from ucontext.h and it will turn out roughly the same performance on important architectures as what everyone used to do with custom assembly. And you already have fiber functions as part of the Windows API to trampoline.
I had to support a number of architectures in libdex for Debian. This is GNOME code of course, which isn't everyone's cup of C. (It also supports BSDs/Linux/macOS/Solaris/Windows).
Also, although not likely to be removed anytime soon from existing systems, POSIX has declared the context API obsolescent a while ago (it might actually no longer be part of the standard).
It can be safely ignored for the vast majority of apps. If you're using multithreading (quite likely if you're doing coroutines), then signals are not a good fit anyway.
This support table is complete mess. And saying "most platforms are supported" is too optimistic or even cocky.
That's the problem with register machines, I guess. Interestingly enough, BCPL, its main implementation being a p-code interpreter of sorts, has pretty trivially supported coroutines in its "standard" library since the late seventies — as you say, all you need to save is the current stack pointer and the code pointer.
Actually you don't even need setjmp/longjmp. I've used a library (embedded environment) called protothreads (plain C) that abused the preprocessor to implement stackful coroutines.
(Defined a macro that used the __LINE__ macro coupled with another macro that used a switch statement to ensure that calling the function again made it resume from where the last YIELD macro was encountered)
This would be about 32 years ago - I don't like thinking about that ...
Passing uncaught exceptions from the coroutine up to the caller is also pretty easy, because it's all synchronous. You just need to wrap it so it can safely travel across the gap. You can restrict the exception types however you want. I chose to support only subclasses of std::exception and handle anything else as an unknown exception.
This is also how dotnet handles it, and you can choose whether to rethrow at the caller site, inspect the exception manually, or run a continuation on exception.
You could use the same trick used by glibc to implement unstoppable exceptions for POSIX cancellation: the exception rethrows itself from its destructor.
> require the STL
That it has to heap-allocate if non-inlined is a misconception. This is only the default behavior.
One can define:
void *operator new(size_t sz, Foo &foo)
in the coro's promise type, and this:
- removes the implicitly-defined operator new
- forces the coro's signature to be CoroType f(Foo &foo), and forwards arguments to the "operator new" one defined
Therefore, it's pretty trivial to support coroutines even when heap cannot be used, especially in the non-recursive case.
Yes, green threads ("stackful coroutines") are more straightforward to use, however:
- they can't be arbitrarily destroyed when suspended (this would require stack unwinding support and/or active support from the green thread runtime)
- they are very ABI dependent. Among the "few registers" one has to save FPU registers. Which, in the case of older Arm architectures, and codegen options similar to -mgeneral-regs-only (for code that runs "below" userspace). Said FPU registers also take a lot of space in the stack frame, too
Really, stackless coros are just FSM generators (which is obvious if one looks at disasm)
A pure library implementation that uses on normal function call semantics obviously needs to conservatively save at least all callee-save registers, but that's not the only possible implementation. An implementation with compiler help should be able to do significantly better.
Ideally the compiler would provide a built-in, but even, for example, an implementation using GCC inline ASM with proper clobbers can do significantly better.
That was over 20 years ago. No idea what the current hotness is.
The stack save/restore happens in: https://swtch.com/libtask/asm.S