undefined

upvote

points

by nananana910 hours ago |

upvote

by pjc5010 hours ago|

[-]

> You can roll stackful coroutines in C++ (or C) with 50-ish lines of Assembly

I'm not normally keen to "well actually" people with the C standard, but .. if you're writing in assembly, you're not writing in C. And the obvious consequence is that it stops being portable. Minicoro only supports three architectures. Granted, those are the three most popular ones, but other architectures exist.

(just double checked and it doesn't do Windows/ARM, for example. Not that I'm expecting Microsoft to ship full conformance for C++23 any time soon, but they have at least some of it)

reply

upvote

by giancarlostoro5 hours ago|

[-]

> Not that I'm expecting Microsoft to ship full conformance for C++23 any time soon,

They are actively working on it for their VS2026 C++ compiler. I think since 2017 or so they've kept up with C++ standards reasonably? I'm not a heavy C++ guy, so maybe I'm wrong, but my understanding is they match the standards.

reply

upvote

by audidude5 hours ago|

[-]

> I'm not normally keen to "well actually" people with the C standard, but .. if you're writing in assembly, you're not writing in C.

These days on Linux/BSD/Solaris/macOS you can use makecontext()/swapcontext() from ucontext.h and it will turn out roughly the same performance on important architectures as what everyone used to do with custom assembly. And you already have fiber functions as part of the Windows API to trampoline.

I had to support a number of architectures in libdex for Debian. This is GNOME code of course, which isn't everyone's cup of C. (It also supports BSDs/Linux/macOS/Solaris/Windows).

* https://packages.debian.org/sid/libdex-1-1

* https://gitlab.gnome.org/GNOME/libdex

reply

upvote

by gpderetta3 hours ago|

[-]

Unfortunately swap context requires saving and restoring the signal mask, which, at least on Linux, requires a syscall so it is going to be at least a hundred times slower than an hand rolled implementation.

Also, although not likely to be removed anytime soon from existing systems, POSIX has declared the context API obsolescent a while ago (it might actually no longer be part of the standard).

reply

upvote

by cyberax27 minutes ago|

[-]

Signal mask? What century are we in?

It can be safely ignored for the vast majority of apps. If you're using multithreading (quite likely if you're doing coroutines), then signals are not a good fit anyway.

reply

upvote

by gpderetta7 minutes ago|

[-]

Aside from the fact that the signal mask is still relevant in 2026 and even for multithreaded programs, that doesn't have anything to do with the fact that POSIX requires swapcontext to preserve it.

reply

upvote

by manwe1509 hours ago|

[-]

Boost has stackful coroutines. They also used to be in posix (makecontext).

reply

upvote

by blacklion6 hours ago|

[-]

There is no "Linux/ARM[64]". But there are "Raspberry Pi" and "RISC-V". I don't know such OSes, to be honest :-)

This support table is complete mess. And saying "most platforms are supported" is too optimistic or even cocky.

reply

upvote

by ndiddy5 hours ago|

[-]

Looking at the repo, it falls back to Windows fibers on Windows/ARM. If you'd like a coroutine with more backends, I'm a fan of libco: https://github.com/higan-emu/libco/ which has assembly backends for x86, amd64, ppc, ppc-64, arm, and arm64 (and falls back to setjmp on POSIX platforms and fibers on Windows). Obviously the real solution would be for the C or C++ committees to add stackful coroutines to the standard, but unless that happens I would rather give up support for hppa or alpha or 8-bit AVR or whatever than not be able to use stackful corountines.

reply

upvote

by gpderetta3 hours ago|

[-]

A proposal to add stackfull coroutines has been around forever and gets updated at every single mailing. Unfortunately the authors don't really have backing from any major company.

reply

upvote

by fluoridation8 hours ago|

[-]

I think what they meant is that that what it takes to add coroutines support to a C/++ program. Adding it to, say, Java or C# is much more involved.

reply

upvote

by Joker_vD10 hours ago|

[-]

Hmm. I'm fairly certain that most of that assembly code for saving/restoring registers can be replaced with setjmp/longjmp, and only control transfer itself would require actual assembly. But maybe not.

That's the problem with register machines, I guess. Interestingly enough, BCPL, its main implementation being a p-code interpreter of sorts, has pretty trivially supported coroutines in its "standard" library since the late seventies — as you say, all you need to save is the current stack pointer and the code pointer.

reply

upvote

by lelanthran10 hours ago|

[-]

> Hmm. I'm fairly certain that most of that assembly code for saving/restoring registers can be replaced with setjmp/longjmp, and only control transfer itself would require actual assembly.

Actually you don't even need setjmp/longjmp. I've used a library (embedded environment) called protothreads (plain C) that abused the preprocessor to implement stackful coroutines.

(Defined a macro that used the __LINE__ macro coupled with another macro that used a switch statement to ensure that calling the function again made it resume from where the last YIELD macro was encountered)

reply

upvote

by Cloudef9 hours ago|

[-]

Wouldnt that be stackless (shared stack)

reply

upvote

by lelanthran8 hours ago|

[-]

Correct; stackless. I misspoke.

reply

upvote

by zabzonk9 hours ago|

[-]

You can do a lot of horrible things with setjmp and friends. I actually implemented some exception throw/catch macros using them (which did work) for a compiler that didn't support real C++ exceptions. Thank god we never used them in production code.

This would be about 32 years ago - I don't like thinking about that ...

reply

upvote

by gpderetta3 hours ago|

[-]

GCC still uses sj/lj by default on some targets to implement exceptions.

reply

upvote

by gpderetta10 hours ago|

[-]

setjmp + longjump + sigaltstack is indeed the old trick.

reply

upvote

by Sharlin8 hours ago|

[-]

C++ destructors and exception safety will likely wreak havoc with any "simple" assembly/longjmp-based solution, unless severely constraining what types you can use within the coroutines.

reply

upvote

by fluoridation8 hours ago|

[-]

Not really. I've done it years ago. The one restriction for code inside the coroutine is that it mustn't catch (...). You solve destruction by distinguishing whether a couroutine is paused in the middle of execution or if it finished running. When the coroutine is about to be destructed you run it one last time and throw a special exception, triggering destruction of all RAII objects, which you catch at the coroutine entry point.

Passing uncaught exceptions from the coroutine up to the caller is also pretty easy, because it's all synchronous. You just need to wrap it so it can safely travel across the gap. You can restrict the exception types however you want. I chose to support only subclasses of std::exception and handle anything else as an unknown exception.

reply

upvote

by pjc508 hours ago|

[-]

> Passing uncaught exceptions from the coroutine up to the caller is also pretty easy, because it's all synchronous. You just need to wrap it so it can safely travel across the gap

This is also how dotnet handles it, and you can choose whether to rethrow at the caller site, inspect the exception manually, or run a continuation on exception.

reply

upvote

by gpderetta3 hours ago|

[-]

> mustn't catch (...)

You could use the same trick used by glibc to implement unstoppable exceptions for POSIX cancellation: the exception rethrows itself from its destructor.

reply

upvote

by Sharlin8 hours ago|

[-]

Thanks, that's interesting.

reply

upvote

by TuxSH4 hours ago|

[-]

> every async "function call" heap allocates.

> require the STL

That it has to heap-allocate if non-inlined is a misconception. This is only the default behavior.

One can define:

void *operator new(size_t sz, Foo &foo)

in the coro's promise type, and this:

- removes the implicitly-defined operator new

- forces the coro's signature to be CoroType f(Foo &foo), and forwards arguments to the "operator new" one defined

Therefore, it's pretty trivial to support coroutines even when heap cannot be used, especially in the non-recursive case.

Yes, green threads ("stackful coroutines") are more straightforward to use, however:

- they can't be arbitrarily destroyed when suspended (this would require stack unwinding support and/or active support from the green thread runtime)

- they are very ABI dependent. Among the "few registers" one has to save FPU registers. Which, in the case of older Arm architectures, and codegen options similar to -mgeneral-regs-only (for code that runs "below" userspace). Said FPU registers also take a lot of space in the stack frame, too

Really, stackless coros are just FSM generators (which is obvious if one looks at disasm)

reply

upvote

by gpderetta3 hours ago|

[-]

A stackful coroutine implementation has to save exactly the same registers that a stackless one has to: the live ones at the suspension point.

A pure library implementation that uses on normal function call semantics obviously needs to conservatively save at least all callee-save registers, but that's not the only possible implementation. An implementation with compiler help should be able to do significantly better.

Ideally the compiler would provide a built-in, but even, for example, an implementation using GCC inline ASM with proper clobbers can do significantly better.

reply

upvote

by socalgal24 hours ago|

[-]

As an x-gamedev, suspect/resume/stackful coroutines made them too heavy to have several thousand of them running during a game loop for our game. At the time we used GameMonkey Script: https://github.com/publicrepo/gmscript

That was over 20 years ago. No idea what the current hotness is.

reply

upvote

by fluoridation1 hours ago|

[-]

Several thousand? What were you using them for? Coroutines' main utility is that they let you write complex code that pauses and still looks sensible, so for games, you'd typically put stuff like the behavior of an NPC in a coroutine. If you have thousands of things to put each in its own coroutine, they must have been really, really simple stuff. At that point, the cost of context switching can become significant.

reply

upvote

by MisterTea6 hours ago|

[-]

A much nicer code base to study is: https://swtch.com/libtask/

The stack save/restore happens in: https://swtch.com/libtask/asm.S

reply