[0] https://web.archive.org/web/20260105235513/https://www.chiar...
Most people would prefer opinionated libraries that allow them to not think about the design tradeoffs. The core implementation is targeted at efficient creation of opinionated abstractions rather than providing one. This is the right choice. Every opinionated abstraction is going to be poor for some applications.
Also, as noted in that Simon Tatham article, Python makes choices at the language level that you have to fuss over yourself in C++. Given how different Trio is from asyncio (the async library in Python's standard library), it seems to me that making some of those basic choices wasn't actually that restrictive, so I'd guess that a lot of C++'s async complexity isn't that necessary for the problem.
[1] https://vorpus.org/blog/notes-on-structured-concurrency-or-g...
Thanks for the list.
1. C++20 coros are stackless, in the general case every async "function call" heap allocates.
2. If you do your own stackful coroutines, every function can suspend/resume, you don't have to deal with colored functions.
3. (opinion) C++20 coros are very tasteless and "C++-design-committee pilled". They're very hard to understand and implement, they require the STL, they're very heavy in debug builds, and you'll end up in template hell to do something as simple as Promise.all
I'm not normally keen to "well actually" people with the C standard, but .. if you're writing in assembly, you're not writing in C. And the obvious consequence is that it stops being portable. Minicoro only supports three architectures. Granted, those are the three most popular ones, but other architectures exist.
(just double checked and it doesn't do Windows/ARM, for example. Not that I'm expecting Microsoft to ship full conformance for C++23 any time soon, but they have at least some of it)
These days on Linux/BSD/Solaris/macOS you can use makecontext()/swapcontext() from ucontext.h and it will turn out roughly the same performance on important architectures as what everyone used to do with custom assembly. And you already have fiber functions as part of the Windows API to trampoline.
I had to support a number of architectures in libdex for Debian. This is GNOME code of course, which isn't everyone's cup of C. (It also supports BSDs/Linux/macOS/Solaris/Windows).
Also, although not likely to be removed anytime soon from existing systems, POSIX has declared the context API obsolescent a while ago (it might actually no longer be part of the standard).
They are actively working on it for their VS2026 C++ compiler. I think since 2017 or so they've kept up with C++ standards reasonably? I'm not a heavy C++ guy, so maybe I'm wrong, but my understanding is they match the standards.
This support table is a complete mess. And saying "most platforms are supported" is too optimistic or even cocky.
Passing uncaught exceptions from the coroutine up to the caller is also pretty easy, because it's all synchronous. You just need to wrap it so it can safely travel across the gap. You can restrict the exception types however you want. I chose to support only subclasses of std::exception and handle anything else as an unknown exception.
You could use the same trick used by glibc to implement unstoppable exceptions for POSIX cancellation: the exception rethrows itself from its destructor.
This is also how dotnet handles it, and you can choose whether to rethrow at the caller site, inspect the exception manually, or run a continuation on exception.
That's the problem with register machines, I guess. Interestingly enough, BCPL, its main implementation being a p-code interpreter of sorts, has pretty trivially supported coroutines in its "standard" library since the late seventies — as you say, all you need to save is the current stack pointer and the code pointer.
Actually you don't even need setjmp/longjmp. I've used a library (embedded environment) called protothreads (plain C) that abused the preprocessor to implement stackless coroutines.
(Defined a macro that used the __LINE__ macro coupled with another macro that used a switch statement to ensure that calling the function again made it resume from where the last YIELD macro was encountered)
This would be about 32 years ago - I don't like thinking about that ...
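A rough sketch of the `__LINE__`/switch trick described above. The macro names are made up for illustration and are not the actual protothreads API. Note that locals do not survive a yield, so any state that must persist lives in a struct.

```cpp
#include <cassert>

// Hypothetical macros in the spirit of protothreads, not its real API.
// CO_BEGIN opens a switch on the saved resume point.
#define CO_BEGIN(state) switch (state) { case 0:
// CO_YIELD records the current line as the resume point and returns;
// the matching case label makes the next call jump right back here.
#define CO_YIELD(state, value) \
    do { state = __LINE__; return (value); case __LINE__:; } while (0)
// CO_END closes the switch and marks the coroutine as finished.
#define CO_END(state) } state = -1

// Everything that must survive a yield is stored here, not in locals.
struct Counter {
    int state = 0;  // resume point: 0 = start, -1 = finished
    int i = 0;      // loop variable
};

// Returns 1, 2, 3 on successive calls, then -1 on every later call.
int next(Counter& c) {
    CO_BEGIN(c.state);
    for (c.i = 1; c.i <= 3; ++c.i) {
        CO_YIELD(c.state, c.i);
    }
    CO_END(c.state);
    return -1;
}
```

The case label sits inside the loop body, which is legal for the same reason Duff's device is. One known caveat of this approach is that a yield cannot be placed inside a nested `switch`.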
> require the STL
That it has to heap-allocate if non-inlined is a misconception. This is only the default behavior.
One can define:
void *operator new(size_t sz, Foo &foo)
in the coro's promise type, and this:
- removes the implicitly-defined operator new
- forces the coro's signature to be CoroType f(Foo &foo), and forwards arguments to the "operator new" one defined
Therefore, it's pretty trivial to support coroutines even when heap cannot be used, especially in the non-recursive case.
Yes, green threads ("stackful coroutines") are more straightforward to use, however:
- they can't be arbitrarily destroyed when suspended (this would require stack unwinding support and/or active support from the green thread runtime)
- they are very ABI dependent. Among the "few registers" one has to save are the FPU registers, which may be missing or off-limits on older Arm architectures and under codegen options similar to -mgeneral-regs-only (for code that runs "below" userspace). Said FPU registers also take a lot of space in the stack frame
Really, stackless coros are just FSM generators (which is obvious if one looks at disasm)
A pure library implementation that relies on normal function call semantics obviously needs to conservatively save at least all callee-save registers, but that's not the only possible implementation. An implementation with compiler help should be able to do significantly better.
Ideally the compiler would provide a built-in, but even, for example, an implementation using GCC inline ASM with proper clobbers can do significantly better.
That was over 20 years ago. No idea what the current hotness is.
The stack save/restore happens in: https://swtch.com/libtask/asm.S
Why? You can just as well execute all your coroutines on a single thread. Many networking applications are doing fine with just a single ASIO thread.
Another example: you could write game behavior in C++ coroutines and schedule them on the thread that handles the game logic. If you want to wait for N seconds inside the coroutine, just yield it as a number. When the scheduler resumes a coroutine, it receives the delta time and then reschedules the coroutine accordingly. This is also a common technique in music programming languages to implement musical sequencing (e.g. SuperCollider)
In a Unity context, the engine provides the main loop and the developer is writing behaviors for game entities.
You can call a function that makes use of coroutines without worrying about it. That's the core intent of the design.
That is, if you currently use some blocking socket library, we could replace the implementation of that with coroutine based sockets, and everything should still work without other code changes.
Also, this is not some random GitHub Repo, Chris Kohlhoff is the developer of ASIO :)
Multithreaded? Nope. You can do C++ coroutines just fine in a single-threaded context.
Event loop? Only if you're wanting to do IO in your coroutines and not block other coroutines while waiting for that IO to finish.
> most people end up using coroutines with something like boost::asio
Sure. But you don't have to. Asio is available without the kitchen sink: https://think-async.com/Asio/
Coroutines are actually really approachable. You don't need boost::asio, but it certainly makes it a lot easier.
I recommend watching Daniela Engert's 2022 presentation, Contemporary C++ in Action: https://www.youtube.com/watch?v=yUIFdL3D0Vk
The most helpful resource about it is a guy on Stack Overflow (sehe). No idea how to get help once SO has closed
That’s similar to most of what makes C++ tick: There’s no deep magic, it’s “just” type-checked syntactic sugar for code patterns you could already implement in C.
(Occurs to me that the exceptions to this … like exceptions, overloads, and context-dependent lookup … are where C++ has struggled to manage its own complexity.)
This is why coroutine-based frameworks (e.g., C++20 coroutines with cppcoro) have largely superseded future-chaining for async state machine work — the generated code is often equivalent, but the source code is dramatically cleaner and closer to the synchronous equivalent.
(me: ex-Visual Studio dev who worked extensively on our C++ coroutine implementation)
With the coroutine approach using yield, doesn't that mean the caller needs to decide when to call it again? With the std::future approach, it's event driven: the promise being set signals that the state/step has completed.
> "The only 'assembly' required is creating the associated promise"
Again, that is only true for one step. For a state machine with N states you need explicit state enums or a long chain of .then() continuations. You also need to manage the shared state across continuations (normally on the heap). You need to manage manual error propagation across each boundary and handle the cancellation tokens.
You only get a "nice readable linear flow" using std::future when 1) using a blocking .get() on a thread, or 2) .then() chaining, which isn't "nice" by any means.
Lastly, you seem to be conflating a co_yield (generator, pull-based) with co_await (event-driven, push-based). With co_await, the coroutine is resumed by whoever completes the awaitable.
But what do I know... I only worked on implementing coroutines in cl.exe for 4 years. ;-)
What I was thinking of as a state machine using std::future was a single-function state machine: switch (state) does the state-specific dispatch of async ops using std::future, waits for completion, then selects the next state.
Do you believe that std::future is the better option?
It easily can, and often does, lead to messy Rube Goldberg machines.
There was a game AI talk a while back, I forget the name unfortunately, but as I recall the guy was pointing out this friction and suggesting additions we could make at the programming language level to better support that kind of time spanning logic.
This is what Rich Hickey (Clojure author) has termed “place oriented programming”, when the focus is mutating memory addresses and having to synchronize everything, but failing to model time as a first class concept.
I’m not aware of any general purpose programming language that successfully models time explicitly, Verilog might be the closest to that.
Step 1, solve "time" for general computing.
The difficulty here is that our periods are local out of both necessity and desire; we don't fail to model time as a first class concept, we bring time-as-first-class with us and then attempt to merge our perspectives with varying degrees of success.
We're trying to rectify the observations of Zeno, a professional turtle hunter, and a track coach with a stopwatch when each one has their own functional definition of time driven by intent.
Sounds interesting. If it's not too much of an effort, could you dig up a reference?
Mind you my memory may have distorted it a little beyond what it was, but it's loosely on the topic!
I would just go straight to tbb and concurrent_unordered_map!
The challenge of parallelism does not come from how to make things parallel, but how you share memory:
How you avoid cache misses, make sure threads don't trample each other and design the higher level abstraction so that all layers can benefit from the performance without suffering turnaround problems.
My challenge right now is how do I make the JVM fast on native memory:
1) Rewrite my own JVM. 2) Use the buffer and offset structure Oracle still has but has deprecated and is encouraging people to not use.
We need Java/C# (already has it but is terrible to write native/VM code for?) with bottlenecks at native performance and one way or the other somebody is going to have to write it?
What do you mean here? Do you mean hand-writing MSIL or native interop (pinvoke) or something else?
Your stack is on the heap and it contains an instruction pointer to jump to for resume.
This is quite understandable when you know the history behind how C++ coroutines came to be.
They were initially proposed by Microsoft, based on a C++/CX extension, that was inspired by .NET async/await implementation, as the WinRT runtime was designed to only support asynchronous code.
Thus if one knows how the .NET compiler and runtime magic works, including custom awaitable types, there will be some common bridges to how C++ coroutines ended up looking.
Appreciate this humor -- absurd, tasteful.
I never understood the value. Just use lambdas/callbacks.
"Just" is doing a lot of work there. I've used callback-based async frameworks in C++ in the past, and it turns into pure hell very fast. Async programming is, basically, state machines all the way down, and doing it explicitly is not nice. And trying to debug the damn thing is a miserable experience.
The author just chose to write it as a state machine, but you don't have to. Write it in whatever style helps you reach correctness.
Lol, no thanks. People are using coroutines exactly to avoid callback hell. I have rewritten my own C++ ASIO networking code from callback to coroutines (asio::awaitable) and the difference is night and day!
waitFrames(5); // wait 5 frames
fireProjectile();
waitFrames(15);
turnLeft(-30/*deg*/, 120); // turn left over 120 frames
waitFrames(10);
fireProjectile();
// spin and shoot
for (i of range(0, 360, 60)) {
turnRight(60, 90); // turn 60 degrees over 90 frames
fireProjectile();
}
10 lines and I get behavior over time. What would your non-coroutine solution look like?
``` int f() { a; co_yield r; b; co_return r2; } ```
this transforms into
``` auto f(auto then) { a; return then(r, [&]() { b; return then(r2); }); }; ```
You can easily extend this to arbitrarily complex statements. The main thing is that obviously, you have to worry about the capture lifetime yourself (coroutines allocate a frame separate from the stack), and the syntax causes nesting for every statement (but you can avoid that using operator overloading, like C++26/29 does for executors)
For simple callback hell, not so much.
Just put your state in visible instance variables of your objects, and then you will actually be able to see and even edit what state your program is in. Stop doing things that make debugging difficult and frustratingly opaque.
https://discussions.unity.com/t/coreclr-scripting-and-ecs-st...
Capcom has their own fork of .NET for the Playstation, for example.
I don't know what kind of GC they implemented.
They will not be using .NET AOT probably ever though. Unity's AOT basically supports full C# (reflection etc) while .NET opted to restrict it and lean more on generated code.
Edit: Nevermind, they eventually bothered.
[1] https://docs.unity3d.com/6000.3/Documentation/ScriptReferenc...
Is that a hack? Is that not just exactly what IEnumerable and IEnumerator were built to do?
Really you're generating the vague concept of a yield instruction but you can return other coroutines that are implicitly run and nest your execution... Because of this you can't wait less than a frame so things are often needlessly complicated and slow.
It's like using a key to jam a door shut. Sure a key is for keeping doors closed but...
I've been a serious Unity developer for 16 years, and I avoid coroutines like the plague, just like other architectural mistakes like stringly typed SendMessage, or UnityScript.
Unity coroutines are a huge pain in the ass, and a lazy undisciplined way to do things that are easy to do without them, using conventional portable programming techniques that make it possible to prevent edge conditions where things fall through the cracks and get forgotten, where references outlive the objects they depend on ("fire-and-forget" gatling foot-guns).
Coroutines are great -- right up until they aren’t.
They give you "nice linear code" by quietly turning control flow into a distributed state machine you no longer control. Then the object gets destroyed, the coroutine keeps running, and now you’re debugging a null ref 200 frames later in a different scene with an obfuscated call stack and no ownership.
"Just stop your coroutines" sounds good until you realize there’s no coherent ownership model. Who owns it? The MonoBehaviour? The caller? The scene? Every object it has a reference to? The thing it captured three yields ago? The cure is so much worse than the disease.
Meanwhile: No static guarantees about lifetime. No structured cancellation. Hidden allocation/GC from yield instructions. Execution split across frames with implicit state you can’t inspect.
Unity has a wonderful editor that lets you inspect and edit the state of the entire world: EXCEPT FOR COROUTINES! If you put your state into an object instead of local variables in a coroutine, you can actually see the state in the editor.
All of this to avoid writing a small explicit state machine or update loop -- Unity ALREADY has Update and FixedUpdate just for that: use those.
Coroutines aren’t "cleaner" -- they just defer the mess until it’s harder to reason about.
If you can't handle state machines, then you're even less equipped to handle coroutines.
It'd be like complaining about arrays being bad because if you pass a pointer to another object, nuke the original array, then try to access the data, it'll cause an error. That's kind of... your own fault? Got to manage your data better.
Unity's own developers use them for engine code. To claim it's just something for noobs is a bit of an interesting take, since, well, the engine developers are clearly using them and I doubt they're Unity noobs. They made the engine.
I'm not advocating for the ubiquitous use of coroutines (there's a time and place), but they're like anything else: if you don't know what you're doing, you'll misuse them and cause problems. If you RTFM and understand how they work, you won't have any issues.
If you strictly require people to know exactly what they're doing and always RTFM and perfectly understand how everything works, then they already know well enough to avoid coroutines and SendMessage and UnityEvents and other footguns in the first place.
It's much easier and more efficient to avoid all of the footguns when you simply don't use any of the footguns.
The MonoBehaviour that invoked the routine owns it and is capable of cancelling it at typical lifecycle boundaries.
This is not a hill I would die on. There's a lot of other battles to fight when shipping a game.
The biggest reason for using Unity is its editor. Don't do things that make the editor useless, and are invisible to it.
The problem with coroutines is that they generate invisible errors you end up shipping and fighting long after you shipped your game, because they're so hard to track down and reproduce and diagnose.
Sure you can push out fixes and updates on Steam, but how about shipping games that don't crash mysteriously and unpredictably in the first place?
Unity's own documentation for changing scenes uses coroutines
https://news.ycombinator.com/item?id=47110605
>1h 48m 06s, with arms spread out like Jesus H Christ on a crucifix: "Because we can dynamically put on ANY surface of the cube ANY image we like. So THAT's how we're going to surprise the world, is by giving clues about what's in the middle later on."
https://youtu.be/24AY4fJ66xA?t=6486
Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Click. Moo!
https://www.gamedeveloper.com/design/the-i-curiosity-i-exper...
>"I'm jealous that [Molyneux] made a more boring clicking game than I did." -Ian Bogost
>"I also think Curiosity was brilliant and inspired. But that doesn't make it any less selfish or brazen. Curiosity was not an experiment. 'Experiment' is a rhetorical ruse meant to distract you from the fact that it's promotional." -Ian Bogost