There's been a least one experiment (posted a few years ago to HN) where someone benchmarked a stackful coroutine implementation with hundreds of thousands (millions?) of stacks that could grow contiguously on-demand up to, e.g., 2MB, but were initially minimally sized and didn't reserve the maximum stack size upfront. The bottleneck was the VMA bookkeeping--the syscalls, exploding the page table, TLB flushing, etc. In principle it could work well and be even more performant than existing solutions, and it might work better today since Linux 6.13's lightweight guard page feature, MADV_GUARD_INSTALL, but we probably still need more architectural support from the system (kernel, if not hardware) to make it performant and competitive with language-level solutions like goroutines, Rust async, etc.