The GCC documentation example for computed goto uses an indexing table, but you can of course use the label addresses directly. The problem is smuggling the label addresses out of the function so you don't need the enum/index indirection during dispatch.
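
For reference, a minimal sketch of the table-indexed style, assuming a made-up three-instruction toy VM (the opcode names and `run_indexed` are illustrative, not taken from the GCC docs). Each dispatch loads an opcode byte, then loads the label address from the table, then jumps; that second load is the indirection in question.

```c
#include <assert.h>

/* Hypothetical toy opcodes, for illustration only. */
enum { OP_INC, OP_DBL, OP_HALT, OP_COUNT };

/* Table-indexed dispatch (GNU C "labels as values" extension).
   Every step: load the opcode, index the table, jump. */
static int run_indexed(const unsigned char *pc, int acc)
{
    static void *const labels[OP_COUNT] = {
        [OP_INC]  = &&op_inc,
        [OP_DBL]  = &&op_dbl,
        [OP_HALT] = &&op_halt,
    };

    goto *labels[*pc++];
op_inc:
    acc += 1;
    goto *labels[*pc++];
op_dbl:
    acc *= 2;
    goto *labels[*pc++];
op_halt:
    return acc;
}
```

Running the program `OP_INC, OP_INC, OP_DBL, OP_HALT` with a starting accumulator of 0 increments twice and then doubles.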

I once experimented with some techniques and found you can use constructors, either on function-scoped static variables in C++ or via the constructor function attribute on a nested function in GCC C. The label addresses are visible to the constructor function, so you can smuggle them out by copying them into a file- or global-scoped array, then at runtime initialize your data structures and bytecode arrays with the computed goto addresses. That preserves the CPU's branch prediction and prefetching resources for other work.
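
A rough sketch of the smuggling idea. This is not the nested-function trick described above; it is a simpler variant that should build with both GCC and Clang, where a file-scope constructor calls the interpreter in an "export" mode (here, a NULL program) so it can copy its label addresses into a file-scoped array before `main` runs. All names (`op_addr`, `run_direct`, `export_labels`, `compile`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy opcodes, for illustration only. */
enum { OP_INC, OP_DBL, OP_HALT, OP_COUNT };

/* File-scoped array the label addresses get copied into. */
static void *op_addr[OP_COUNT];

/* GCC documents that label addresses may differ between inlined or
   cloned copies of a function, so keep this one out of line. */
__attribute__((noinline))
static int run_direct(void *const *pc, int acc)
{
    static void *const labels[OP_COUNT] = {
        [OP_INC]  = &&op_inc,
        [OP_DBL]  = &&op_dbl,
        [OP_HALT] = &&op_halt,
    };

    if (pc == NULL) {            /* export mode: publish the addresses */
        memcpy(op_addr, labels, sizeof labels);
        return 0;
    }

    goto **pc++;                 /* one load, then jump; no table lookup */
op_inc:
    acc += 1;
    goto **pc++;
op_dbl:
    acc *= 2;
    goto **pc++;
op_halt:
    return acc;
}

/* Runs before main() thanks to the constructor attribute. */
__attribute__((constructor))
static void export_labels(void)
{
    run_direct(NULL, 0);
}

/* Translate opcode bytes into an array of label addresses. */
static void compile(const unsigned char *ops, size_t n, void **out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = op_addr[ops[i]];
}
```

After `compile`, the bytecode array holds jump targets rather than opcode numbers, so the interpreter only ever jumps to addresses it took from its own labels.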

You can also lazily initialize things, which works well if you're not precompiling bytecode but implementing something like a coroutine or state machine where you just need to initialize the first dispatch address.
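
A small sketch of the lazy variant, assuming a hypothetical generator-style state machine (`struct counter` and `counter_next` are made-up names). The resume address starts out NULL and is filled in on the first call, so no label address ever has to be known outside the function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generator state; resume is NULL until the first call. */
struct counter {
    void *resume;   /* label address to jump to on the next call */
    int   value;
};

/* Yields 1, 2, 3, then 0 on every later call.
   noinline so every caller shares the same label addresses. */
__attribute__((noinline))
static int counter_next(struct counter *c)
{
    if (c->resume == NULL)
        c->resume = &&start;     /* lazy init of the first dispatch address */
    goto *c->resume;

start:
    for (c->value = 1; c->value <= 3; c->value++) {
        c->resume = &&resumed;   /* come back here on the next call */
        return c->value;
resumed:;
    }
    c->resume = &&done;
done:
    return 0;
}
```

A caller only has to zero-initialize the struct, for example `struct counter c = { NULL, 0 };`, and call `counter_next(&c)` repeatedly.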

Even on modern chips, avoiding the computed goto table indexing can be meaningful (5% or more) depending on the use case. But the marginal gains from the smuggling shenanigans specifically probably aren't worth the code complexity and portability headaches (clang doesn't support nested functions in C), though I never tried that specific trick with a full-blown bytecode VM.

These days, with guaranteed tail-call extensions, you can just dispatch on function addresses (knowable before dispatch). You can't directly control inlining optimizations, though, so if you're mostly doing simple work per operation, computed gotos might still perform better. And for small or straightforward state machines, code can be easier to grok when it isn't split across many functions, especially if they're tail-calling each other through pointers.
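
A hedged sketch of the tail-call style, using the `musttail` attribute where the compiler reports support for it (Clang, and recent GCC releases as far as I know) and falling back to a plain call otherwise, which is fine for a short program but is not a guaranteed tail call. The handler names and `struct insn` layout are hypothetical.

```c
#include <assert.h>

/* Use musttail only when the compiler advertises it. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL   /* fallback: ordinary call, tail call not guaranteed */
#endif

struct insn;
typedef int op_fn(const struct insn *pc, int acc);
struct insn { op_fn *fn; };      /* each instruction is a handler address */

static int op_inc(const struct insn *pc, int acc)
{
    acc += 1;
    MUSTTAIL return pc[1].fn(pc + 1, acc);
}

static int op_dbl(const struct insn *pc, int acc)
{
    acc *= 2;
    MUSTTAIL return pc[1].fn(pc + 1, acc);
}

static int op_halt(const struct insn *pc, int acc)
{
    (void)pc;
    return acc;
}

/* Function addresses are constant expressions, so the program can be
   a static initializer with no smuggling step. */
static const struct insn prog[] = {
    { op_inc }, { op_inc }, { op_dbl }, { op_halt },
};

static int run_tail(const struct insn *code, int acc)
{
    return code[0].fn(code, acc);
}
```

All handlers share one signature, which is what lets each one hand off to the next with a tail call.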