undefined

points

by pocksuppet1 hours ago |

comments

by Pannoniae7 minutes ago|

[-]

You'd be surprised, again.... most compilers don't generate very good code, mostly because

1. the time for optimisation is limited

2. the constraints are overlapping and just completely intractable beyond a single function (do you want to inline this, saving on the call and increasing binary size, or not do it because it's cold?)

3. they don't have domain-specific knowledge about your code, and even with PGO, they might incorrectly decide what's hot and what's not - typical example are program settings. You didn't enable a setting during PGO instrumentation, compiler sees you didn't call that path, shoves it out of line. Now your PGO-optimised code is worse than -O2. And compilers have different levels of adherence to manual branch hinting - on MSVC you get a reorder at best, Clang and GCC try much harder at [[likely]] and [[unlikely]].

4. There's still quite a bit of low-hanging fruit left, mostly because progress is jagged ;) For example our calling conventions generally suck - this is actually why inlining is so helpful - and the inertia makes everyone emit the default calling convention and that's it.

For example, did you know that compilers have very inconsistent support for struct unpacking? It can be much faster to write

  int32 meow(int64 a, int64 b);

than

  struct mytype {
    int64 x;
    int64 y;
  };

  int32 meow(mytype a);

because the first one goes through registers on the MSVC ABI, the second one gets lowered to the caller passing a pointer to the stack. Before someone says "oh this just means MS sucks" - fair, but for std::unique_ptr the situation is the other way around... on the MSVC ABI the callee cleans it up so it's truly zero-cost, but on the Itanium ABI using it is worse than using T* as a raw pointer... see the GCC codegen :)

These examples might seem a bit cherrypicked but this is only scratching the surface, not to talk about the codegen in higher-level languages, which is even more dreadful. Manually optimising your code can usually get a magnitude worth of free performance, which is just tragic.

I wouldn't even rule out LLM codegen in the future - although they're quite unreliable today so you'd get miscompiles like crazy - but there's just so much low-hanging fruit left on the table that it wouldn't be too out of step...