It's worth noting (and the paper does go into this) that this is limited to a very specific subset of UB, which they call "guardable."

They are not removing UB around things like out-of-bounds accesses or use-after-free, which would likely be more expensive to guard against.
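For a concrete flavor of the kind of UB-driven optimization at stake, signed-overflow assumptions are a classic example; whether the paper files this particular case under "guardable" is my assumption, not a claim from the paper:

    // A sketch: because signed overflow is UB, the compiler may assume
    // `i` never wraps, so the trip count is exactly n + 1 for n >= 0 and
    // the loop can be folded to a closed form or vectorized. With
    // wrapping semantics (-fwrapv) it must also handle n == INT_MAX,
    // where `i <= n` never becomes false.
    int count_upto(int n) {
        int count = 0;
        for (int i = 0; i <= n; ++i)
            ++count;
        return count;
    }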

reply
I don’t understand the downvotes. Empirical research on the performance impact of undefined behavior is badly needed: the C++ committee’s obsession with strict undefined behavior (in contrast with longstanding semantics, e.g., treating uninitialized memory accesses as just fine) has been justified largely by how it enables optimizing compilers. This research shows that many types of UB have a negligible impact on performance.
reply
Possibly somebody downvoted because "thank you" in all caps is not a substantial contribution to the discussion. It feels like the kind of low-effort stuff you'd see on Reddit.

Also, commenting on downvotes is generally frowned upon.

reply
You're getting downvoted because you're looking for a particular result ("UB optimizations don't help performance") rather than actually evaluating the quality of this analysis (which doesn't really support what you want anyway).
reply
> by using link-time optimizations

These are almost never used by software projects.

reply
The only places where I've seen LTO not be used are places with bad, unreliable build systems that systematically introduce undefined behaviour by violating the ODR.
reply
The only organization I've worked in that had comprehensive LTO for C++ code was Google. I've worked at other orgs even with 1000s of engineers where LTO, PGO, BOLT, and other things you might consider standard techniques were considered voodoo and too much trouble to bother with, despite the obvious efficiency improvements being left on the table.
reply
I helped with PGO work at Microsoft over 15 years ago, back when it was a Microsoft Research project.

The issue with early PGO implementations was getting a really good profile: you had to have automation capable of fully exercising the code paths that you knew would be hot in actual usage, and you needed good instrumentation to know which code paths those were!

The same problem exists nowadays, but programs are instrumented to hell and back to collect usage data.
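For reference, the modern instrumentation-based PGO loop with Clang looks roughly like this (a sketch; app.cpp and the training run are placeholders for a real build and a representative workload):

    # 1. Build with instrumentation
    clang++ -O2 -fprofile-instr-generate app.cpp -o app

    # 2. Run a representative workload to produce a raw profile
    LLVM_PROFILE_FILE=app.profraw ./app

    # 3. Merge raw profiles into the indexed format the compiler reads
    llvm-profdata merge -output=app.profdata app.profraw

    # 4. Rebuild with the profile applied
    clang++ -O2 -fprofile-instr-use=app.profdata app.cpp -o app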

reply
I am willing to assume that organizations dedicated to shipping software to customers, like Microsoft or Autodesk, are almost certainly all in on these optimization techniques. The organizations where I worked are ones operating first-party or third-party software in the cloud, where they're responsible for building their own artifacts.
reply
PGO is pretty difficult. In my experience compilers don't seem to know the difference between "this thing never runs" and "we don't have any information about whether this thing runs". Similarly, it might be more useful to know "is this branch predictable" than just "what % of the time is it taken".

CPUs are so dynamic anyway that there often isn't a way to pass down the information you'd get from the profile. E.g., I don't think Intel actually recommends any way of hinting branch directions.
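At the source level, the closest thing is a compiler-side hint such as GCC/Clang's __builtin_expect (or C++20's [[likely]]/[[unlikely]]), which only influences code layout, not the hardware predictor. A minimal sketch, with hypothetical helper functions:

    // Hypothetical helpers, just for illustration.
    void handle_error();
    void do_work(int);

    void process(int* p) {
        // The hint only affects layout: the error path is moved off the
        // fall-through, so the common case runs straight-line code.
        if (__builtin_expect(p == nullptr, 0)) {
            handle_error();
        } else {
            do_work(*p);
        }
    }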

reply
It's implied by the target offset: branches that are likely to be taken jump backwards, unlikely branches jump forwards.
reply
Not generally, no. This is true for some chips, especially (very) old or simple cores, but it's not something to lean on for modern high end cores.
reply
Generally, yes. This is not just for "simple" cores; this is the state-of-the-art static branch prediction algorithm as described by Intel in their optimization manual.

"Branches that do not have a history in the BTB ... are predicted using a static prediction algorithm: Predict forward conditional branches to be NOT taken. Predict backward conditional branches to be taken."

It then goes on to recommend exactly what every optimizing compiler and post-link optimizers like BOLT do:

"Arrange code to be consistent with the static branch prediction algorithm: make the fall-through code following a conditional branch be the likely target for a branch with a forward target, and make the fall-through code following a conditional branch be the unlikely target for a branch with a backward target."

This is why a reduction in taken forward branches is one of the key statistics that BOLT reports.
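In source terms, the layout advice amounts to something like this (a sketch; the comments describe the branch shape a compiler following the manual's advice typically emits, which can vary):

    // The common path falls through the rare-case check (a forward
    // branch, statically predicted not-taken), while the loop's
    // back-edge is a backward branch (statically predicted taken).
    long sum_nonnegative(const int* a, long n) {
        long s = 0;
        for (long i = 0; i < n; ++i) {  // back-edge: backward, taken
            if (a[i] < 0)               // rare: forward branch, not taken
                continue;
            s += a[i];
        }
        return s;
    }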

reply
Surely you are not putting code behind an if/else?
reply
Google doesn't use full LTO either, since the binaries are way too big. ThinLTO is vastly less powerful.
reply
"Vastly" eh? I seem to recall that LLVM ThinLTO has slight regressions compared to GCC LTO on specCPU but on Google's own applications the superior whole-program devirtualization offered only with ThinLTO is a net win.
reply
I'll adjust my phrasing.

As a user, building with ThinLTO vs full LTO generally produces pretty similar performance, in no small part because a huge amount of effort has gone into making the summaries as effective as possible for key performance needs.

As a compiler developer, especially when developing static analysis warnings rather than optimization passes, the number of cases where I've run into "this would be viable if we had full LTO" has been pretty high.

reply
In practice, the default ABI on Linux x86-64 still limits you to binaries of 4 GB or thereabouts.

Not exactly a problem for LTO, since any reasonable build machine will have 128 GB of RAM.

reply
Yeah, I would have liked to see the paper specify whether the LTO they tried is fat LTO or ThinLTO.
reply
Facebook uses LTO/PGO for C++ pretty broadly.
reply
Yeah, they just never hired me. They also invented BOLT.

I think there is a valley in terms of organization size where you have tons of engineers but not enough to accomplish peak optimization of C++ projects. These are the orgs that are spending millions to operate, for example, the VERY unoptimized PostgreSQL packages from Ubuntu, in AWS.

reply
Well, Ubuntu isn't really a good project to look up to :)

Hell, their latest upgrade broke one of their flavours. Not to mention how fragile their installer is.

reply
Violating the ODR doesn't introduce UB; it's IFNDR, ill-formed no diagnostic required, which is much worse in principle and, in such cases, probably also in practice.

UB is a runtime phenomenon: it happens or it doesn't, and we may be able to ensure the case where it happens doesn't occur through ordinary human controls.

But IFNDR is a property of the compiled program: if you have IFNDR (by some estimates that's most C++ programs), your program has no defined behaviour and never did, so there is no possible countermeasure. Too bad, game over.
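A minimal sketch of how such an ODR violation looks (hypothetical names; e.g., two translation units picking up different versions of the same header):

    // a.cpp -- compiled against one version of a header
    inline int buffer_size() { return 4096; }
    int from_a() { return buffer_size(); }

    // b.cpp -- same symbol, different definition (e.g., a stale copy)
    inline int buffer_size() { return 65536; }
    int from_b() { return buffer_size(); }

    // Linking both TUs into one program is IFNDR: the linker keeps a
    // single definition of buffer_size(), so from_a() and from_b() can
    // silently disagree with their own source, and no diagnostic is
    // required.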

reply
I am curious where you have seen LTO used. Linux distributions and open source projects in general rarely use LTO. Their build systems are usually very good.
reply
LTO is heavily used in my experience. If it breaks something that is indicative of other issues that need to be addressed.
reply
The main issue isn't that it breaks stuff but that compiling with it tends to be pretty slow.
reply
... That's why you compile without LTO during development and do a final 'compile with LTO > profile > fix / optimize > compile with LTO' pass.

Compilation happens once, and then the code runs on hundreds of thousands up to billions of devices. Respect your users.
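With Clang the split looks roughly like this (a sketch; the file names are placeholders, and the link step assumes a linker with LTO support, such as lld):

    # Day-to-day development: plain object files, fast iteration
    clang++ -O2 -c module.cpp main.cpp
    clang++ module.o main.o -o app

    # Release pass: emit bitcode and optimize across TUs at link time
    clang++ -O2 -flto=thin -c module.cpp main.cpp
    clang++ -flto=thin module.o main.o -o app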

reply
This assumes that LTO is strictly better than no LTO, i.e., it only gets faster, has the same optimization hotspots, and doesn't break anything.

I would recommend only doing things that fit within the 'build > test > fix' loop.

reply
Which doesn't matter at all in a release build. And in a dev build it's rarely necessary.
reply
At FAANG scale the cost is prohibitive. Hence the investment in ThinLTO.
reply
At FAANG scale, you absolutely want to have a pass before deployment that does this or you're leaving money on the table.
reply
It's not as obvious a win as you may think. Keep in mind that every binary that gets deployed and executed will be compiled many more times, before and after, for testing. For some binaries, this number could easily reach hundreds of thousands of times. Why? In a monorepo, a lot of changes come in every day, and testing those changes involves traversing a reachability graph of potentially affected code and running its tests.
reply
How many Linux distributions use LTO? It is a rarity among Gentoo users as far as I know, and that is the one place where you would expect more LTO usage.
reply
It's on by default for Rust release builds, so at least the codepaths in LLVM for it are well-exercised.
reply
I don't think that's right unless the docs are stale:

    [profile.release]
    lto = false
https://doc.rust-lang.org/cargo/reference/profiles.html#rele...
reply
So the thing is that false still means a form of thin LTO is used, depending on other settings; see https://doc.rust-lang.org/cargo/reference/profiles.html#lto

> false: Performs “thin local LTO” which performs “thin” LTO on the local crate only across its codegen units.

I think this is kind of confusing but whatever. I should have been clearer.

reply
There is no cross-crate LTO with 'lto = false', but there is cross-crate thin LTO with 'lto = "thin"'. The codepaths might still be getting hit, but individual CGUs within a crate are generally invisible to the user, which can create the impression that LTO doesn't occur. (That is, if you operate under the mental model of the crate being the basic compilation unit, then 'lto = false' means you'll never see LTO.)
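For reference, the values the Cargo reference documents for this knob (a sketch of a release profile; pick one):

    [profile.release]
    # lto = false    # the default: "thin local LTO" within each crate
    # lto = "thin"   # cross-crate ThinLTO
    # lto = true     # equivalent to "fat": full cross-crate LTO
    # lto = "off"    # disables LTO entirely, even the thin local kind
    lto = "thin"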
reply
Oh I hadn’t realized Rust does that. Really cool.
reply
That must have been changed sometime in the last year, then. When I enable LTO for one of my projects with a Rust compiler from 2024, the compilation time more than doubles.
reply
I should have been clearer: thin LTO is on by default, not full "fat" LTO, for exactly that reason.
reply