Reflect in what way? The primary focus of that talk is that it’s possible to infect a compiler binary in a way that source analysis won’t reveal, and that the binary self-replicates the vulnerability into the other binaries it generates. Thankfully that particular problem was “solved” a while back [1], even if the solution isn’t yet widely implemented.

However, the broader idea of supply chain attacks remains challenging, and AI doesn’t really change how you should treat it. For example, the xz-utils backdoor, which used the build system to attack OpenSSH on the many popular distros that patched sshd to depend on systemd, predates AI, and that’s just the attack we know about because it was caught. Maybe AI helps with the scale of such attacks, but I haven’t heard anyone propose any kind of solution that would actually improve the reliability and robustness of everything.

[1] Fully Countering Trusting Trust through Diverse Double-Compiling https://arxiv.org/abs/1004.5534

reply
deleted
reply
The proposed solution seems to rely on a trusted compiler that generates the exact same output, bit for bit, as the compiler under test would generate if it were not compromised. That seems useful only in very narrow cases.
reply
You have a trusted compiler that you write in assembly or even machine code. You then compile source code you trust using that compiler. The result is then used for a bit-for-bit comparison against a different binary of the compiler you produced, to catch the hidden vulnerability.
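The diverse double-compiling idea behind this can be shown in a toy model (purely illustrative; the dict-based "compilers" and all names here are invented for the sketch, not taken from the thesis):

```python
# Toy model of diverse double-compiling (DDC).
# A "compiler" and a "binary" are both modeled as dicts; running a
# compiler on a source faithfully copies the source, except that a
# trojaned compiler re-inserts its backdoor whenever it notices it is
# compiling a compiler (Thompson's self-replication trick).

def compile_with(compiler, source):
    binary = dict(source)  # faithful translation of the source
    if compiler.get("backdoor") and source.get("is_compiler"):
        binary["backdoor"] = True  # the trojan propagates itself
    return binary

trusted_T = {"is_compiler": True}                    # hand-audited compiler
suspect_A = {"is_compiler": True, "backdoor": True}  # trojaned binary of A
source_A  = {"is_compiler": True}                    # clean source of A

# DDC: compile A's source with the trusted compiler, then use that
# result to recompile A's source; compare against A's own self-build.
stage1 = compile_with(trusted_T, source_A)
stage2 = compile_with(stage1, source_A)
self_build = compile_with(suspect_A, source_A)

print(stage2 == self_build)  # False -> the hidden trojan is exposed
```

The key point is that the trusted compiler never needs to produce the same binary as A; it only needs to correctly implement A's source, so that the second-stage outputs can be compared bit for bit.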
reply
It's assumed that in this scenario you don't have access to a trusted compiler; if you do, then there's no problem.

And the thesis linked above seems to go beyond simply "use a trusted compiler to compile the next compiler". It involves deterministic compilation and comparing outputs, for example.

reply
I believe the issue is if an exploit is somehow injected into AI training data such that the AI unwittingly produces it and the human who requested the code doesn't even know.
reply
That’s a separate issue and specifically not what OP was describing. Also highly unlikely in practice unless you use a random LLM - the major LLM providers already have to deal with such things and they have decent techniques to deal with this problem afaik.
reply
If you think the major LLM providers have a way to filter malicious (or even bad or wrong) code out of the training data, I have a bridge to sell you.
reply
No, not to filter it out of training, but to make it difficult for poisoned data to have an outsized impact relative to how often it appears. So yes, you can poison a random thing only you have ever mentioned. It’s more difficult to poison the LLM into injecting subtle vulnerabilities you influence into code users ask it to generate. Unless you somehow flood the training data with such vulnerabilities, but that’s also more detectable.

TLDR: what I said is only foolish if you take the absolute dumbest possible interpretation of it.

reply
Stop scaring me.

You're right though. There has been talk of a big global hack attack for a while now.

Nothing is safe anymore. Keeping everything private and airgapped is the only way forward. But most of our private and personal data is in the cloud, and we have no control over it or the backups that these companies keep.

While LLMs unlock the opportunity to self-host and build your own infrastructure, they also unleash the world of pain that is coming our way.

reply
How loud would it be?
reply
"We"? We know. We just can't do much about people on LLM crack who will go around any and every quality step just to tell themselves the LLM made them X times more productive.
reply
The only way to be safe is to constantly change internal APIs so that LLMs are useless at kernel code
reply
To slightly rephrase a quote from Demobbed (2000) [1]:

The kernel is not just open source, it's a very fast-moving codebase. That's how we win all wars against AI-authored exploits. While the LLM trains on our internal APIs, we change the APIs — by hand. When the agent finally submits its pull request, it gets lost in unfamiliar header files and falls into a state of complete non-compilability. That is the point. That is our strategy.

1 - https://en.wikipedia.org/wiki/Demobbed_(2000_film)

reply
The guys deterministically bootstrapping a simple compiler from a few hundred bytes, which then deterministically compiles a more powerful compiler, and so on, are on to something.

In the end we need fully deterministic, 100% verifiable chains, from the tiny bootstrapped beginning to the final thing.

There are people working on these things, both, in a way, "top-down" (bootstrapping a tiny compiler from a few hundred bytes) and "bottom-up" (a distro like Debian having 93% of its packages fully reproducible).

While most people are happy saying "there's nothing wrong with piping curl to bash", there are others who do understand what trusting trust is.

As a sidenote, although not a kernel backdoor, Jia Tan's xz backdoor was a wake-up call: it rode on that Rube Goldberg arrangement where distros patch sshd to link against libsystemd ("we modify your sshd because we're systemd"), making sshd's attack surface immensely bigger.

And, sadly and scarily, that's only the one we know about.

I think we'll see many more of these cascading supply-chain attacks. I also think that, in the end, more people are going to realize that there are better ways to design, build, and ship software.

reply
If that were to happen, the worry I would have is about all the sensitive government servers around the world that might then be exploited, and the amount of damage such a threat actor could silently cause. The same goes for AWS/GCP and the other massive hyperscalers, which governments around the globe also use at times.

The possibilities for a capable threat actor could be catastrophic, especially if we assume nation-states are interested in sponsoring hacking attacks (which many already do) to attack enemy nations or gain leverage. We are looking at damage in the trillions at that point.

But I would assume that Linux might be safe for now; it might be the most looked-at codebase out there.

LLVM might be a bit more interesting, as it might go a little more unnoticed, but hopefully the people working on LLVM are well funded enough to look at everything carefully and not have such a slip-up.

reply
You know that people can already write backdoored code, right?
reply
Yeah, and they can write code with vulnerabilities by accident. But this is a new class of problem, where a known, trusted contributor can accidentally let through a vulnerability that was added on purpose by the tooling.
reply
But now you have compromise _at scale_. Before, poor plebs like us had to artisanally craft every backdoor. Now we have a technology to automate that mundane exploitation process! Win!
reply
You still have a human who actually ends up reviewing the code, though. Now if the review was AI powered... (glances at openclaw)
reply