The repo you linked works by replacing files that are being used by other privileged containers on the same system. That works for the Kubernetes case (I'm a little surprised they don't use static binaries for their own privileged containers, seems a little dangerous to share any kind of data with untrusted tenants even if it's read-only) but not standalone containers.

However, there is a much easier way of doing a breakout -- you can corrupt the host runc binary in a way analogous to CVE-2019-5736. The next time a container is spawned, the host runc binary will get run as root and that's that.

Ironically, the first version of the protection against this attack that I wrote also protected against page cache poisoning (by making a temporary copy of the runc binary in a sealed memfd during container setup and re-execing that), but the runtime cost of copying a 10MB binary at container startup was seen as too expensive by some users[1], so we ended up with a setup that shares the same page cache. I also distinctly remember arguing at the time that something like Dirty COW could always happen in the future, and that the memfd approach was better for that reason -- maybe I should've stuck to my guns more... :/

In practice the solution for containers is to update your seccomp policy to block the vulnerable syscall.
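As a rough illustration of that, the Python sketch below patches a Docker/OCI-style seccomp profile (the JSON you would pass via `--security-opt seccomp=...`) so that one syscall is denied. The helper name `deny_syscall` and the syscall name `example_syscall` are placeholders, not the real ones, and I'm assuming the usual `defaultAction` / `syscalls` / `names` / `action` / `errnoRet` layout.

```python
import copy


def deny_syscall(profile: dict, name: str, errno_ret: int = 1) -> dict:
    """Return a copy of `profile` in which syscall `name` is denied.

    The name is removed from every existing rule (so no allow rule still
    matches it) and an explicit SCMP_ACT_ERRNO rule is appended.
    errno_ret=1 corresponds to EPERM on Linux.
    """
    patched = copy.deepcopy(profile)
    rules = []
    for rule in patched.get("syscalls", []):
        if "names" in rule:
            remaining = [n for n in rule["names"] if n != name]
            if not remaining:
                continue  # the rule only covered the denied syscall
            rule = {**rule, "names": remaining}
        rules.append(rule)
    rules.append(
        {"names": [name], "action": "SCMP_ACT_ERRNO", "errnoRet": errno_ret}
    )
    patched["syscalls"] = rules
    return patched


if __name__ == "__main__":
    import json

    # Tiny stand-in profile; a real one lists hundreds of syscalls.
    profile = {
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [
            {"names": ["read", "example_syscall"], "action": "SCMP_ACT_ALLOW"}
        ],
    }
    print(json.dumps(deny_syscall(profile, "example_syscall"), indent=2))
```

Whether denying the syscall breaks legitimate workloads depends on what your containers actually use, so test against your own profile first.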

[1]: https://github.com/opencontainers/runc/issues/1980
