Given the use of the word "container", this seems to use Linux namespacing rather than KVM. With containers, isolation is provided solely by the Linux kernel, plus of course any additional defenses you add on top of it. While Guix shell having a built-in way to spawn isolated containers is extremely cool (I use NixOS, and as far as I know Nix does not have an equivalent feature), from a security standpoint it seems roughly equivalent to using bubblewrap or Firejail directly. Still, I like this idea; it seems very useful and convenient.

What I think we're really after, though, is something like gVisor, where the guest program is completely isolated from the host kernel, and the daemons that let the guest program reach the outside world are themselves tightly locked down by the host kernel using technologies like seccomp-bpf and namespacing, on top of whatever constraints and validation they apply on their own. Nothing is foolproof, but if done carefully, this feels like it would give you a very good layer of isolation that would be extremely challenging to bypass. I reckon the sandbox would cease to be the most interesting attack target in a system like gVisor, since any complicated system will probably always have some lower-hanging fruit elsewhere. (And of course, TinyKVM seems to be in basically the same wheelhouse. None of these solutions are designed to run GUI software, though I reckon it could be made to work.)

I admit I haven't investigated this thoroughly, but I suspect the low-hanging fruit in the TinyKVM case is having read-write access to /dev/kvm.

I think it should be possible to pass /dev/kvm as an open fd to daemons like kvm server and mark it as non-inheritable. As long as the VM runs in a subprocess, it should be okay, I guess.
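For what it's worth, the standard Unix mechanism for handing an already-open descriptor to another process is SCM_RIGHTS over a Unix domain socket. Here's a minimal Python sketch of the idea, assuming Linux and Python 3.9+ (for socket.send_fds/recv_fds). The function names are hypothetical, the forked child only stands in for a daemon such as kvm server, and nothing here is taken from TinyKVM itself. The demo uses /dev/null as a stand-in device so it runs on machines without KVM access.

```python
import os
import socket


def send_device_fd(sock: socket.socket, path: str) -> None:
    """Launcher side: open the device once and pass the open fd to the daemon."""
    # O_CLOEXEC keeps the launcher's own copy out of anything it later execs.
    fd = os.open(path, os.O_RDWR | os.O_CLOEXEC)
    try:
        # SCM_RIGHTS: the kernel installs a duplicate of fd in the receiving process.
        socket.send_fds(sock, [b"fd"], [fd])
    finally:
        os.close(fd)  # the receiver's duplicate stays valid after this


def recv_device_fd(sock: socket.socket) -> int:
    """Daemon side: receive exactly one fd and mark it non-inheritable."""
    msg, fds, _flags, _addr = socket.recv_fds(sock, 16, 1)
    if msg != b"fd" or len(fds) != 1:
        for extra in fds:
            os.close(extra)
        raise RuntimeError("expected exactly one file descriptor")
    fd = fds[0]
    # Same effect as fcntl(fd, F_SETFD, FD_CLOEXEC): programs the daemon
    # execs (e.g. a hypothetical VM subprocess) will not inherit this fd.
    os.set_inheritable(fd, False)
    return fd


def demo(path: str) -> bool:
    """Fork a stand-in 'daemon' that never opens `path` itself, and check that
    the fd it receives refers to the same file and is non-inheritable."""
    expected = os.stat(path)
    launcher_sock, daemon_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    pid = os.fork()
    if pid == 0:  # child process: plays the daemon
        status = 1
        try:
            launcher_sock.close()
            fd = recv_device_fd(daemon_sock)
            st = os.fstat(fd)
            same_file = (st.st_dev, st.st_ino) == (expected.st_dev, expected.st_ino)
            if same_file and not os.get_inheritable(fd):
                status = 0
        finally:
            os._exit(status)  # skip normal Python cleanup in the forked child
    daemon_sock.close()
    try:
        send_device_fd(launcher_sock, path)
    finally:
        launcher_sock.close()  # EOF unblocks the daemon if sending failed
        _, wait_status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(wait_status) == 0
```

Calling demo("/dev/null") should return True. Pointing it at /dev/kvm additionally requires permission to open the device read-write, which is exactly the access you'd want to confine to the launcher rather than the daemon.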
