The performance of gVisor is often a big limiting factor in deployment.
I read the thesis on arXiv. Do you see any limitations from using Xen instead of KVM? I think that was the biggest surprise for me, as I have very rarely seen teams build on Xen.
I'm curious what gVisor is getting you in your setup. Of course gVisor is good for running untrusted code, but would you say that gVisor prevents issues that would otherwise let the agent break out of the Kubernetes pod? Like, do you have examples you've observed where gVisor has saved the day?
The huge gVisor drawback is that it _drastically_ slows down applications (despite startup time being faster).
For agents, startup latency is less of an issue than the runtime cost, so microVMs perform a lot better. If you're doing this in kube, then there's a bunch of other challenges to deal with if you want standard k8s features, but if you're just looking for isolated sandboxes for agents, microVMs work really well.
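If you want to put a rough number on that runtime cost for your own setup, a syscall-heavy microbenchmark is a quick first check. This is a minimal sketch, not a rigorous benchmark: the function name `syscalls_per_second` and the choice of `os.stat(".")` as the representative call are my own assumptions, and it says nothing about CPU-bound code. Run the same script in a plain container, under gVisor, and in a microVM, then compare the figures.

```python
import os
import time


def syscalls_per_second(n=200_000):
    """Time n os.stat() calls and return the achieved calls per second.

    os.stat(".") is a hypothetical stand-in for filesystem-heavy work;
    each call goes through the sandbox's syscall and filesystem path.
    """
    start = time.perf_counter()
    for _ in range(n):
        os.stat(".")
    elapsed = time.perf_counter() - start
    return n / elapsed


if __name__ == "__main__":
    # Run this unchanged in each sandbox type and compare the numbers.
    print(f"{syscalls_per_second():,.0f} stat() calls/sec")
```

Treat the result as a relative signal only. Real workloads mix I/O, networking, and compute, so profile the actual agent before deciding.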
The middle ground we've built is that a real Linux kernel interfaces with your application in the VM (we call it a zone), but that kernel then can make specialized and specific interface calls to the host system.
For example, with NVIDIA on gVisor, the ioctl() calls are passed through directly, so an NVIDIA driver vulnerability that causes memory corruption leads directly to corruption in the host kernel. With our platform at Edera (https://edera.dev), the NVIDIA driver runs in the VM itself, so a memory corruption bug doesn't percolate to other systems.
Wait until they find a hole. Then good luck.