This sort of technology is available on GKE

https://docs.cloud.google.com/kubernetes-engine/docs/concept...

Interesting! I didn't see that they had released this. Do you know what their benchmarks look like? I know that on Cloud Run they are pretty slow.

gVisor is open-source, and `cuda-checkpoint` is freely available.

gVisor's `runsc checkpoint` subcommand supports a `--save-restore-exec-argv` flag, which lets you specify a program to execute before gVisor starts taking the process snapshot.

You can fill in the blanks from there.
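
For anyone filling in those blanks, here is a rough sketch of the kind of helper such a hook program could call. It only builds and runs `cuda-checkpoint --toggle --pid <PID>`, which is the invocation shown in NVIDIA's cuda-checkpoint README. Everything else is an assumption for illustration: the function names are hypothetical, the binary is assumed to be on `PATH`, and how the hook learns which PIDs hold CUDA state (or how gVisor passes arguments to the `--save-restore-exec-argv` program) is not covered.

```python
import subprocess
from typing import Callable, Iterable, List

# Assumption: the NVIDIA cuda-checkpoint binary is reachable on PATH.
CUDA_CHECKPOINT_BIN = "cuda-checkpoint"


def build_toggle_command(pid: int) -> List[str]:
    """Return the argv that toggles CUDA state for a single process."""
    if pid <= 0:
        raise ValueError(f"invalid pid: {pid}")
    return [CUDA_CHECKPOINT_BIN, "--toggle", "--pid", str(pid)]


def toggle_cuda_state(
    pids: Iterable[int],
    run: Callable[..., object] = subprocess.run,
) -> List[List[str]]:
    """Toggle CUDA state for each pid in order.

    With the default runner, check=True makes a failed cuda-checkpoint
    call raise, so the caller can abort instead of snapshotting a
    process whose GPU state is still live.
    """
    commands = []
    for pid in pids:
        cmd = build_toggle_command(pid)
        run(cmd, check=True)
        commands.append(cmd)
    return commands
```

A hook would call something like `toggle_cuda_state([pid])` before the snapshot and again after restore, since `--toggle` flips the process between running and checkpointed CUDA state. The `run` parameter is there only so the logic can be exercised without a GPU.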

We and the team from Modal have been upstreaming changes to the gVisor repo (https://github.com/google/gvisor/pulls) to make it compatible with cuda-checkpoint and other parts of our system. While we are both contributing fixes and performance improvements, we are unfortunately keeping some secret sauce to ourselves, but what is upstream should hopefully get most folks to a successful implementation as is.