So that leaves faster boot times.
Faster boot times, and then the agent does what? At how many tokens/s? And what's the "time to first token" anyway?
How do time to first token and tokens/s, both inherent limitations of LLMs, not totally dominate the running time?
I just don't get the use case.
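To put the question in back-of-the-envelope terms, here is a rough sketch. Every number in it (boot times, time to first token, output length, tokens/s) is a made-up placeholder, not a measurement, and the function names are just for illustration. It only shows how the arithmetic would go: one LLM call costs roughly time to first token plus output tokens divided by tokens/s, and boot time is whatever fraction of the total is left.

```python
def llm_seconds(ttft_s, tokens_out, tokens_per_s):
    """Rough wall-clock time for one LLM call: wait for the first token, then stream the rest."""
    return ttft_s + tokens_out / tokens_per_s


def boot_share(boot_s, llm_s):
    """Fraction of total running time spent booting the sandbox."""
    return boot_s / (boot_s + llm_s)


# Hypothetical numbers for illustration only, not measurements.
llm = llm_seconds(ttft_s=0.5, tokens_out=500, tokens_per_s=50)
slow = boot_share(boot_s=2.0, llm_s=llm)  # assumed "regular VM" boot time
fast = boot_share(boot_s=0.1, llm_s=llm)  # assumed "fast boot" time

print(f"LLM time per call: {llm:.1f} s")
print(f"boot share with 2.0 s boot: {slow:.0%}")
print(f"boot share with 0.1 s boot: {fast:.1%}")
```

With these assumed inputs the LLM call is about 10.5 s, so even a 2 s boot is a minority of the total, and that is for a single call. Different inputs (many short-lived sandboxes per call, for instance) would shift the balance.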
Regular VMs just use too much memory; a typical Ubuntu install uses about 512 MB as a baseline.