Except for everything around that:
* user lifecycle (create/remove/rename accounts)
* authz (who gets sudo, what groups, per-host differences)
* cleanup (what happens when someone leaves)
* visibility (what state is this box actually in right now?)
SSH certs don’t really touch any of that. They answer "can this key log in right now?", not "what should exist on this machine?".
So in practice, something else ends up managing users, groups, sudoers, home dirs, etc. Now there are two systems that both have to be correct.
On the availability point: "reasonably available" is doing a lot of work ;)
Even with 1-hour certs:
* new sessions depend on the signer
* fleet-wide issues hit everything at once
* incident response gets awkward if the signer is part of the blast radius
The failure mode shifts from "a few boxes don't work" to "nobody can get in anywhere".
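To make that concrete, here's a toy model of the signer dependency. It's a sketch, not how sshd works internally: `Cert`, `can_open_session`, and the 3600-second TTL are names and numbers I made up for illustration. The only assumption is that a new session needs an unexpired cert, and that an expired cert can only be replaced while the signer is reachable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cert:
    """Hypothetical stand-in for a short-lived SSH user certificate."""
    issued_at: int      # seconds since some epoch
    ttl: int = 3600     # the "1-hour cert" from the discussion

    def valid_at(self, now: int) -> bool:
        return self.issued_at <= now < self.issued_at + self.ttl


def can_open_session(cert: Optional[Cert], now: int, signer_up: bool) -> bool:
    """True if a *new* session can be opened at time `now`.

    A still-valid cert works regardless of the signer. Without one,
    the user has to get a fresh cert, which requires the signer.
    """
    if cert is not None and cert.valid_at(now):
        return True
    return signer_up


# Signer goes down right after this cert was issued.
cert = Cert(issued_at=0)
print(can_open_session(cert, now=1800, signer_up=False))  # cert still valid
print(can_open_session(cert, now=3600, signer_up=False))  # cert expired, no signer
```

Once the outage outlasts the TTL, every user is in the second case at the same time, on every host.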
The pull model just leans the other way:
* nodes converge to desired state
* access continues even if the control plane hiccups
* authn and authz live together on the box
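Roughly, "converge to desired state" means each node diffs what should exist against what does exist and applies the difference. A minimal sketch of that diff, assuming state is just a map of username to group set (the `plan` function and the action names are hypothetical, not from any particular tool):

```python
def plan(desired: dict[str, set[str]], actual: dict[str, set[str]]) -> list[tuple]:
    """Return the actions needed to move `actual` to `desired`.

    Both arguments map username -> set of group names.
    """
    actions: list[tuple] = []
    # Accounts that should exist but don't yet.
    for user in sorted(desired.keys() - actual.keys()):
        actions.append(("create", user, sorted(desired[user])))
    # Accounts that exist but shouldn't (e.g. someone left).
    for user in sorted(actual.keys() - desired.keys()):
        actions.append(("remove", user))
    # Accounts that exist on both sides but with the wrong groups.
    for user in sorted(desired.keys() & actual.keys()):
        if desired[user] != actual[user]:
            actions.append(("set_groups", user, sorted(desired[user])))
    return actions


desired = {"alice": {"sudo"}, "bob": {"dev"}}
actual = {"bob": {"dev", "sudo"}, "carol": {"dev"}}
for action in plan(desired, actual):
    print(action)
```

If the control plane is unreachable, the node simply keeps the last state it converged to, which is why access keeps working through a hiccup.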
Both models can work - it’s more about which failure mode is tolerable to you.
But for just getting access to role accounts, I find it a lot nicer than distributing public keys around.
And for everything else, a periodic Ansible :-)
Replacing the distribution of a revocation list with short-lived certificates just creates other problems that are not easier to solve. (Also, 1h is bonkers, even Let's Encrypt doesn't do it.)
IMHO, if you're pushing revocation lists at low latency, you could also push authorized keys updates at low latency.