But then you can't log in if your box goes offline for any reason.
reply
Hmm. For user certs you can have the service sign them for, say, an hour. So long as you can ssh to your server in that time, there's no need for any other interaction.
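
For concreteness, a minimal sketch of what the signing step could look like if the service shells out to `ssh-keygen`. The function name, file paths, and the refusal to sign without principals are assumptions of this sketch, not anything from a particular signing service; the `-s`, `-I`, `-n`, and `-V` flags are standard `ssh-keygen` options.

```python
from typing import List


def build_sign_command(
    ca_key: str,
    user_pubkey: str,
    identity: str,
    principals: List[str],
    validity: str = "+1h",
) -> List[str]:
    """Build the ssh-keygen argv that signs user_pubkey as a user certificate.

    All paths and names are hypothetical; only the ssh-keygen flags are real.
    """
    # Choice made by this sketch: never sign without an explicit principal list.
    if not principals:
        raise ValueError("at least one principal is required")
    return [
        "ssh-keygen",
        "-s", ca_key,                 # CA private key used for signing
        "-I", identity,               # key identity, shows up in sshd logs
        "-n", ",".join(principals),   # accounts the cert may log in as
        "-V", validity,               # validity interval, e.g. +1h from now
        user_pubkey,                  # public key to be signed
    ]


cmd = build_sign_command("ca_key", "id_ed25519.pub", "alice", ["deploy"])
```

Passing `cmd` to `subprocess.run(cmd, check=True)` should leave the certificate next to the public key (here `id_ed25519-cert.pub`), which the client then presents for the next hour.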

Sure you need your signing service to be reasonably available, but that’s easily accomplished.

Maybe I misunderstand?

reply
That works for authn in the happy path: short-lived cert, grab it, connect, done.

Except for everything around that:

* user lifecycle (create/remove/rename accounts)

* authz (who gets sudo, what groups, per-host differences)

* cleanup (what happens when someone leaves)

* visibility (what state is this box actually in right now?)

SSH certs don't really touch any of that. They answer "can this key log in right now?", not "what should exist on this machine?".

So in practice, something else ends up managing users, groups, sudoers, home dirs, etc. Now there are two systems that both have to be correct.

On the availability point: "reasonably available" is doing a lot of work ;)

Even with 1-hour certs:

* new sessions depend on the signer

* fleet-wide issues hit everything at once

* incident response gets awkward if the signer is part of the blast radius

The failure mode shifts from "a few boxes don't work" to "nobody can get in anywhere".

The pull model just leans the other way:

* nodes converge to desired state

* access continues even if the control plane hiccups

* authn and authz live together on the box
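
To make "converge to desired state" concrete, here is a rough sketch of the diff a node might compute on each pull. The data shape (username mapped to a set of groups) and the function name are assumptions for illustration only; a real agent would also cover sudoers, home dirs, and keys.

```python
from typing import Dict, List, Set


def plan_changes(
    desired: Dict[str, Set[str]],
    actual: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Compare desired and actual accounts and return the actions needed.

    Both arguments map a username to its set of groups (hypothetical shape).
    """
    desired_users = set(desired)
    actual_users = set(actual)
    return {
        # Accounts that should exist but do not yet.
        "create": sorted(desired_users - actual_users),
        # Accounts on the box that are no longer wanted (someone left).
        "remove": sorted(actual_users - desired_users),
        # Accounts that exist but whose group membership has drifted.
        "regroup": sorted(
            user
            for user in desired_users & actual_users
            if desired[user] != actual[user]
        ),
    }


desired_state = {"alice": {"sudo"}, "bob": set()}
actual_state = {"bob": {"sudo"}, "carol": set()}
plan = plan_changes(desired_state, actual_state)
```

Once the node matches the desired state the plan comes back empty, so the loop is safe to run on a timer, and the last applied state keeps working if the control plane is unreachable.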

Both models can work - it’s more about which failure mode is tolerable to you.

reply
Well, yes, pick your poison.

But for just getting access to role accounts, I find it a lot nicer than distributing public keys around.

And for everything else, a periodic Ansible :-)

reply
Public keys (for OpenSSH) can be in DNS (VerifyHostKeyDNS) or in, say, LDAP via KnownHostsCommand and AuthorizedKeysCommand.
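
As a rough illustration of the `AuthorizedKeysCommand` side: sshd runs the configured program and reads `authorized_keys`-format lines from its stdout. In the sketch below the in-memory `DIRECTORY` dict is a hypothetical stand-in for an LDAP query, and the key material is a placeholder, not a real key.

```python
import sys
from typing import Dict, List

# Hypothetical stand-in for a directory lookup (e.g. LDAP); placeholder key data.
DIRECTORY: Dict[str, List[str]] = {
    "deploy": ["ssh-ed25519 AAAA_PLACEHOLDER_KEY_DATA alice@example"],
}


def authorized_keys_for(username: str, directory: Dict[str, List[str]]) -> str:
    """Return authorized_keys-format text for username, one key per line.

    An unknown user yields an empty string, i.e. no keys are authorized.
    """
    keys = directory.get(username, [])
    return "".join(key + "\n" for key in keys)


def main(argv: List[str]) -> int:
    """Entry point: expects the username as the only argument."""
    if len(argv) != 2:
        return 1
    sys.stdout.write(authorized_keys_for(argv[1], DIRECTORY))
    return 0
```

In a real script you would end with `sys.exit(main(sys.argv))` and point sshd at it with something like `AuthorizedKeysCommand /usr/local/bin/keys-for %u` plus an `AuthorizedKeysCommandUser` line (the script path here is made up).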
reply
That sounds like a lot of extra steps. How do I validate the authenticity of a signing request? Should my signing machine be able to challenge the requester? (This means that the CA key is on a machine with network access!!)

Replacing the distribution of a revocation list with short-lived certificates just creates other problems that are no easier to solve. (Also, 1h is bonkers; even Let's Encrypt doesn't do it.)

reply
1h is bonkers for certs in HTTPS, but it's not unreasonable for authorized user certs, if your issuance path is available enough.

IMHO, if you're pushing revocation lists at low latency, you could also push authorized keys updates at low latency.

reply