I'm guilty of it too. My blog post from 15 years ago is nowhere near as good as OP's post, but if I thought the me of 15 years ago lived up to my standards of today, I'd be really disappointed: https://blog.habets.se/2011/07/OpenSSH-certificates.html
Things like deploying dev keys to various production environments, instead of generating/registering them within said environment.
One of the worst recent security examples... You can't get this data over HTTPS from $OtherAgency, it's "not secure"... then their suggestion is a "secure" read-only account on the other agency's SQL server (which uses the same TLS 1.3 as HTTPS). This is from the person in charge of digital security for a government org.
You can also make all the certs short-lived (and only store them in ram).
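For example, with stock ssh-keygen (the key ID, principal, and 1-hour lifetime here are just illustrations):

```shell
# One-time: create a CA keypair (in practice you'd keep this offline or on an HSM).
ssh-keygen -q -t ed25519 -f ca_key -N '' -C 'example CA'
# The user's keypair, which can itself be ephemeral / RAM-only.
ssh-keygen -q -t ed25519 -f id_user -N ''
# Sign it: certificate valid for 1 hour from now, for principal "alice".
ssh-keygen -s ca_key -I alice -n alice -V +1h id_user.pub
# Inspect the resulting cert (id_user-cert.pub), including its validity window.
ssh-keygen -L -f id_user-cert.pub
```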
I assume you gathered a lot of thoughts over these 15 years.
Should I invest in making the switch?
In environments where they don't cause frustration, they're not worth it.
Not really more to it than that, from my point of view.
How does SSSD support help with SSH authN? I know you can now get Kerberos tickets from FreeIPA using OIDC(?), but I forget if SSSD is involved.
There are some serious security benefits for larger organizations but it does not sound as if you are part of one.
They also provide a way to get hardware-backed security without messing with SSH agent forwarding and crappy USB security devices. You can use an HSM to issue a temporary certificate for your (possibly temporary) public key and use it as normal. The certificate can be valid for just 1 hour, enough to not worry about it leaking.
The workflows around SSH CAs are extremely janky and insecure.
With some creative use of `AuthorizedKeysCommand` you can make SSH key rotation painless and secure.
With SSH certificates you have to go back to the "keys to the kingdom" antipattern and just hope for the best.
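To sketch what I mean by the AuthorizedKeysCommand approach (helper path, directory, and sync mechanism are all assumptions, not a prescription):

```shell
# sshd_config fragment:
#   AuthorizedKeysCommand /usr/local/bin/fetch-keys %u
#   AuthorizedKeysCommandUser nobody
#
# The helper can be as small as printing the centrally managed key list
# for the user; here it reads a per-user file from a directory that
# something like rsync or a config-management run keeps up to date, so
# rotation is just updating the central copy.
fetch_keys() {
    keydir=${KEYDIR:-/var/lib/ssh-keys}    # one file of public keys per user
    cat "$keydir/$1" 2>/dev/null
}
```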
It's not that certificates themselves are insecure, it's that the workflows (as the parent points out) are awful. We might still add some automation around that (and I think I saw some competitor tooling out there if you're committed to that path), but I personally feel like it's an answer to the wrong question.
Whut? This is literally the opposite.
With CA certs you can create short-lived certificates, so you can easily grant access to a system for a short time.
However, it provides you an additional layer of protection, because it does not need to be on the critical path for every SSH connection. My CA is a Nitrokey HSM, for example. I issue myself temporary certs that are valid only for 6 hours for ephemeral private keys.
Thanks for writing it!
Especially at a BigCo, where there are different environments, with different passwords, and password expiry/rotation/complexity rules.
Like, when asking for help, or working together... you say to them "ok, lets ssh to devfoo1234", and they do it, and then type in their password, and maybe get it wrong, then need to reset it, or whatever... and it takes half a minute or more for them to just ssh to some host. Maybe there are several hosts involved, and it all multiplies out.
I mention to them "you know... i never use ssh passwords, i don't actually know my devfoo1234 password... maybe you should google for ssh-keygen, set it up, let me know if you have any problems?" and they're like "oh yeah, thats cool. i should do that sometime later!".... and then they never do, and they are forever messing with passwords.
I just don't get it.
With my own machines I can just physically check that the server host key matches what the ssh client sees. Once TOFU looks good I'm all set with that host because I don't change any of the keys ever.
In a no-frills corporate unix environment it's enough to have a list of the internal servers' public keys listed on an internal website, accessible via SSL so it's effectively signed by a known corporate identity. You only need to check this list once to validate carrying out TOFU after which you can trust future connections.
In settings with a huge fleet of machines, or in a very dynamic environment where new machines are rolled out all the time, certificates probably make things easier. Of course, certificates come with some extra work and some extra features, so the amount of benefit depends on the case. But at this scale TOFU breaks down badly on multiple levels, so you can't really afford a strong opinion against certificates.
I wish web browsers could remember server TLS host keys easily too and at least notify me whenever they change even if they'd still accept the new keys via ~trusted CAs.
[1] https://www.zscaler.com/resources/security-terms-glossary/wh...
Furthermore, it locks down the web browser's settings so that you cannot use a proxy server to bypass Zscaler's MITM. I saw this behavior in Mozilla Firefox, where the proxy option is set to "No proxy" and all other options are disabled and grayed out; I imagine that it does the same to Google Chrome. If you try to modify the browser's .ini(?) file for proxy settings, Zscaler immediately overwrites it against your will. Zscaler worked very hard to enforce its settings in order to spy on the computer user.
And as you'd expect, if you open up the Zscaler GUI in the system tray, you are presented with the option to disable the software if you have the IT password. Which of course, you don't have. Then again, that might be an epsilon better than the Cybereason antivirus software, which just has a system tray icon with no exit option, and cannot be killed in Task Manager even if you are a local administrator, and imposes a heavy slowdown if you open hundreds of small text files per second.
The worst breakage by far is protocol breakage; basically anything that uses HTTP as a basis for building some other protocol gets broken all the time. None of the people implementing it seem aware. They buy the vendor's claim that it's "transparent", when in fact even "inspect/trace-only" modes often break all kinds of shit.
I've seen Umbrella break:
- Git
- RubyGems
- `go mod`
- OrbStack
- Matrix
- Cargo
- all JDKs
- Nix
- Pkgsrc
- all VMs
and probably some other things I'm forgetting. When this breakage is reported, the first round of replies is typically "I visited that domain in my enterprise-managed browser and it's not blocked". That is, of course, a useless and irrelevant test. Often it takes hours to even fully diagnose the breakage with enough confidence to point the finger at that tool and not some other endpoint security tool.
I'm not sure if the people buying and deploying tools in this category don't know how much stuff it breaks or just don't care. But the breakage is everywhere and nobody seems prepared for it.
Also, I've never had a security issue due to TOFU, have you?
This is a bit like suggesting you've never been in a car crash, so seat belts must not be worth considering.
Do you feel that beyond the obvious and documented work in setting them up, there are disadvantages to using SSH certificates?
However, if you do not need the extra features provided by certificates, using SSH-generated keys is strictly equivalent to using certificates, and it requires less work.
TOFU is neither necessary nor recommended, it is just a convenience feature, to be used when security may be lax.
The secure way to use SSH is to never use TOFU but to pair the user and the server by copying the public keys between the 2 computers through a secure channel, e.g. either by using a USB memory or by sending the public keys through already existing authenticated encrypted links that pass through other computers. (Such a link may be a HTTPS download link.)
When using certificates, a completely identical procedure must be used. After certificates are generated, like also after SSH keys are generated, the certificates must be copied to the client computer and the server computer through secure channels.
That is not the case, and is a major advantage of certificates.
Just to make it clear: this does not mean that it is fine to blindly accept the message on first use.
The "secure way" implies copying the server's public key as well, which people generally don't do, right? Which is equivalent to verifying the fingerprint shown with the TOFU message, correct?
The server public key must be copied into "known_hosts" on the client, while the client public key must be copied into "authorized_keys" on the server.
When this is the procedure that is always followed, any message shown by SSH about an unknown host means that the connection must be aborted, because the identity of the server is unknown.
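Concretely, that pairing can be done with nothing but file copies; hostname and file names below are stand-ins, and the .pub files are assumed to travel over the secure channel (USB memory, console, etc.):

```shell
# On the server, after carrying the user's public key over:
#   cat alice_id_ed25519.pub >> ~alice/.ssh/authorized_keys

# Stand-in for the host public key file carried back to the client:
echo 'ssh-ed25519 AAAAC3NzaExample root@server' > ssh_host_ed25519_key.pub
# On the client: turn the "keytype base64 comment" file into the
# known_hosts format "host keytype base64", then append it.
awk '{print "server.example", $1, $2}' ssh_host_ed25519_key.pub > known_hosts.add
# finally: cat known_hosts.add >> ~/.ssh/known_hosts
```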
You cannot truly verify the "fingerprint" displayed by SSH, unless you simultaneously have access to another computer, where you have a copy of the fingerprint. What is usually meant by "verifying" is that you remember a few digits of the fingerprint, and those match.
You could have copied the fingerprint from the server, to be able to truly verify it, but that does not make sense, because in that case you should have copied the entire key, not just the fingerprint, and you should have installed it in the client.
When you use only authentication with digital signatures, it does not make sense to use any other procedure, because you must make at least one of the two copies anyway, so when copying the client key to the server you can take the server key, to carry it back to the client.
The TOFU method is meant to be used together with password-based authentication, in less secure applications, where no physical access to the server is required for setting up SSH on the client.
By "less secure" I mean for example applications equivalent to HTTPS, where the client is not really authenticated, e.g. when providing a public password allowing read-only access to an SSH server through Internet.
As for your ISP I think you should never rely on TOFU over the public internet. If you really don't want to do ssh certs it's easy enough to make the host key available securely via https.
Choosing to use TOFU is a distinct choice from the choice of using the keys generated by SSH, instead of using certificates.
If you do not want to use TOFU, for extra security, you just have to pair the computers by copying between them the corresponding public keys through a secure channel, e.g. by using a USB memory.
Using certificates does not add any simplification or any extra security.
For real security, you still must pair the communicating computers by copying between them the corresponding certificates, through a secure channel, e.g. a USB memory.
When you use for HTTPS the certificates that have come with your Internet browser, you trust that the installer package for the browser has come to that computer through a secure channel from the authority that has created the certificates. This is usually a much more far-fetched assumption than the assumption that you can trust TOFU between computers under your control.
Certificates may be useful in big organizations, if other functionality is needed beyond just establishing secure communication channels, e.g. if you want to use certificate revocation.
In the list of "advantages" enumerated in the parent article, more than half of them are false, because if certificates are implemented correctly, completely equivalent actions must be executed when SSH keys without TOFU are used and when certificates are used.
Perhaps the author meant by writing some of the "advantages" that the actions that supposedly are no longer needed with certificates are done by an administrator, not by the user. However that is also applicable with SSH. An administrator could install the certificates, so that no action is required from the user, but an administrator can also install the SSH public keys, so that no TOFU is ever needed from the user.
Using certificates requires exactly the same steps as using keys generated by SSH (i.e. generating certificates and copying them between computers through secure channels, to pair the servers and the authorized users), but it may need additional steps, caused by the fact that certificates provide additional functionality.
With the other, I'm logging into a server for the first time (and I could simply deploy the same trusted host key to all my ssh servers via an autoscaling configuration or whatever). I think it's debatable if TOFU is worse or better than your (granted clever) metaphor.
(to those who'd recommend userify, yes - great for the client login issue and definitely increases security, but to parent's point, TOFU is still needed unless you want to distribute host pubkeys)
To visit this site, there is no pairing, because the site does not know who I am.
In order to verify the identity of the HN site, I must trust that the maintainers of the installation packages of the browsers that I use (Firefox, Vivaldi, Chromium) have ensured that the built-in certificates have reached me through a secure path. This actually requires much more trust than when someone answers "yes" to the SSH unknown host message.
If I use certificates for accessing e.g. the network of my employer, then my work computer must be paired with some corporate server, i.e. a unique certificate has been generated for myself and it has been copied to some certificate authority server for signing and then to my computer, and also a certificate of the local certificate authority has been copied to my personal computer.
While pairing is unavoidable for bidirectional authentication, it is not necessarily direct between the end points. Both end points must have been paired with at least one other computer but they need not have been paired between themselves previously if there exists some path through secure connections that have been originally created by pairing.
When certificates are used, usually the pairings are not done directly between end points, but each computer must be paired with the server hosting the certificate authority.
The term "pairing" is not used frequently, but it should have been preferred, because frequently the users do not understand which are the exact actions on which the security of their communications depend, which leads to various exploits. The critical security actions are those that perform the pairing.
"Pairing" of 2 systems, e.g. A and B, means that some information must be transmitted through a secure channel from A to B and some other information must be transmitted through a secure channel from B to A. An alternative pairing method is to generate both pieces of information on one of the 2 systems and transmit both of them through a secure channel to the other. The information exchange channels must already be secure, because before pairing authentication is impossible.
The pairing between a PC and the server hosting the certificate authority can be done in various ways, depending on where the PC certificate is generated. If the certificate is generated at the certificate authority, then both it and the root certificate must be copied through a secure channel to the PC. If the certificate is generated on the PC, it must be sent through a secure channel to the CA for signing, then it must be sent back, also through a secure channel.
In practice, administrators are not always careful enough to ensure that the channels through which certificates are copied are really secure. For instance they may be copied through network links that are not yet authenticated, which is equivalent to the TOFU method optionally used by SSH.
This can be anything from being part of the install script to customized deployment image to physical access to access via a host in virtualized scenarios.
TOFU only really comes into play when the box is already set up and you have no other way to load things onto the box other than connecting via SSH to do so. But, again, that would be the same story if you were intending to go the certificate approach too.
The system shifts from many small local states to one highly coupled control point. That control point has to be correct and reachable all the time. When it isn’t, failures go wide instead of narrow.
Example: a few boxes get popped and start hammering the CA. Now what? Access is broken everywhere at once.
Common friction points:
1. your signer that has to be up and correct all the time
2. trust roots everywhere (and drifting)
3. TTL tuning nonsense (too short = random lockouts, too long = what was the point)
4. limited on-box state makes debugging harder than it should be
5. failures tend to fan out instead of staying contained
Revocation is also kind of a lie. Just waiting for expiry and hoping that’s good enough. What actually happens is people reintroduce state anyway: sidecars, caches, agents… because you need it.
We went the opposite direction:
1. nodes pull over outbound HTTPS
2. local authorized_keys is the source of truth
3. users/roles are visible on the box
4. drift fixes itself quickly
5. no inbound ports, no CA signatures (WELL, not strictly true*!)
You still get central control, but operation and failure modes are local instead of "everyone is locked out right now." That’s basically what we do at Userify (https://userify.com). Less elegant than certs, more survivable at 2am. Also actually handles authz, not just part of authn.
And the part that usually gets hand-waved with SSH CAs:
1. creating the user account
2. managing sudo roles
3. deciding what happens to home directories on removal
4. cleanup vs retention for compliance/forensics
Those don’t go away - they're just not part of the certificate solution.
* (TLS still exists here, just at the transport layer using the system trust store. That channel delivers users, keys, and roles. The rest is handled explicitly instead of implied.)
I've set up a couple of yubikeys as SSH CAs on hosts I manage. I use them to create short-lived certs (say 24h) at the start of the day. This way I only have to enter the yubikey pin once a day.
I could not find an easy way to limit maximum certificate lifetime in openssh, except for using the AuthorizedPrincipalsCommand, which feels very fragile.
Does anyone else have any experience with a similar setup? How do you limit cert max lifetime?
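For reference, the AuthorizedPrincipalsCommand approach I mentioned would look roughly like this - a sketch, not something I'd call robust, and the %t/%k token order is an assumption to check against sshd_config(5):

```shell
#!/bin/sh
# Hedged sketch: reject certs whose validity window exceeds MAX_SECONDS,
# otherwise print their principals. Assumed sshd_config wiring:
#   AuthorizedPrincipalsCommand /usr/local/bin/cert-check %t %k
#   AuthorizedPrincipalsCommandUser nobody
MAX_SECONDS=86400   # cap certificate lifetime at 24h

check_cert() {   # $1 = key type, $2 = base64-encoded certificate
    tmpf=$(mktemp) || return 1
    printf '%s %s\n' "$1" "$2" > "$tmpf"
    info=$(ssh-keygen -L -f "$tmpf"); st=$?; rm -f "$tmpf"
    [ "$st" -eq 0 ] || return 1
    # ssh-keygen -L prints a line like: "Valid: from <start> to <end>"
    from=$(printf '%s\n' "$info" | sed -n 's/.*Valid: from \([^ ]*\) to .*/\1/p')
    to=$(printf '%s\n' "$info" | sed -n 's/.*Valid: from [^ ]* to \([^ ]*\).*/\1/p')
    [ -n "$from" ] && [ -n "$to" ] || return 1     # rejects "Valid: forever" too
    start=$(date -d "$from" +%s) && end=$(date -d "$to" +%s) || return 1  # GNU date
    [ $((end - start)) -le "$MAX_SECONDS" ] || return 1
    # Emit the indented list under "Principals:", one principal per line.
    printf '%s\n' "$info" | \
        awk '/Principals:/{p=1;next} /:/{p=0} p{sub(/^[[:space:]]+/,""); print}'
}
# As the actual command, the script would end with: check_cert "$1" "$2"
```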
There really needs to be a definitive best practices guide published by a trusted authority.
We've always taken the stance that crusty is better than vulnerable, but it turns out that not having a modern experience after 15 years is starting to feel like maybe we need to step up the features and shininess :)
They've rolled their host key one time, so there's little reason for them to use it on the host side.
Openssh supports checking the DNSSEC signature in the client, in theory, but it's a configure option and I'm not sure if distros build with it.
Additionally, as I mentioned, openssh itself has support for validating the DNSSEC signature even if your local resolver doesn't. I actually don't think it can use the standard resolver for SSHFP records at all, but I'm not sure.
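For anyone who wants to try the SSHFP route, the record-generation side is plain ssh-keygen (the hostname below is made up, and a throwaway key stands in for the real host key):

```shell
# Generate SSHFP resource records for a host key; paste the output into
# the DNS zone. In real life you'd point -f at /etc/ssh/ssh_host_ed25519_key.pub.
ssh-keygen -q -t ed25519 -f demo_host_key -N ''
ssh-keygen -r server.example -f demo_host_key.pub
# Client side, in ssh_config / ~/.ssh/config:
#   VerifyHostKeyDNS yes    # or "ask"; only meaningful with DNSSEC validation
```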
Is that yet another problem that I need to solve with syncthing?
Sure you need your signing service to be reasonably available, but that’s easily accomplished.
Maybe I misunderstand?
Except for everything around that:
* user lifecycle (create/remove/rename accounts)
* authz (who gets sudo, what groups, per-host differences)
* cleanup (what happens when someone leaves)
* visibility (what state is this box actually in right now?)
SSH certs don’t really touch any of that. They answer "can this key log in right now", not "what should exist on this machine".
So in practice, something else ends up managing users, groups, sudoers, home dirs, etc. Now there are two systems that both have to be correct.
On the availability point: "reasonably available" is doing a lot of work ;)
Even with 1-hour certs:
* new sessions depend on the signer
* fleet-wide issues hit everything at once
* incident response gets awkward if the signer is part of the blast radius
The failure mode shifts from "a few boxes don't work" to "nobody can get in anywhere".
The pull model just leans the other way:
* nodes converge to desired state
* access continues even if control plane hiccups
* authn and authz live together on the box
Both models can work - it’s more about which failure mode is tolerable to you.
But for just getting access to role accounts then I find it a lot nicer than distributing public keys around.
And for everything else, a periodic Ansible :-)
Replacing the distribution of a revocation list with short-lived certificates just creates other problems that are not easier to solve. (Also, 1h is bonkers, even letsencrypt doesn't do it)
IMHO, if you're pushing revocation lists at low latency, you could also push authorized keys updates at low latency.
This sentence is a bit of a red flag: it looks like the author is making a (subtle) mistake in the category of too much security, or at least misjudging the amount of security (objectively measurable entropy) that is needed. This is of course a less consequential error than too little, but if one wants to be a cybersecurity professional, especially one with influence, they must know exactly the right amount needed, because our resources are limited. Each additional bit of entropy and each additional security step costs not only the time of the admin who implements it but also that of the users who have to follow it, and this can even impact security itself by fatiguing the users and causing them to circumvent measures or ignore alerts.
On to specifically what's wrong:
Either a key file or a password can be used to log in to a server, or to authenticate to any service in general. Beyond the technical implementation, the main difference is whether the secret is stored on the device or in the user's brain. Neither is more correct than the other; there are a lot of tradeoffs: one can ensure more bits of entropy and is more ergonomic, while the other is not stored on the device and so cannot be compromised that way.
That said, a 2FA approach, in whatever format, is (generally speaking) safer than any individual method, in that both secrets are necessary to be granted access. In this scenario one needs both the file and the password to authenticate; even if the password is 4 digits long, that increases the security of the system compared to no password. An attacker would have to set up a brute force attempt along with a way to verify the decryption was successful. If local decryption confirmation is not possible, then such a brute force attack would require submitting erroneous logins to the server, potentially activating tripwires or alerting a monitoring admin.
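In OpenSSH terms, the file-plus-password combination is just a passphrase-encrypted private key; a minimal sketch (the passphrases and the KDF round count are arbitrary examples):

```shell
# Generate a key whose on-disk copy is encrypted with a passphrase;
# -a 64 raises the KDF rounds, slowing offline brute force of the file.
ssh-keygen -q -t ed25519 -f id_2fa -N 'correct horse battery staple' -a 64
# Rotating just the passphrase later, without changing the key itself:
ssh-keygen -p -f id_2fa -P 'correct horse battery staple' -N 'another passphrase'
```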
There's nothing special about the second factor authorization being equal or equivalent in entropy to the first, and there's especially no requirement that a password have more entropy when it's a second authorization, in fact it's the other way around.
tl;dr Consider each security mechanism in the wider context rather than in isolation, and you will see security fatigue go down without compromising security.