
If hung SSH connections are common, it's likely due to CGNAT, which uses aggressively low TCP timeouts. E.g. I've found all UK mobile carriers set their TCP timeout as low as 5 minutes. The "default" is supposed to be 2 hours: you could literally sleep your computer, send zero packets, and an SSH connection would continue to work an hour later. Generally speaking this is still true unless CGNAT is in the way.

If you are interested there are a few ways you can fix this:

Easiest is to use a VPN. The VPN's exit node becomes the effective NAT, and these usually have normal TCP timeouts because they are less resource constrained. Another nice benefit of this method is that you can move between physical networks and your connection doesn't die. If you use Tailscale then you already have this in a more direct way.

Another is to tune the tcp_keepalive kernel parameters. Lowering the keepalive timeout to less than the CGNAT timeout makes the keepalive probes stop CGNAT from dropping the connection, even while your SSH connection is technically idle. For Linux I pop these into /etc/sysctl.d/z.conf; I have no idea for Windows or Mac:

  # Keepalive frequently to survive CGNAT
  net.ipv4.tcp_keepalive_time   = 240 
  net.ipv4.tcp_keepalive_intvl  = 60
  net.ipv4.tcp_keepalive_probes = 120

This is really a misuse of these settings. They are supposed to be for checking that TCP connections are still alive and clearing dead ones out of the local connection table. Instead the idea is to exploit the probes by sending them more frequently, forcing idle connections to stay alive in a CGNAT environment (don't worry, the probes are tiny and still very infrequent).

_time=240 will send a probe after 4 minutes of idle connection instead of the default 2 hours, undercutting the CGNAT timeout. _intvl=60 and _probes=120 mean it will send 120 probes 60 seconds apart (2 hours' worth) before considering the connection dead. This gives the best of both worlds: under a nice NAT it keeps the old behaviour, e.g. if I temporarily lose my network the SSH connection is still valid for roughly 2 hours, while under CGNAT it will at least not drop the connection after 5 minutes so long as I keep my computer on and don't lose the network.
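To sanity-check the arithmetic, here is a minimal Python sketch. The class and function names are made up for illustration, and the formula (first probe after `time` idle seconds, then one probe every `intvl` seconds until `probes` of them go unanswered) is my understanding of the Linux behaviour, so treat the totals as approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeepaliveSettings:
    """Mirror of the three net.ipv4.tcp_keepalive_* sysctls."""
    time: int    # idle seconds before the first probe is sent
    intvl: int   # seconds between probes once probing has started
    probes: int  # unanswered probes before the connection is declared dead


def survives_nat(settings: KeepaliveSettings, nat_idle_timeout: int) -> bool:
    """True if the first probe goes out before the NAT would forget the flow."""
    return settings.time < nat_idle_timeout


def seconds_until_dead(settings: KeepaliveSettings) -> int:
    """Idle seconds after which an unreachable peer is given up on."""
    return settings.time + settings.intvl * settings.probes


# Values from the sysctl snippet above, and the stock Linux defaults.
TUNED = KeepaliveSettings(time=240, intvl=60, probes=120)
LINUX_DEFAULTS = KeepaliveSettings(time=7200, intvl=75, probes=9)

if __name__ == "__main__":
    cgnat_timeout = 5 * 60  # the 5 minute figure quoted for UK mobile carriers
    for name, s in (("tuned", TUNED), ("defaults", LINUX_DEFAULTS)):
        print(name, survives_nat(s, cgnat_timeout), seconds_until_dead(s))
```

With the tuned values the first probe beats a 5 minute CGNAT timeout, and a vanished peer is only given up on after 240 + 60 * 120 = 7440 seconds, slightly over 2 hours.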

There are also some SSH client keepalive settings but I'm less familiar with them.
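One caveat on the kernel parameters: they only affect sockets that have asked for keepalive with SO_KEEPALIVE (OpenSSH does this by default via `TCPKeepAlive yes`). If you control the program, the same timers can be set per socket instead of system-wide. A rough sketch, assuming Linux and Python's `socket` module; the helper names are hypothetical and the `TCP_KEEP*` constants are not available under these names on every platform:

```python
import socket


def enable_keepalive(sock, idle=240, interval=60, count=120):
    """Turn on TCP keepalive for one socket and override the sysctl timers.

    idle/interval/count are the per-socket counterparts of
    tcp_keepalive_time, tcp_keepalive_intvl and tcp_keepalive_probes.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)


def keepalive_settings(sock):
    """Read back (idle, interval, count) as the kernel currently sees them."""
    return (
        sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE),
        sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL),
        sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT),
    )


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        enable_keepalive(s)
        print(keepalive_settings(s))
```

This leaves every other connection on the machine on the stock timers, which is arguably less of a misuse than changing the global sysctls.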

by vbezhenar 3 hours ago

    Host *
        ServerAliveInterval 25

by snvzz 7 hours ago

Note this is only an issue if not using IPv6.

CGNAT is for access to legacy IPv4 only.

by rnhmjoj 5 hours ago

Well, for different reasons, but you have similar issues with IPv6 as well. If your client uses temporary addresses (most likely, since they're enabled by default on most OSes), OpenSSH will pick one of them over the stable address, and when they're rotated the connection breaks.

For some reason, OpenSSH devs refuse to fix this issue, so I have to patch it myself:

    --- a/sshconnect.c
    +++ b/sshconnect.c
    @@ -26,6 +26,7 @@
     #include <net/if.h>
     #include <netinet/in.h>
     #include <arpa/inet.h>
    +#include <linux/ipv6.h>
     
     #include <ctype.h>
     #include <errno.h>
    @@ -370,6 +371,11 @@ ssh_create_socket(struct addrinfo *ai)
      if (options.ip_qos_interactive != INT_MAX)
        set_sock_tos(sock, options.ip_qos_interactive);
     
    + if (ai->ai_family == AF_INET6 && options.bind_address == NULL) {
    +  int val = IPV6_PREFER_SRC_PUBLIC;
    +  setsockopt(sock, IPPROTO_IPV6, IPV6_ADDR_PREFERENCES, &val, sizeof(val));
    + }
    +
      /* Bind the socket to an alternative local IP address */
      if (options.bind_address == NULL && options.bind_interface == NULL)
        return sock;

by gspr 3 hours ago

Interesting! Is there anywhere a discussion around their refusal to include your fix?

by rnhmjoj 59 minutes ago

See this, for example: https://groups.google.com/g/opensshunixdev/c/FVv_bK16ADM/m/R...

It boils down to using a Linux-specific API, though it's really BSD that is lacking support for a standard (RFC 5014).

by dsl 5 hours ago

This is a very common misconception. The issue is not IPv4 or CGNAT, it's stateful middleboxes... of which IPv6 has plenty.

The largest IPv6 deployments in the world are mobile carriers, which are full of stateful firewalls, DPI, and mid-path translation. The difference is that when connections drop it gets blamed on the wireless rather than the network infrastructure.

Also, fun fact: net.ipv4.tcp_keepalive_* applies to IPv6 too. The "ipv4" is just a naming artifact.

by anthk 4 hours ago

Check Mosh. It supports these kinds of cuts and it will reconnect seamlessly. It will use far less bandwidth too. I successfully tried it with a 2.7 KBPS connection.

by iberator 4 hours ago

PuTTY has been sending keepalive packets to hold the connection up since like forever.

by lathiat 11 hours ago

Have been using that weekly for probably 20 years. Will change your life :)

My other favourite: I very often SSH with -v to figure out why the connection is hanging. You rapidly figure out whether DNS is failing, the TCP connection doesn't open, it does open but no traffic flows at all, or it opens and SSH negotiation starts but never finishes. You can learn a lot about what is wrong just from this.
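The first few stages can also be checked without ssh at all. A minimal Python sketch, assuming you only care about "DNS, TCP connect, first bytes from the server"; the function name and the stage labels are invented here, and it stops before key exchange or auth, which is where `ssh -v` is still the right tool:

```python
import socket


def diagnose(host, port=22, timeout=5.0):
    """Return the first stage that fails, or "ok" if an SSH banner arrives."""
    # Stage 1: name resolution.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns"

    # Only the first resolved address is tried, to keep the sketch short.
    family, socktype, proto, _canonname, addr = infos[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)

        # Stage 2: TCP handshake.
        try:
            sock.connect(addr)
        except OSError:
            return "tcp-connect"

        # Stage 3: does anything flow? An SSH server speaks first.
        try:
            banner = sock.recv(255)
        except OSError:
            return "no-traffic"

    if not banner:
        return "no-traffic"  # peer closed without sending anything
    # Servers may send other lines before the version string; this
    # sketch only looks at the first chunk received.
    return "ok" if banner.startswith(b"SSH-") else "not-ssh"


if __name__ == "__main__":
    import sys
    print(diagnose(sys.argv[1]))
```

A result of "tcp-connect" or "no-traffic" points at the network path, while "ok" followed by a hang in the real client points at negotiation or auth.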

by Izkata 8 hours ago

Also helps with auth failures. I've used it several times with co-workers who can't figure out why their ssh key isn't working. It lists the keys out along with some extra information.

by sirfz 11 hours ago

You can even chain them if you have deep ssh connections (i.e. ssh from one instance to another). I think it would be ~~. to terminate the 2nd hop.

Edit: it's already explained in the OP

by tdeck 5 hours ago

You don't need to actually open the menu either. Just hit enter, tilde, ., enter.

by aa-jv 2 hours ago

I last used this menu about 20 years ago, when a dialup modem was the only way to roll, and have pretty much forgotten about it since the days of always-on, direct-to-the-desktop TCP/IP.

by fragmede 8 hours ago

Just ssh to funky.nondeterministic.computer to test it out!

by wolvoleo 11 hours ago

I use that every day but it's the only one I know by heart lol

by TacticalCoder 10 hours ago

> It's much nicer than my current approach of having to close that terminal window.

You can also just kill the ssh process (say from another terminal). That way you get to keep your terminal window. And this works with everything "blocking" your terminal, not just ssh.

by shmerl 11 hours ago

I've been using ~. on hung ssh connections for a while.