Why TCP keepalive defaults still bite in 2026
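Linux defaults: tcp_keepalive_time=7200, tcp_keepalive_intvl=75, tcp_keepalive_probes=9 (time and interval are in seconds). That is two hours of idle before the first probe, then 9 probes at 75-second intervals, about 11 more minutes, before the kernel gives up on a silent peer.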
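To check what a given machine is actually running with, the three values can be read back from procfs. This is a minimal sketch assuming a Linux host with /proc mounted; the helper name read_keepalive_settings and its base parameter are illustrative, not part of any library.

```python
from pathlib import Path

# Linux exposes the system-wide keepalive settings as files in this directory.
PROC_IPV4 = Path("/proc/sys/net/ipv4")

def read_keepalive_settings(base: Path = PROC_IPV4) -> dict[str, int]:
    """Return the three keepalive settings found under `base` as integers."""
    names = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")
    # Each file holds a single decimal number followed by a newline.
    return {name: int((base / name).read_text().strip()) for name in names}
```

Calling read_keepalive_settings() with no arguments reads the live kernel values. The base parameter exists only so the function can be pointed at a different directory, for example in a test.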
The problem: carrier-grade NAT and many corporate firewalls drop NAT translations after 5-15 minutes of inactivity. Your connection looks alive to the kernel but the next packet you send arrives at a black hole.
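For long-lived control-plane connections (SSH, IMAP IDLE, websocket-based daemons), set something sane via sysctl. These values only take effect on sockets that have SO_KEEPALIVE enabled; the sysctls do not turn keepalive on by themselves: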
    net.ipv4.tcp_keepalive_time = 120
    net.ipv4.tcp_keepalive_intvl = 30
    net.ipv4.tcp_keepalive_probes = 5
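That starts probing after 2 minutes of idle. If nothing answers, the kernel retries every 30 seconds for about 2.5 more minutes before declaring the connection dead, roughly 4.5 minutes in total. More importantly, on a healthy but idle connection a probe goes out every 2 minutes, which keeps the NAT translation refreshed with plenty of margin against 5-15 minute timeouts.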
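The arithmetic behind those figures is simple enough to check in a few lines. The function name keepalive_timeline is made up for this sketch, and it ignores retransmission details beyond the three settings.

```python
def keepalive_timeline(idle: int, interval: int, probes: int) -> dict[str, int]:
    """Seconds until the first probe and until a silent peer is declared dead."""
    return {
        "first_probe_s": idle,
        # After the idle period, `probes` unanswered probes are sent `interval` apart.
        "declared_dead_s": idle + interval * probes,
    }

print(keepalive_timeline(120, 30, 5))   # the tuned values
print(keepalive_timeline(7200, 75, 9))  # the Linux defaults
```

With the tuned values the peer is declared dead after 270 seconds; with the defaults it takes 7875 seconds, a little over two hours and eleven minutes.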
If you don't control sysctl, override per socket instead: enable SO_KEEPALIVE, then set TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT, the per-socket counterparts of the three sysctls.
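As a rough illustration of the per-socket route, here is a sketch using Python's standard socket module. It assumes Linux, where the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT constants are available; the helper name enable_keepalive and its default values are only illustrative.

```python
import socket

def enable_keepalive(sock: socket.socket, idle: int = 120,
                     interval: int = 30, probes: int = 5) -> None:
    """Turn on TCP keepalive for one socket and override the kernel defaults."""
    # Without SO_KEEPALIVE the kernel never sends probes on this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux option names; other platforms may name or expose these differently.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    enable_keepalive(sock)
    # Read the idle setting back to confirm the kernel accepted it.
    print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE))
```

The options can be set before connect(), so the helper can be applied as soon as the socket is created. Many daemons and client libraries expose their own keepalive knobs as well, which is often the easier place to set this when you don't own the socket code.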