Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intermittent connection loss with HANDSHAKE(REKEY_TIMEOUT) errors #363

Open
msherman13 opened this issue Sep 9, 2023 · 11 comments
Open

Comments

@msherman13
Copy link

msherman13 commented Sep 9, 2023

Preface:

  • running warp client but i think this is an issue with boringtun under the hood

Setup:

  • boringtun as part of the warp client on Rocky 9.2\
  • cloudflare zero-trust with local fallback domain and single tunnel
  • split tunnel in exclude mode with nothing excluded (everything routes through the single tunnel)
  • cloudflare authentication using SSO (via google workspace SAML)
  • warp-cli version 2023.7.40

Issue:
I connect the warp client successfully and everything works (routes to my private network as well as internet routes). Eventually, sometimes minutes sometimes hours, all network connectivity dies. In this state I cannot access anything, including DNS and internet. In the below snippet of boringtun.log, the connectivity dies around 14:05:32 (timestamp collected by pinging google.com every second until the issue is observed).

2023-09-09T13:59:37.890Z DEBUG boringtun::noise: Sending keepalive
2023-09-09T13:59:39.377Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-09T14:00:03.377Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-09T14:00:27.676Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-09T14:00:37.877Z DEBUG boringtun::noise::timers: SESSION_EXPIRED(REJECT_AFTER_TIME) session=6252035 index=24422
2023-09-09T14:00:51.976Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-09T14:01:15.977Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-09T14:01:38.176Z DEBUG boringtun::noise::timers: HANDSHAKE(REKEY_AFTER_TIME (on send))
2023-09-09T14:01:38.176Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-09T14:01:38.188Z DEBUG boringtun::noise: Received handshake_response local_idx=6252037 remote_idx=6836995
2023-09-09T14:01:38.189Z DEBUG boringtun::noise: New session session=6252037 index=24422
2023-09-09T14:01:38.189Z DEBUG boringtun::noise: Sending keepalive
2023-09-09T14:01:40.276Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-09T14:02:04.576Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-09T14:02:28.576Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-09T14:02:38.177Z DEBUG boringtun::noise::timers: SESSION_EXPIRED(REJECT_AFTER_TIME) session=6252036 index=24422
2023-09-09T14:02:52.876Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-09T14:03:17.176Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-09T14:03:38.177Z DEBUG boringtun::noise::timers: HANDSHAKE(REKEY_AFTER_TIME (on send))
2023-09-09T14:03:38.178Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-09T14:03:38.191Z DEBUG boringtun::noise: Received handshake_response local_idx=6252038 remote_idx=6836996
2023-09-09T14:03:38.192Z DEBUG boringtun::noise: New session session=6252038 index=24422
2023-09-09T14:03:38.192Z DEBUG boringtun::noise: Sending keepalive
2023-09-09T14:03:41.177Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-09T14:04:05.476Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-09T14:04:29.476Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-09T14:04:38.176Z DEBUG boringtun::noise::timers: SESSION_EXPIRED(REJECT_AFTER_TIME) session=6252037 index=24422
2023-09-09T14:04:53.477Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-09T14:05:14.176Z  WARN boringtun::noise::timers: HANDSHAKE(KEEPALIVE + REKEY_TIMEOUT)
2023-09-09T14:05:14.177Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-09T14:05:19.276Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-09T14:05:19.277Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-09T14:05:24.377Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-09T14:05:24.377Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-09T14:05:29.477Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-09T14:05:29.478Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-09T14:05:34.577Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-09T14:05:34.577Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-09T14:05:39.677Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-09T14:05:39.678Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-09T14:05:44.777Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-09T14:05:44.777Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-09T14:05:49.877Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-09T14:05:49.877Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-09T14:05:54.976Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-09T14:05:54.977Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-09T14:06:00.076Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-09T14:06:00.076Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-09T14:06:05.177Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-09T14:06:05.177Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-09T14:06:10.276Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-09T14:06:10.277Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-09T14:06:15.376Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-09T14:06:15.377Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-09T14:06:20.476Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-09T14:06:20.477Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-09T14:06:25.577Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-09T14:06:25.577Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-09T14:06:30.677Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-09T14:06:30.677Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-09T14:06:35.776Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-09T14:06:35.777Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-09T14:06:38.477Z DEBUG boringtun::noise::timers: SESSION_EXPIRED(REJECT_AFTER_TIME) session=6252038 index=24422
2023-09-09T14:06:40.877Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-09T14:06:40.877Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-09T14:06:44.177Z ERROR boringtun::noise::timers: CONNECTION_EXPIRED(REKEY_ATTEMPT_TIME)
2023-09-09T14:06:44.237Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-09T14:06:44.238Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-09T14:06:44.972Z DEBUG boringtun::noise: Received handshake_response local_idx=13336833 remote_idx=12028672
2023-09-09T14:06:44.973Z DEBUG boringtun::noise: New session session=13336833 index=52097
2023-09-09T14:06:44.973Z DEBUG boringtun::noise: Sending keepalive
2023-09-09T14:07:08.422Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-09T14:07:32.722Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-09T14:07:56.723Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-09T14:08:21.022Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-09T14:08:44.423Z DEBUG boringtun::noise::timers: HANDSHAKE(REKEY_AFTER_TIME (on send))
2023-09-09T14:08:44.424Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-09T14:08:44.436Z DEBUG boringtun::noise: Received handshake_response local_idx=13336834 remote_idx=12028673
2023-09-09T14:08:44.437Z DEBUG boringtun::noise: New session session=13336834 index=52097
2023-09-09T14:08:44.437Z DEBUG boringtun::noise: Sending keepalive
2023-09-09T14:08:45.023Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-09T14:09:09.023Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-09T14:09:33.323Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-09T14:09:44.422Z DEBUG boringtun::noise::timers: SESSION_EXPIRED(REJECT_AFTER_TIME) session=13336833 index=52097
2023-09-09T14:09:57.622Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-09T14:10:09.022Z  WARN boringtun::noise::timers: HANDSHAKE(KEEPALIVE + REKEY_TIMEOUT)
2023-09-09T14:10:09.023Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-09T14:10:09.140Z DEBUG boringtun::noise: Received handshake_response local_idx=13336835 remote_idx=15038208
@msherman13
Copy link
Author

anyone able to help here?

@csepulveda
Copy link

Same issue here:

2023-09-20T18:00:24.128Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-20T18:00:48.298Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-20T18:01:12.487Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-20T18:01:36.691Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-20T18:02:00.870Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-20T18:02:25.090Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-20T18:02:35.978Z DEBUG boringtun::noise::timers: SESSION_EXPIRED(REJECT_AFTER_TIME) session=4124895233
2023-09-20T18:02:49.299Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-20T18:03:13.498Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-20T18:03:17.427Z DEBUG boringtun::noise::timers: SESSION_EXPIRED(REJECT_AFTER_TIME) session=4124895234
2023-09-20T18:03:37.708Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-09-20T18:03:37.708Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-20T18:03:42.838Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-20T18:03:42.839Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-20T18:03:47.992Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-20T18:03:47.993Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-20T18:03:53.125Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-20T18:03:53.126Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-20T18:03:58.255Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-20T18:03:58.255Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-20T18:04:03.392Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-20T18:04:03.392Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-20T18:04:08.540Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-20T18:04:08.541Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-20T18:04:13.695Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-20T18:04:13.695Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-20T18:04:18.841Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-20T18:04:18.841Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-20T18:04:23.986Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-20T18:04:23.989Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-20T18:04:29.193Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-20T18:04:29.193Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-20T18:04:34.348Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-20T18:04:34.349Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-20T18:04:39.479Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-20T18:04:39.480Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-20T18:04:44.667Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-20T18:04:44.668Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-20T18:04:49.805Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-20T18:04:49.805Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-20T18:04:54.935Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-20T18:04:54.936Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-20T18:05:00.078Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-20T18:05:00.079Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-20T18:05:05.224Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-09-20T18:05:05.224Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-20T18:05:07.938Z ERROR boringtun::noise::timers: CONNECTION_EXPIRED(REKEY_ATTEMPT_TIME)
2023-09-20T18:05:08.061Z DEBUG boringtun::noise: Sending handshake_initiation
2023-09-20T18:05:08.072Z DEBUG boringtun::noise: Received handshake_response local_idx=2722392065 remote_idx=1259440
2023-09-20T18:05:08.072Z DEBUG boringtun::noise: New session session=2722392065
2023-09-20T18:05:08.072Z DEBUG boringtun::noise: Sending keepalive

any ideas what could be? do you need any other info?

@msherman13
Copy link
Author

I tried on fedora 36 and it’s even worse. Connection freezes after 5 seconds every time. Guess I’ll use tailscale

@JonathonFS
Copy link

JonathonFS commented Dec 1, 2023

Similar issue, except on Windows 10. Traffic drops for 1-2 minutes at a time, every 5-10 minutes or so. I can consistently ping google (8.8.8.8) and the cloudflare VPN endpoint this client is using (172.68.168.79), so doesn't seem like an issue with the internet connection. There are 4 other remote clients in this organization that have the same setting applied, but they don't experience the issue. However, those 4 clients are connecting from different locations and are connecting to different cloudflare VPN endpoints.

  • Mode: Gateway with WARP
  • Split Tunnel: Include IPs and Domains
  • WARP v23.11.6.0
  • Cloudfared v2023.10.0

cfwarp_service_boring.txt shows:

2023-12-01T11:02:33.556Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-12-01T11:02:57.679Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-12-01T11:03:21.912Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-12-01T11:03:46.128Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-12-01T11:03:50.169Z DEBUG boringtun::noise::timers: SESSION_EXPIRED(REJECT_AFTER_TIME) session=10645661 index=41584
2023-12-01T11:04:10.334Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-12-01T11:04:24.301Z  WARN boringtun::noise::timers: HANDSHAKE(KEEPALIVE + REKEY_TIMEOUT)
2023-12-01T11:04:24.301Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T11:04:29.540Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-12-01T11:04:29.540Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T11:04:34.824Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-12-01T11:04:34.824Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T11:04:40.084Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-12-01T11:04:40.085Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T11:04:45.349Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-12-01T11:04:45.349Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T11:04:50.627Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-12-01T11:04:50.627Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T11:04:55.917Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-12-01T11:04:55.918Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T11:05:01.191Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-12-01T11:05:01.191Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T11:05:06.463Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-12-01T11:05:06.463Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T11:05:11.733Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-12-01T11:05:11.733Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T11:05:11.754Z DEBUG boringtun::noise: Received handshake_response local_idx=10645681 remote_idx=16685579
2023-12-01T11:05:11.755Z DEBUG boringtun::noise: New session session=10645681 index=41584
2023-12-01T11:05:11.755Z DEBUG boringtun::noise: Sending keepalive
2023-12-01T11:05:12.045Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)

About 30-50% of the time I also get an Incorrect sender index for responder gone packet error. But this message doesn't show up consistently when I experience the problem.

2023-12-01T11:00:04.824Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-12-01T11:00:29.020Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-12-01T11:00:50.141Z DEBUG boringtun::noise::timers: HANDSHAKE(REKEY_AFTER_TIME (on send))
2023-12-01T11:00:50.141Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T11:00:50.166Z DEBUG boringtun::noise: Received handshake_response local_idx=10645661 remote_idx=8409149
2023-12-01T11:00:50.166Z DEBUG boringtun::noise: New session session=10645661 index=41584
2023-12-01T11:00:50.166Z DEBUG boringtun::noise: Sending keepalive
2023-12-01T11:00:53.244Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-12-01T11:01:17.495Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-12-01T11:01:17.520Z DEBUG boringtun::noise: Incorrect sender index for responder gone packet packet_index=8409088 session_index=8409149
2023-12-01T11:01:17.607Z DEBUG boringtun::noise: Incorrect sender index for responder gone packet packet_index=8409088 session_index=8409149
2023-12-01T11:01:17.838Z DEBUG boringtun::noise: Incorrect sender index for responder gone packet packet_index=8409088 session_index=8409149
2023-12-01T11:01:18.002Z DEBUG boringtun::noise: Incorrect sender index for responder gone packet packet_index=8409088 session_index=8409149
2023-12-01T11:01:18.109Z DEBUG boringtun::noise: Incorrect sender index for responder gone packet packet_index=8409088 session_index=8409149
2023-12-01T11:01:18.309Z DEBUG boringtun::noise: Incorrect sender index for responder gone packet packet_index=8409088 session_index=8409149
2023-12-01T11:01:18.670Z DEBUG boringtun::noise: Incorrect sender index for responder gone packet packet_index=8409088 session_index=8409149
2023-12-01T11:01:18.919Z DEBUG boringtun::noise: Incorrect sender index for responder gone packet packet_index=8409088 session_index=8409149
2023-12-01T11:01:19.757Z DEBUG boringtun::noise: Incorrect sender index for responder gone packet packet_index=8409088 session_index=8409149
2023-12-01T11:01:20.131Z DEBUG boringtun::noise: Incorrect sender index for responder gone packet packet_index=8409088 session_index=8409149
2023-12-01T11:01:21.938Z DEBUG boringtun::noise: Incorrect sender index for responder gone packet packet_index=8409088 session_index=8409149
2023-12-01T11:01:22.532Z DEBUG boringtun::noise: Incorrect sender index for responder gone packet packet_index=8409088 session_index=8409149
2023-12-01T11:01:22.927Z DEBUG boringtun::noise: Incorrect sender index for responder gone packet packet_index=8409088 session_index=8409149
2023-12-01T11:01:41.699Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-12-01T11:01:45.749Z  WARN boringtun::noise::timers: HANDSHAKE(KEEPALIVE + REKEY_TIMEOUT)
2023-12-01T11:01:45.750Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T11:01:50.416Z DEBUG boringtun::noise::timers: SESSION_EXPIRED(REJECT_AFTER_TIME) session=10645660 index=41584
2023-12-01T11:01:51.040Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-12-01T11:01:51.041Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T11:01:56.335Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-12-01T11:01:56.335Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T11:02:01.622Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-12-01T11:02:01.622Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T11:02:06.891Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-12-01T11:02:06.891Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T11:02:12.158Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-12-01T11:02:12.158Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T11:02:17.427Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-12-01T11:02:17.427Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T11:02:22.696Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-12-01T11:02:22.696Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T11:02:27.949Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-12-01T11:02:27.949Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T11:02:31.916Z DEBUG boringtun::noise: Incorrect sender index for responder gone packet packet_index=8409088 session_index=8409149
2023-12-01T11:02:32.101Z DEBUG boringtun::noise: Incorrect sender index for responder gone packet packet_index=8409088 session_index=8409149
2023-12-01T11:02:32.486Z DEBUG boringtun::noise: Incorrect sender index for responder gone packet packet_index=8409088 session_index=8409149
2023-12-01T11:02:32.933Z DEBUG boringtun::noise: Incorrect sender index for responder gone packet packet_index=8409088 session_index=8409149
2023-12-01T11:02:33.035Z DEBUG boringtun::noise: Incorrect sender index for responder gone packet packet_index=8409088 session_index=8409149
2023-12-01T11:02:33.249Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-12-01T11:02:33.249Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T11:02:33.270Z DEBUG boringtun::noise: Received handshake_response local_idx=10645671 remote_idx=16685569
2023-12-01T11:02:33.271Z DEBUG boringtun::noise: New session session=10645671 index=41584
2023-12-01T11:02:33.271Z DEBUG boringtun::noise: Sending keepalive
2023-12-01T11:02:33.556Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-12-01T11:02:57.679Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-12-01T11:03:21.912Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)

I'm currently trying to find a good way to re-home clients to different cloudflare VPN endpoints. Maybe this client's VPN endpoint (172.68.168.79) is just having a bad day?

@JonathonFS
Copy link

JonathonFS commented Dec 1, 2023

EDIT: This didn't actually assign a different "VPN endpoint". It just caused a change in the internal VPN routing, which was observable when running tracert to a tunneled subnet. See post on 12/11/2023 for a real way to change the VPN endpoint.

I was able to get the client shifted over to another VPN endpoint by revoking access, restarting the warp service, and re-authorizing the device (not sure if both were necessary). After that the system was solid for about an hour. Then it started dropping out again with the same error.

2023-12-01T20:55:25.094Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-12-01T20:55:49.309Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-12-01T20:56:09.793Z DEBUG boringtun::noise::timers: SESSION_EXPIRED(REJECT_AFTER_TIME) session=11520023 index=45000
2023-12-01T20:56:13.526Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-12-01T20:56:37.752Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-12-01T20:57:01.948Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-12-01T20:57:09.980Z DEBUG boringtun::noise::timers: HANDSHAKE(REKEY_AFTER_TIME (on send))
2023-12-01T20:57:09.981Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T20:57:10.000Z DEBUG boringtun::noise: Received handshake_response local_idx=11520025 remote_idx=13113367
2023-12-01T20:57:10.000Z DEBUG boringtun::noise: New session session=11520025 index=45000
2023-12-01T20:57:10.000Z DEBUG boringtun::noise: Sending keepalive
2023-12-01T20:57:26.099Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-12-01T20:57:50.278Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-12-01T20:58:09.859Z DEBUG boringtun::noise::timers: SESSION_EXPIRED(REJECT_AFTER_TIME) session=11520024 index=45000
2023-12-01T20:58:14.481Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-12-01T20:58:38.680Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-12-01T20:59:02.918Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-12-01T20:59:10.063Z DEBUG boringtun::noise::timers: HANDSHAKE(REKEY_AFTER_TIME (on send))
2023-12-01T20:59:10.063Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T20:59:10.082Z DEBUG boringtun::noise: Received handshake_response local_idx=11520026 remote_idx=13113368
2023-12-01T20:59:10.082Z DEBUG boringtun::noise: New session session=11520026 index=45000
2023-12-01T20:59:10.082Z DEBUG boringtun::noise: Sending keepalive
2023-12-01T20:59:27.150Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-12-01T20:59:51.352Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-12-01T21:00:10.180Z DEBUG boringtun::noise::timers: SESSION_EXPIRED(REJECT_AFTER_TIME) session=11520025 index=45000
2023-12-01T21:00:15.420Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)
2023-12-01T21:00:25.670Z  WARN boringtun::noise::timers: HANDSHAKE(KEEPALIVE + REKEY_TIMEOUT)
2023-12-01T21:00:25.670Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T21:00:30.950Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-12-01T21:00:30.950Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T21:00:36.216Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-12-01T21:00:36.216Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T21:00:41.511Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-12-01T21:00:41.511Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T21:00:46.800Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-12-01T21:00:46.800Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T21:00:52.090Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-12-01T21:00:52.090Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T21:00:57.346Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-12-01T21:00:57.346Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T21:01:02.620Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-12-01T21:01:02.620Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T21:01:07.905Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-12-01T21:01:07.906Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T21:01:13.188Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-12-01T21:01:13.188Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T21:01:18.289Z  WARN boringtun::noise::timers: HANDSHAKE(REKEY_TIMEOUT)
2023-12-01T21:01:18.290Z DEBUG boringtun::noise: Sending handshake_initiation
2023-12-01T21:01:18.310Z DEBUG boringtun::noise: Received handshake_response local_idx=11520037 remote_idx=13113369
2023-12-01T21:01:18.311Z DEBUG boringtun::noise: New session session=11520037 index=45000
2023-12-01T21:01:18.311Z DEBUG boringtun::noise: Sending keepalive
2023-12-01T21:01:18.589Z DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)

Back to the drawing boarding...

@JonathonFS
Copy link

JonathonFS commented Dec 4, 2023

I setup ping monitors on the workstation. The problem seems to come and go. Sometimes the problem will occur every 5-10 minutes, other times there won't be any issues for a few hours.

I took a closer look at the logs using Excel. Seems the protocol alternates between two overlapping sessions, which I'm calling Session A and Session B. Each session lasts 3 minutes, then waits 1 minute before re-keying a new session.

Everything looks peachy in the logs below, until we get to line 55. We should expect to see the first keepalive for Session A, which would look something like this:
DEBUG boringtun::noise::timers: KEEPALIVE(PERSISTENT_KEEPALIVE)

Instead we get this around the same time we'd expect to get the keepalive:
WARN boringtun::noise::timers: HANDSHAKE(KEEPALIVE + REKEY_TIMEOUT)

I'm not exactly sure if this REKEY_TIMEOUT is for session A or B. Session B is already 12 seconds into it's rekey cool-down period. But I put the rekey failures in the session B column anyways, since we still get a SESSION_EXPIRED log later on for session A.

image

So the problem seems to be that this KEEPALIVE + REKEY_TIMEOUT timer is forcing boringtun to rekey session A while session B is in it's 1 minute rekey cool down period. For whatever reason, boringtun can't seem to rekey session A during the 1 minute cool-down. Maybe the server won't allow it to rekey early? Doesn't really matter, since this seems to be a symptom of the root problem.... which is what causes the KEEPALIVE + REKEY_TIMEOUT timer to trigger at an in-opportune time?

@JonathonFS
Copy link

In the Zero Trust admin center, under Settings > Authentication, I tried changing the Global session timeout to 1 Month. This had no effect, so I changed it back to Same as application session timeout.

Been doing some more reading on the subject, and WireGuard states:

A handshake initiation is retried after REKEY_TIMEOUT + jitter ms, if a response has not been received, where jitter is some random value between 0 and 333 ms.

Looking at the log's timestamps, I believe REKEY_TIMEOUT = 5 seconds and KEEPALIVE = 25 seconds. WireGuard goes on to say:

If we have sent a packet to a given peer but have not received a packet after from that peer for KEEPALIVE + REKEY_TIMEOUT ms, we initiate a new handshake.

This means we should expect to see the KEEPALIVE + REKEY_TIMEOUT timer go off if it's been 30+ seconds since we sent a packet and have had no response. So now the question becomes, why aren't we getting a response in that 30 second window?

  • Is a response package being sent?
    • The connection works intermittently, so I'm pretty sure cloudflare is sending a response.
  • Is there a delay?
    • Could the cloudflare VPN gateway be too busy? (I doubt it.)
    • Could the workstation's ISP be throttling UDP?
    • Possible Solution: Increase REKEY_TIMEOUT to 10 or 15 seconds
  • Is the response getting blocked/dropped?
    • The response isn't being blocked, since it works intermittently.
    • If the ISP is using CGNAT, and the dynamic IP bounces around frequently, an inbound UDP packet might get lost.
    • Possible Solution: Decrease the KEEPALIVE to 20 or 15 seconds? The ISP must have some mechanism in place to preserve active UDP streams as they're migrated within the CGNAT. Decreasing the keepalive could help ensure the UDP session is preserved.

Does anyone know how to tweak the REKEY_TIMEOUT and KEEPALIVE parameters? I don't see this anywhere in the Windows registry or config files.

@JonathonFS
Copy link

I've implemented some monitoring that allows me to see the number of KEEPALIVE + REKEY_TIMEOUT (KART) and standalone REKEY_TIMEOUT (RKTO) events. The workstation at issue has improved over the past week, but still experiences a significantly higher number of KART events from our other workstations... with the exception one other. This other PC (bottom of image) has 18x the number of KART events, but almost zero RKTO events and no network dropouts.

image

This is interesting, because it shows the KART event can be handled gracefully, and doesn't need to cause packet loss. So now the question becomes, why can one workstation recover from a KART event in under 100ms, while another workstation is forced to wait until the aforementioned 1 minute cool-down expires? I tried shifting the workstation at issue over to the working PC's VPN endpoint. Maybe the original VPN endpoint doesn't like early rekeys?

warp-cli tunnel endpoint set 162.159.193.8:2408

If that doesn't work, I'll try one of the alternative ports. Maybe the ISP will handle traffic on UDP 500, 1701, or 4500 better.

@JonathonFS
Copy link

It's been a week, and we've had no more traffic dropouts. Changing the VPN endpoint seems to have done the trick. KEEPALIVE + REKEY_TIMEOUT (KART) events are down from 11.34/day to 1.16/day. More importantly, REKEY_TIMEOUT (RKTO) events are down from 147.1/day to 0/day.

So maybe the issue wasn't with boringtun after all... Could be something on the Cloudflare VPN endpoint side.

@cowlicks
Copy link

This issue seems to be related to this issue in Nord's fork of this repo. It has been resolved there.

@JonathonFS
Copy link

Nice find @cowlicks! I'm still seeing large numbers of "KEEPALIVE + REKEY_TIMEOUT" KART events and "REKEY_TIMEOUT" RKTO events, as shown below. The errors do not correlate with a particular VPN Endpoint, as I had speculated.

image
image

The vast majority (83%) of our WARP clients are on version 24.3.409.0. Most people don't report connectivity issues, but a few do. Would be great if we could get Nord's PR implemented in Cloudflare's repo, as was mentioned here. Worst case scenario, this would go a long way to de-clutter the logs and help us pinpoint why some users have connectivity issues, while others don't.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants