Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to re-establish session connection after unexpected disconnect #5832

Merged
merged 4 commits into from
Dec 19, 2024

Conversation

jmcphers
Copy link
Collaborator

This change adds resiliency to the WebSocket connections that Positron uses to send and receive kernel messages. We were already detecting unexpected connection drops and using them to recover from server crashes. However, we've seen a few reports in the wild of these connection drops happening even when everything else is healthy.

The improvement here is as follows:

  • If the session appears to be running, and the server is also running, we try to reestablish the connection. If we can, it is seamlessly resumed. Because messages that aren't delivered are queued on the supervisor side, we shouldn't miss any.
  • If the connection cannot be re-established, the user is shown a message and the session is marked offline. This prevents the session from looking inexplicably frozen.

To help test this change without leaving Positron running for a day or more, I've added a development-only reconnect command that disconnects the socket for the active console session and lets the reconnect logic kick in to bring it back online.

Addresses #5788.

QA Notes

If you can't reproduce the original issue, you can use a development build and this command.

image

Since reconnect is meant to be seamless, check that the session can still run code after reconnecting, and check the logs to ensure that the reconnect sequence happened as expected. You should see something like this:

Session 'R 4.3.3' disconnected while in state 'idle'. This is unexpected; checking server status.
Kallichore server PID 75482 is still running
The server is still running; attempting to reconnect to session r-e6137faa
Successfully restored connection to  r-e6137faa

Copy link

E2E Tests 🚀  ?
This PR will run tests tagged with: @critical

@jmcphers jmcphers requested a review from sharon-wang December 19, 2024 16:43
Copy link
Member

@sharon-wang sharon-wang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I tried out the reconnect command in Desktop and Server Web on Mac with both Python and R.

  • The reconnect was super quick and unnoticeable from the Console
  • I could execute code after the reconnect
  • The expected reconnect sequence was observed in the kernel supervisor logs, e.g.:
    Session 'Python 3.10.4 (Pyenv)' disconnected while in state 'idle'. This is unexpected; checking server status.
    Kallichore server PID 97063 is still running
    The server is still running; attempting to reconnect to session python-b17b94a7
    Successfully restored connection to  python-b17b94a7
    


// Disconnect the session; since the session is active, this should
// trigger a reconnect.
kallichoreSession.log('Disconnecting by user request', vscode.LogLevel.Info);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my understanding, where does this log out to? If a user wanted to find this log output, where would they go? I don't think it goes to the Kernel Supervisor Output

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct! It logs to the Console output logs for the session (e.g. Python 3.12.4 Console).

@jmcphers jmcphers merged commit b73bb15 into main Dec 19, 2024
6 checks passed
@jmcphers jmcphers deleted the bugfix/reconnect-session-socket branch December 19, 2024 20:34
@github-actions github-actions bot locked and limited conversation to collaborators Dec 19, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants