Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix resource leaks in runtime and server components #4987

Closed
wants to merge 1 commit into from

Conversation

rbren
Copy link
Collaborator

@rbren rbren commented Nov 13, 2024

This PR fixes file descriptor and network connection leaks identified in the OpenHands project. The changes include:

Changes

  1. Request Sessions:

    • Added proper cleanup of request sessions in all runtime implementations
    • Added error handling and logging for session cleanup failures
    • Added checks to prevent NullPointerExceptions during cleanup
    • Ensured sessions are closed even if runtime stop fails
  2. WebSocket Connections:

    • Added proper cleanup of WebSocket connections in the server
    • Added async-safe WebSocket closure using the event loop
    • Added error handling and logging for WebSocket cleanup failures
    • Nulled out WebSocket references after cleanup
  3. Runtime Resources:

    • Added proper cleanup sequence for runtime resources
    • Added error handling for each cleanup step
    • Added nulling of references after cleanup to help garbage collection
    • Added logging for cleanup failures
  4. Google Cloud Storage:

    • Added proper cleanup of Google Cloud storage clients
    • Added finalizer to ensure cleanup on object destruction
    • Added state tracking to prevent double-close issues
    • Added error handling and logging for cleanup failures

Files Changed

  • openhands/runtime/impl/remote/remote_runtime.py
  • openhands/server/session/session.py
  • openhands/server/session/agent_session.py
  • openhands/storage/google_cloud.py

These changes should help prevent file descriptor and network connection leaks by ensuring proper cleanup of resources in all cases, including error scenarios.


To run this PR locally, use the following command:

docker run -it --rm   -p 3000:3000   -v /var/run/docker.sock:/var/run/docker.sock   --add-host host.docker.internal:host-gateway   -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:8c1fa93-nikolaik   --name openhands-app-8c1fa93   docker.all-hands.dev/all-hands-ai/openhands:8c1fa93

- Add proper cleanup of request sessions in runtime implementations
- Add proper cleanup of WebSocket connections in server
- Add proper cleanup sequence for runtime resources
- Add proper cleanup of Google Cloud storage clients
- Add error handling and logging for cleanup failures
- Add state tracking to prevent double-close issues
@rbren rbren marked this pull request as draft November 13, 2024 22:28
@rbren
Copy link
Collaborator Author

rbren commented Nov 19, 2024

Fixed elsewhere

@rbren rbren closed this Nov 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants