Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Fix race conditions in EventStream subscriber management #6904

Closed
wants to merge 1 commit into from

Conversation

malhotra5
Copy link
Contributor

@malhotra5 malhotra5 commented Feb 23, 2025

This PR fixes several issues with subscriber management in EventStream that could cause subscribers to continue receiving events after being unsubscribed.

Problem

The current implementation has several race conditions and thread safety issues:

  1. Subscribers can continue receiving events after being unsubscribed
  2. Thread pools are not shut down gracefully
  3. Dictionary modifications during iteration can cause errors
  4. Lack of proper synchronization in subscriber management

Solution

This PR adds several improvements:

  1. Proper thread synchronization using locks
  2. Changes cleanup order to prevent events after unsubscribe
  3. Improves thread pool shutdown behavior
  4. Adds snapshot mechanism for safe iteration
  5. Adds better error handling and logging

Changes

  • Added locks around subscribe/unsubscribe operations
  • Changed cleanup order to remove subscribers before shutting down resources
  • Added non-blocking thread pool shutdown
  • Added snapshot mechanism for safe iteration over subscribers
  • Added better error handling and debug logging
  • Added comprehensive tests

Testing

Added several new test cases:

  • test_subscriber_behavior: Basic subscriber functionality
  • test_subscriber_stress: Stress testing with rapid subscribe/unsubscribe
  • test_subscriber_unsubscribe_bug: Specific bug reproduction
  • test_subscriber_unsubscribe_concurrent: Concurrent operations
  • test_subscriber_race_condition: Race condition testing

All tests pass and demonstrate proper handling of:

  • Rapid subscribe/unsubscribe operations
  • Concurrent event processing
  • Thread pool lifecycle
  • Race conditions
  • Edge cases and error conditions

Notes

This is a draft PR to get early feedback on the approach. The changes are significant but necessary to fix the underlying issues.


To run this PR locally, use the following command:

docker run -it --rm   -p 3000:3000   -v /var/run/docker.sock:/var/run/docker.sock   --add-host host.docker.internal:host-gateway   -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:612c6ee-nikolaik   --name openhands-app-612c6ee   docker.all-hands.dev/all-hands-ai/openhands:612c6ee

This commit fixes several issues with subscriber management in EventStream:

1. Adds proper thread synchronization to prevent race conditions
2. Changes cleanup order to prevent events after unsubscribe
3. Improves thread pool shutdown behavior
4. Adds snapshot mechanism for safe iteration
5. Adds better error handling and logging

The changes ensure that:
- Subscribers are properly unsubscribed and don't receive events after
- Thread pools are shut down gracefully
- Race conditions are handled correctly
- Edge cases are handled properly

Added tests:
- test_subscriber_behavior: Basic subscriber functionality
- test_subscriber_stress: Stress testing with rapid subscribe/unsubscribe
- test_subscriber_unsubscribe_bug: Specific bug reproduction
- test_subscriber_unsubscribe_concurrent: Concurrent operations
- test_subscriber_race_condition: Race condition testing
@malhotra5 malhotra5 added bug Something isn't working needs review labels Feb 23, 2025 — with OpenHands AI
@malhotra5 malhotra5 removed bug Something isn't working needs review labels Feb 23, 2025
@malhotra5 malhotra5 closed this Feb 24, 2025
@malhotra5 malhotra5 deleted the fix/subscriber-race-conditions branch February 24, 2025 17:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants