Multiprocess online deployment #331

Open
wants to merge 21 commits into
base: main

wbenoit26
Contributor

This PR adds multiprocessing to our online deployment function. The main constraints when setting this up were:

  1. Multiprocessing with CUDA requires new processes to be spawned, rather than forked from an existing process.
  2. Passing any kind of data between spawned processes introduces significant latency, on the order of seconds.
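
As a rough illustration of constraint 1, here is a minimal sketch of starting a worker with the `spawn` start method from the standard library. The names `event_worker` and `run_demo` and the event payload are hypothetical and are not taken from this PR.

```python
import multiprocessing as mp


def event_worker(in_queue, out_queue):
    """Consume events until a None sentinel arrives; echo one result per event."""
    while True:
        event = in_queue.get()
        if event is None:
            break
        out_queue.put(("processed", event))


def run_demo():
    # "spawn" starts a fresh Python interpreter for the child instead of
    # forking the parent, so the child does not inherit initialized CUDA state.
    ctx = mp.get_context("spawn")
    in_queue, out_queue = ctx.Queue(), ctx.Queue()
    worker = ctx.Process(target=event_worker, args=(in_queue, out_queue))
    worker.start()

    in_queue.put({"gpstime": 1234567890.0})  # made-up event payload
    result = out_queue.get(timeout=30)
    in_queue.put(None)  # sentinel: tell the worker to exit
    worker.join()
    return result


if __name__ == "__main__":
    # The guard matters with "spawn": the child re-imports this module,
    # and must not start another worker while doing so.
    print(run_demo())
```

Everything sent through the queues is pickled and unpickled, which is one reason to keep large payloads off them.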

So, with this setup, everything on GPU happens in the same process. The flow of information goes as follows:

Process 0: Main Aframe loop

  1. Detects event
  2. Sends event to event processing queue (P1)
  3. Runs AMPLFI, obtaining samples for each parameter
  4. Puts samples in shared memory and sends event time to AMPLFI queue to produce skymap and submit PE (P2)
    Note: using a shared memory object because passing a large tensor between processes introduces latency
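
The shared-memory hand-off in step 4 could look roughly like the sketch below. It assumes the standard library's `multiprocessing.shared_memory` and a flat buffer of float32 samples; the PR may use a different mechanism, and `write_samples`, `read_samples`, and the message fields are hypothetical names. Both sides run in one process here for brevity, whereas in the deployment the reader would be a separate spawned process.

```python
from array import array
from multiprocessing import shared_memory


def write_samples(samples):
    """Copy a flat array of float32 samples into a new shared-memory block."""
    raw = samples.tobytes()
    shm = shared_memory.SharedMemory(create=True, size=len(raw))
    shm.buf[:len(raw)] = raw
    return shm  # the writer keeps the handle so it can unlink the block later


def read_samples(name, count):
    """Attach to an existing block by name and return a private copy."""
    shm = shared_memory.SharedMemory(name=name)
    out = array("f")
    # bytes(...) copies the data, so no view into the block outlives close().
    out.frombytes(bytes(shm.buf[: count * out.itemsize]))
    shm.close()
    return out


if __name__ == "__main__":
    samples = array("f", [0.25, 0.5, 0.75, 1.0])  # stand-in for posterior samples
    shm = write_samples(samples)
    # Only this small message would need to cross the queue (assumed fields).
    message = {"event_time": 1234567890.0, "shm_name": shm.name, "count": len(samples)}
    received = read_samples(message["shm_name"], message["count"])
    print(received.tolist())
    shm.close()
    shm.unlink()  # free the block once the reader is done
```

The point of the design is that the queue carries a few bytes of metadata while the bulk data stays in one shared block.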

Process 1: Event processing

  1. Receives event
  2. Sends event to p_astro queue (P3)
  3. Submits event to GraceDB, getting graceid
  4. Sends graceid to AMPLFI and p_astro queues
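
A minimal sketch of the ordering in Process 1: the event is forwarded to the p_astro queue before the GraceDB submission, so p_astro can start while the upload is in flight. The `submit_event` callable stands in for the real GraceDB submission, and the tagged-tuple message format is an assumption made for illustration.

```python
import queue


def process_event(event, submit_event, pastro_queue, amplfi_queue):
    """Forward the event, submit it, then broadcast the resulting graceid."""
    # Forward first so p_astro does not wait on the submission round trip.
    pastro_queue.put(("event", event))
    graceid = submit_event(event)  # hypothetical stand-in for the GraceDB upload
    amplfi_queue.put(("graceid", graceid))
    pastro_queue.put(("graceid", graceid))
    return graceid


if __name__ == "__main__":
    pastro_q, amplfi_q = queue.Queue(), queue.Queue()
    # "G0001" is a made-up id used only for this demo.
    process_event({"gpstime": 1234567890.0}, lambda ev: "G0001", pastro_q, amplfi_q)
    print(pastro_q.get(), pastro_q.get(), amplfi_q.get())
```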

Process 2: Skymap creation

  1. Receives event time
  2. Gets samples from shared memory (note: this is a little slow, about 0.2 s) and produces PE data products
  3. Receives graceid
  4. Submits PE products
    Note: technically, the graceid could be received first, which is why there's a check for that. There should be a cleaner way to do this, but I think this is fine for now.
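
One possible way to handle either arrival order is sketched below. It assumes a single inbox carrying tagged `(kind, payload)` messages, which is not necessarily how this PR does it; `handle_one_event`, `make_products`, and `submit` are hypothetical names.

```python
import queue


def handle_one_event(inbox, make_products, submit):
    """Submit PE products once both the event time and the graceid have arrived."""
    graceid = None
    products = None
    while graceid is None or products is None:
        kind, payload = inbox.get()
        if kind == "event_time":
            # The slow step runs as soon as the event time is available.
            products = make_products(payload)
        elif kind == "graceid":
            # If this arrives first, it is simply held until products exist.
            graceid = payload
        else:
            raise ValueError(f"unexpected message kind: {kind!r}")
    submit(graceid, products)


if __name__ == "__main__":
    inbox = queue.Queue()
    inbox.put(("graceid", "G0001"))  # made-up id, deliberately sent first
    inbox.put(("event_time", 1234567890.0))
    handle_one_event(
        inbox,
        make_products=lambda t: {"event_time": t},
        submit=lambda gid, products: print(gid, products),
    )
```

Because the loop only exits when both pieces are present, no separate "did the graceid come first" branch is needed.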

Process 3: p_astro calculation

  1. Receives event
  2. Calculates p_astro
  3. Receives graceid
  4. Submits p_astro

@EthanMarx There's more to do on this, like writing the buffers to disk and probably improving the logging now that several processes are running, but it would be good to get some eyes on it now.

@wbenoit26
Contributor Author

@EthanMarx I think this is ready for a review. I'm benchmarking what's currently in main for comparison with the measurements I took of this code. There have only been a handful of events, but so far the time savings are as expected: the time from detection to sky map is cut down by a little more than 1.5 s, and of course we don't have to wait for everything to be submitted before continuing the search.
