
Adding lock to control experiment creation during multiprocessing #36

Closed

Conversation


@pebeto pebeto commented Mar 7, 2024

This PR adds the tests required by #25 and also solves the issue presented in JuliaAI/MLFlowClient.jl#40.

Using `julia --threads 5`, each thread proceeds with its own processing until it reaches the `log_evaluation` function. At that step, a lock (mutex) serializes the logging, making that part of the process effectively synchronous and ensuring thread safety.

Feature diagram: multiproc_mljflow (attached image)
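
For reference, a minimal sketch of the locking pattern described above, assuming a module-level lock around the logging hook; the lock name and the body of `log_evaluation` are illustrative, not the exact MLJFlow.jl source:

```julia
# Illustrative sketch (hypothetical names): a module-level lock that
# guards MLflow experiment creation and result logging.
const MLFLOW_LOCK = ReentrantLock()

function log_evaluation(logger, performance_evaluation)
    # Threads run the tuning/evaluation in parallel, but only one at a
    # time may enter this critical section, so experiment creation and
    # run logging happen sequentially.
    lock(MLFLOW_LOCK) do
        # ... create or fetch the MLflow experiment, start a run,
        # and log parameters/metrics for `performance_evaluation` ...
    end
end
```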

@pebeto pebeto linked an issue Mar 7, 2024 that may be closed by this pull request
@pebeto pebeto requested a review from ablaom March 7, 2024 19:10
@pebeto pebeto self-assigned this Mar 7, 2024
@pebeto pebeto added the enhancement New feature or request label Mar 7, 2024

ablaom commented Apr 3, 2024

Since this PR introduces locks, I am wondering what the impact might be on the parallelization of the machine learning tasks it is tracking. Is it possible that by stealing control, MLJFlow.jl essentially prevents otherwise parallel threads from actually executing simultaneously?

It would be great to see a benchmark showing this is not likely (e.g., using this MLJTuning branch). I can provide guidance with that.

@pebeto pebeto force-pushed the 25-implementing-mutex-control-for-experiment-creation branch from 517b505 to 70c458b Compare April 9, 2024 17:20

pebeto commented May 18, 2024

In this branch, I implemented a Channel-based approach to queue the logging calls. However, it is still buggy when no multiprocessing step is involved (i.e. when evaluating without acceleration).
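
For illustration, a Channel-based queue along these lines could look like the following sketch; the channel, worker task and producer names are hypothetical, not the branch's actual code:

```julia
# Hypothetical sketch: a buffered Channel plus a single consumer task.
# Producer threads enqueue evaluations; one task drains the queue and
# talks to the MLflow server one request at a time.
const LOG_QUEUE = Channel{Any}(128)

# Single consumer task: processes requests sequentially, whatever the
# number of producer threads.
const LOG_WORKER = Threads.@spawn begin
    for (logger, evaluation) in LOG_QUEUE
        # ... create/fetch the experiment and log `evaluation` ...
    end
end

# Producer side, callable from any thread; returns once enqueued.
log_evaluation(logger, performance_evaluation) =
    put!(LOG_QUEUE, (logger, performance_evaluation))
```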


ablaom commented May 19, 2024

> In this branch, I implemented a Channel-based approach to queue the logging calls. However, it is still buggy when no multiprocessing step is involved (i.e. when evaluating without acceleration).

Great to see some progress there. Does this exploration mean the current PR indeed "locks out" the parallelization? I still wonder if it's more natural to implement the buffering at MLFlowClient.jl, as suggested in my POC. Sorry I haven't had time to flesh that out into a full-blown PR.


pebeto commented May 20, 2024

> I still wonder if it's more natural to implement the buffering at MLFlowClient.jl

I don't agree with adding these thread-safety tweaks to MLFlowClient.jl. That library should mirror the implementations in other languages (which don't fully support concurrency). That's why I preferred to apply the fix in MLJFlow.jl instead: this issue belongs to MLJ's multi-threading/multi-processing operations.


Benchmarks

Channel-oriented approach:

# CPUThreads
 Range (min … max):  2.988 s …   3.105 s  ┊ GC (min … max): 0.13% … 0.19%
 Time  (median):     3.046 s              ┊ GC (median):    0.16%
 Time  (mean ± σ):   3.046 s ± 82.388 ms  ┊ GC (mean ± σ):  0.16% ± 0.04%

                                                             


# CPUProcesses
 Range (min … max):  3.038 s …   3.171 s  ┊ GC (min … max): 0.12% … 0.13%
 Time  (median):     3.105 s              ┊ GC (median):    0.13%
 Time  (mean ± σ):   3.105 s ± 94.311 ms  ┊ GC (mean ± σ):  0.13% ± 0.01%

                                                             

Mutex-oriented approach:

# CPUThreads
 Range (min … max):  3.013 s …    3.194 s  ┊ GC (min … max): 0.10% … 0.67%
 Time  (median):     3.103 s               ┊ GC (median):    0.40%
 Time  (mean ± σ):   3.103 s ± 127.769 ms  ┊ GC (mean ± σ):  0.40% ± 0.40%

                                                              


# CPUProcesses
 Range (min … max):  3.036 s …   3.155 s  ┊ GC (min … max): 0.11% … 0.15%
 Time  (median):     3.096 s              ┊ GC (median):    0.13%
 Time  (mean ± σ):   3.096 s ± 84.371 ms  ┊ GC (mean ± σ):  0.13% ± 0.03%

                                                             
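For context, the figures above are BenchmarkTools.jl-style summaries. A sketch of how such a comparison might be set up is below; the model, data, measure, tracking URI and the `MLJFlow.Logger`/`logger` keyword arguments are placeholders and assumptions about the MLJ API of the time, not the exact benchmark script:

```julia
using MLJ, MLJFlow, BenchmarkTools

# Placeholder workload; the real benchmark would mirror the PR's tests.
X, y = make_blobs(1_000)
Tree = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0
model = Tree()

# Placeholder tracking server URI and experiment name.
logger = MLJFlow.Logger("http://localhost:5000"; experiment_name="benchmark")

# CPUThreads: run with `julia --threads 5`.
@benchmark evaluate($model, $X, $y;
                    resampling=CV(nfolds=6),
                    measure=log_loss,
                    logger=$logger,
                    acceleration=CPUThreads())

# CPUProcesses: requires worker processes, e.g. `using Distributed; addprocs(5)`.
@benchmark evaluate($model, $X, $y;
                    resampling=CV(nfolds=6),
                    measure=log_loss,
                    logger=$logger,
                    acceleration=CPUProcesses())
```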

@ablaom

@pebeto pebeto force-pushed the 25-implementing-mutex-control-for-experiment-creation branch from 70c458b to 4ce8725 Compare May 20, 2024 06:37

pebeto commented May 21, 2024

Closing in favor of #41

@pebeto pebeto closed this May 21, 2024
@pebeto pebeto deleted the 25-implementing-mutex-control-for-experiment-creation branch May 21, 2024 07:13
Successfully merging this pull request may close these issues.

Add tests for CPUThreads and CPUProcesses