The main problem with logging in parallelized operations is simply this: requests are
posted directly to an MLflow service without full information about the state of the service at the time the request is ultimately acted on. I propose we resolve this as follows:
Instead of a client posting requests directly to an MLflow service, they are posted
(put!) to a first-in-first-out queue (a Julia Channel). Calls posting requests return
immediately, unless the queue is full. In this way, the performance of the parallel
workload is not impacted.
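For illustration, here is a minimal sketch of the client-facing side; the names REQUEST_BUFFER and enqueue_request are placeholders for this sketch, not part of MLFlowClient.jl and not necessarily how the POC is structured:

```julia
# Bounded FIFO buffer; put! blocks only if the buffer is full.
const REQUEST_BUFFER = Channel{Any}(1_000)

# Client-facing call: enqueue the request and return immediately,
# so parallel workers are not slowed down by HTTP round trips.
function enqueue_request(request)
    put!(REQUEST_BUFFER, request)
    return nothing
end

enqueue_request((action = :log_metric, key = "loss", value = 0.42))
```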
A single Julia Task dispatches requests (take!s) from the other end of the queue. Whenever
a request has the possibility of altering the service state (e.g., creating an
experiment), then the dispatcher waits for confirmation that the state change is
complete before dispatching the next request.
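And a sketch of the dispatcher side, under the assumption that each queued request carries a flag saying whether it can alter service state; post_to_mlflow below is a stand-in for the actual HTTP call, not an existing MLFlowClient.jl function:

```julia
# Each queued request records whether it can change the service state.
struct Request
    payload::Any
    mutates_state::Bool   # true for e.g. "create experiment"
end

const QUEUE = Channel{Request}(1_000)

# Stand-in for the real HTTP round trip; returns once the service has responded.
post_to_mlflow(payload) = (sleep(0.01); :ok)

# The single dispatcher Task: take!s requests in FIFO order.
const DISPATCHER = Threads.@spawn begin
    for request in QUEUE              # iterating take!s until the channel is closed
        if request.mutates_state
            # Block until the service confirms the state change, so that
            # later requests see a consistent service state.
            post_to_mlflow(request.payload)
        else
            # Non-mutating requests need not hold up the dispatcher.
            errormonitor(@async post_to_mlflow(request.payload))
        end
    end
end

# Parallel workers only ever put!; their calls return as soon as the request is buffered.
put!(QUEUE, Request((action = :create_experiment, name = "exp"), true))
put!(QUEUE, Request((action = :log_metric, key = "loss", value = 0.42), false))
close(QUEUE)        # let the dispatcher drain the buffer and finish
wait(DISPATCHER)
```

Using a single dispatcher Task serialises all service calls, which is what makes the wait-for-confirmation step meaningful.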
I imagine that we can insert the queue (buffer) without breaking the user-facing
interface of MLFlowClient.jl.
I have implemented a POC for this proposal and shared it with two maintainers; I can share it with anyone else who is interested.
This is a very specific requirement. Not everyone uses multithreading/multiprocessing to perform this kind of operation. MLFlowClient.jl mirrors the capabilities of the original package. So, from my point of view, we must not implement a buffering solution here. This is something the user should take care of.
In the MLJ.jl context, our library MLJFlow.jl contains two POC workarounds using Locks and Channels. They can be seen here: JuliaAI/MLJFlow.jl#36.
I'm not 100% convinced. It seems to me that any other Julia software that wants to do MLflow logging will run into exactly the same issue if it uses parallelism. However, for now I'm happy to shelve the proposal in favour of the specific solutions you have worked out, thank you!
The context of this proposal is this synchronisation issue.