[Q&A] Improve turnaround time for quick computations #2389
Replies: 5 comments 4 replies
-
Thanks for raising the issue, we will look into this.
-
@nickgautier thanks for the interest. In an NVFlare system, each site has a monitoring process; let's call them the "client parent" (CP) and "server parent" (SP). Right now, when NVFlare starts a job, we start a client job process at each site and a server job process on the server side, and the corresponding communication mechanism needs to be set up on each of those processes as well, so that takes some time. From your use case, it looks like you just want to calculate something quickly, so one option is to run the job code directly in the parent process; this way we can avoid the overhead of starting additional processes. Note that doing this makes the system more vulnerable, since the job execution code is not isolated and could crash the site if it has an error. Another thing we can check is the setup overhead for each job, which will require more details of your workload (see the timing sketch below). Does your job work like this:
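To put a number on the per-job setup overhead mentioned above, something like the following can time the submit-to-finish path end to end. This is a minimal sketch assuming the FLARE API (nvflare.fuel.flare_api) available in 2.4; the admin user name, startup-kit path, and job folder below are placeholders, not values from this thread.

```python
import time

from nvflare.fuel.flare_api.flare_api import new_secure_session

# Placeholders: use the admin user and startup kit from your own provisioned project.
sess = new_secure_session(
    username="admin@nvidia.com",
    startup_kit_location="/path/to/admin/startup",
)
try:
    t0 = time.time()
    job_id = sess.submit_job("/path/to/job_folder")  # folder containing meta.json and the app
    print(f"submitted job {job_id} after {time.time() - t0:.2f}s")

    # Block until the job completes, then report the full submit-to-done latency.
    sess.monitor_job(job_id)
    print(f"job finished after {time.time() - t0:.2f}s total")
finally:
    sess.close()
```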
-
@yhwen @yanchengnv maybe you can take a look when you get a chance
-
Closing due to inactivity; feel free to re-open.
-
The job submit/start/end lifecycle has overhead, as you observed. Once a job is started, it keeps running until it's done and doesn't incur much additional overhead. So to speed up your overall throughput, please consider using one long-running job for all your computation needs. Flare 2.5 (to be released in September) has a feature that lets you send custom commands to a running job. That way, you can start one job, let it run "forever", and use custom commands to ask it to do whatever computation you need; the computation will be performed immediately.
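As a rough illustration of the long-running-job idea with 2.4, here is a minimal client-side sketch assuming the NVFlare Client API (nvflare.client). The payload layout inside FLModel.params is made up for illustration, and the 2.5 custom-command feature is not shown since its exact interface isn't described in this thread.

```python
# client.py -- runs once per site inside a single long-lived job.
import nvflare.client as flare
from nvflare.app_common.abstract.fl_model import FLModel, ParamsType

flare.init()

# Stay inside the one job for its whole lifetime; each received task is one quick
# computation, so the per-job process startup cost is only paid once.
while flare.is_running():
    task = flare.receive()
    if task is None:
        break

    # Hypothetical payload: assume the server puts the numbers to aggregate in task.params.
    values = task.params.get("values", []) if task.params else []
    local_result = {"sum": float(sum(values)), "count": len(values)}

    # Return the local result; the server-side workflow aggregates across sites.
    flare.send(FLModel(params=local_result, params_type=ParamsType.FULL))
```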
-
Python version (python3 -V): 3.11
NVFlare version (python3 -m pip list | grep "nvflare"): 2.4.0
NVFlare branch (if running examples, please use the branch that corresponds to the NVFlare version, git branch): No response
Operating system: Docker (slimmed-down image)
Have you successfully run any of the following examples?
Please describe your question
Hello!
We've noticed that, under optimal circumstances, a full computation takes about 10 seconds from submission to results.
We have use cases involving statistical operations that are very quick, so our users expect performance typically below 1 second per computation.
We would be keen for any insight into optimizing for this use case.
Thank you!