This is mainly a performance issue. Toward the end of the computation, the work assigned to some cores takes much longer than the work assigned to others.
I believe the reason is that the total set of jobs is divided up front and assigned to the cores, with no sync points in between. Because jobs differ in length, one core may end up working much longer than the others and hold up the whole process. For example, with 1000 jobs and 10 cores, each core gets 100 jobs, and all cores run at the same time.
Instead, you could introduce a batch size, run each batch in parallel, and force a sync at the end of each batch, similar to batches in deep learning. For example, with a batch size of 200, the 1000 jobs are divided into 5 batches, and each batch is then run in parallel across the 10 cores. This should significantly reduce the impact of imbalanced job lengths.
I don't know how parallel computing is implemented in R5, but this imbalance is clearly harmful for performance.
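To make the arithmetic concrete, here is a minimal Python sketch of the two schemes described above. It is only a simulation, not R5's actual scheduler. The function names (`split_evenly`, `static_makespan`, `batched_makespan`) and the job durations are hypothetical, chosen so that the slow jobs cluster at the end of the job list. It also assumes that within each batch the jobs are split evenly across the cores.

```python
def split_evenly(jobs, cores):
    """Split a list into `cores` contiguous chunks of near-equal length."""
    base, extra = divmod(len(jobs), cores)
    chunks, start = [], 0
    for i in range(cores):
        size = base + (1 if i < extra else 0)
        chunks.append(jobs[start:start + size])
        start += size
    return chunks


def static_makespan(durations, cores):
    """All jobs are divided once; total time is set by the slowest core."""
    return max(sum(chunk) for chunk in split_evenly(durations, cores))


def batched_makespan(durations, cores, batch_size):
    """Jobs run batch by batch, with a forced sync after each batch."""
    total = 0
    for start in range(0, len(durations), batch_size):
        batch = durations[start:start + batch_size]
        # Each batch takes as long as its slowest core, then all cores sync.
        total += static_makespan(batch, cores)
    return total


# Hypothetical workload: 900 short jobs (1 time unit each) followed by
# 100 long jobs (10 time units each).
durations = [1] * 900 + [10] * 100

print(static_makespan(durations, 10))        # 1000
print(batched_makespan(durations, 10, 200))  # 280
```

In this made-up workload, the single up-front split leaves one core with all 100 long jobs, while batching spreads them over five cores in the last batch. How much batching helps in practice depends on how job lengths are distributed along the job list; if long jobs are scattered evenly, the forced syncs can cost more than they save.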
Hi @luyuliu, the parallel processing is managed entirely on the Java side, upstream in R5. I would suggest we migrate this issue to the R5 repository.