As far as I remember, CLEAN_PERIOD cannot be set as an environment variable.
It is part of the lsb.params configuration file and is usually set as part of the scheduler policies.
The Active Jobs app hangs when listing jobs for all users if LSF is running thousands of jobs and the active job history in LSF is kept for days instead of hours. CLEAN_PERIOD in the LSF configuration controls how much data bjobs retrieves. CLEAN_PERIOD is usually a day, but when it was increased to 3 days, the app hung indefinitely.
The cause appears to be the bjobs arguments in lib/ood_core/job/adapters/lsf/batch.rb:
    def get_jobs_for_user(user)
      args = %W( -u #{user} -a -w -W )
      parse_bjobs_output(call("bjobs", *args))
    end
bjobs -u all -a -w -W is very resource intensive when thousands of jobs are scheduled, and it can take almost forever to return.
I had to make the following change (removing -a) to get it to respond:
    def get_jobs_for_user(user)
      args = %W( -u #{user} -w -W )
      parse_bjobs_output(call("bjobs", *args))
    end
It would be good to make this configurable instead of having to change the code.
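One way this could look is sketched below. This is only an illustration of the idea, not the ood_core implementation: the BjobsArgs class and the include_finished option are hypothetical names made up for this example, and how the option would actually be read from the cluster configuration is left open.

```ruby
# Hypothetical sketch: build the bjobs argument list with the -a flag
# (which also reports recently finished jobs) controlled by an option.
# BjobsArgs and include_finished are illustrative names, not ood_core API.
class BjobsArgs
  # include_finished defaults to true to match the current behavior.
  def initialize(include_finished: true)
    @include_finished = include_finished
  end

  # Returns the arguments to pass to bjobs for the given user.
  def for_user(user)
    args = ["-u", user.to_s]
    args << "-a" if @include_finished
    args + ["-w", "-W"]
  end
end
```

With the default, BjobsArgs.new.for_user("all") gives the same arguments as today. A site with a long CLEAN_PERIOD could use BjobsArgs.new(include_finished: false) to drop -a without patching the adapter.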