Description
The Active Jobs app, for all users, hangs when LSF is running thousands of jobs and the job history in LSF is kept for days instead of hours. CLEAN_PERIOD in the LSF configuration controls how much data bjobs retrieves. CLEAN_PERIOD is usually a day, but when it was increased to 3 days, the app hung indefinitely.
The cause appears to be the bjobs arguments in lib/ood_core/job/adapters/lsf/batch.rb:
    def get_jobs_for_user(user)
      args = %W( -u #{user} -a -w -W )
      parse_bjobs_output(call("bjobs", *args))
    end
bjobs -u all -a -w -W is very resource intensive when thousands of jobs are scheduled, and it can take almost forever to return.
I had to make the following change (removing -a) to get it to respond:
    def get_jobs_for_user(user)
      args = %W( -u #{user} -w -W )
      parse_bjobs_output(call("bjobs", *args))
    end
It would be good to make the use of -a configurable, so that sites do not have to change the code.
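One possible shape for this is sketched below, as an illustration rather than a proposed patch. The helper name `bjobs_args` and the `include_finished` option are hypothetical; neither exists in ood_core today, and how the option would be read from the cluster configuration is left open.

```ruby
# Hypothetical helper: builds the bjobs argument list for a user.
# `include_finished` is an assumed option name, not an existing ood_core setting.
# When true (the current behavior), -a is passed so bjobs also returns
# recently finished jobs; when false, -a is omitted.
def bjobs_args(user, include_finished: true)
  args = %W( -u #{user} )
  args << "-a" if include_finished
  args + %w( -w -W )
end
```

With something like this in place, get_jobs_for_user could call `call("bjobs", *bjobs_args(user, include_finished: false))` on sites where -a is too expensive, while the default would keep today's behavior.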