LSF bjobs for all users hangs #745

Open
@vallerul

Description

The Active Jobs app hangs when listing jobs for all users if LSF is running thousands of jobs and its job history is kept for days instead of hours. CLEAN_PERIOD in the LSF configuration controls how long finished jobs are kept, and therefore how much data bjobs retrieves. CLEAN_PERIOD is usually a day, but when it was increased to 3 days, the app hung indefinitely.
The cause appears to be the bjobs arguments in lib/ood_core/job/adapters/lsf/batch.rb:

def get_jobs_for_user(user)
  args = %W( -u #{user} -a -w -W )
  parse_bjobs_output(call("bjobs", *args))
end
bjobs -u all -a -w -W is very resource intensive when thousands of jobs are scheduled, and it can take almost forever to return.

I had to make the following change (removing -a) to get it to respond:

def get_jobs_for_user(user)
  args = %W( -u #{user} -w -W )
  parse_bjobs_output(call("bjobs", *args))
end

It would be good to make this configurable, instead of having to change the code.
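
As a rough illustration of what "configurable" could look like, here is a minimal sketch. It is not the actual ood_core Batch class: the class name BjobsArgsSketch and the include_finished option are hypothetical, and only the argument building is shown. The default keeps the current behavior (passing -a), so existing sites would see no change.

```ruby
# Hypothetical sketch, not the real ood_core code: builds the bjobs
# argument list with an option that controls whether -a is passed.
class BjobsArgsSketch
  # include_finished is an assumed option name; true mirrors the
  # current behavior of always passing -a.
  def initialize(include_finished: true)
    @include_finished = include_finished
  end

  # Returns the arguments for bjobs for one user (or "all"),
  # in the same order as the existing %W( ... ) literal.
  def bjobs_args_for_user(user)
    args = ["-u", user.to_s]
    args << "-a" if @include_finished
    args + ["-w", "-W"]
  end
end

# Usage: a site with a long CLEAN_PERIOD could turn -a off.
sketch = BjobsArgsSketch.new(include_finished: false)
puts sketch.bjobs_args_for_user("all").join(" ")
```

In the real adapter the option would presumably be read from the cluster configuration and the resulting list passed to call("bjobs", *args) as before; how that option is named and wired in is left open here.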
