Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update nightqa multiprocessing and add new options for skipping some steps #2418

Merged
merged 3 commits into from
Dec 3, 2024

Conversation

akremin
Copy link
Member

@akremin akremin commented Nov 25, 2024

Overview

This resolves Issues #2401 and #2412. It skips over steps where the output file already exists (except for HTML), unless --recompute is given. Previously it raised an error if the file existed and --recompute wasn't set. It also adds two new options, --skip-cal-steps and --skip-dark-steps. If --skip-cal-steps is set then none of the calibration steps are run, even if provided in --steps="list,of,steps". If --skip-dark-steps is set then none of the steps involving calibration darks are performed, even if specified in --steps="list,of,steps". These are both useful when rerunning nightqa after "cleaning up" science tiles on a night in daily operations where the calibrations haven't changed. The morning dark calibration step can be particularly long since it actually runs preproc on the exposure, both of these new commands avoid the need to do that.

This PR also changes it so that the HTML file is overwritten, even if --recompute is not set. This is a slight departure from normal behavior, but I think it's what we want since we want new steps to be reflected in the output HTML.

Finally, this also makes updates to the multiprocessing and preproc steps. For multiprocessing, it uses with multiprocessing.Pool rather than defining a variable as the Pool and then do with pool_variable. This may not have any practical impacts, but felt more robust. The temporary preproc files are also removed. From testing, this appears to be the cause of the job hanging when multiple desi_night_qa runs are done in succession.

Testing

I've run tests in /global/cfs/cdirs/desi/users/kremin/PRs/nightqa_v29
Logs are in /global/cfs/cdirs/desi/users/kremin/PRs/nightqa_v29/logs and nightqa outputs in /global/cfs/cdirs/desi/users/kremin/PRs/nightqa_v29/nightqa.

SUCCESS: Test of normal operation: night 20241111

export SPECPROD=daily; export NIGHT=20241111; desi_night_qa --nproc=32 --recompute -p $SPECPROD -n $NIGHT -o ./nightqa/${NIGHT}

This ran in ~6.5 minutes, which is typical, and produced all of the outputs.

SUCCESS: Test that skips existing steps if --recompute not set: night 20241111

export SPECPROD=daily; export NIGHT=20241111; desi_night_qa --nproc=32 -p $SPECPROD -n $NIGHT -o ./nightqa/${NIGHT}

No files were overwritten except for the nightqa.html and nightqa.css, and the code didn't exit with an error either.

SUCCESS: Test of --recompute operation: night 20241112

Ran
export SPECPROD=daily; export NIGHT=20241112; desi_night_qa --nproc=32 --recompute -p $SPECPROD -n $NIGHT -o ./nightqa/${NIGHT}
Then reran
export SPECPROD=daily; export NIGHT=20241112; desi_night_qa --nproc=32 --recompute -p $SPECPROD -n $NIGHT -o ./nightqa/${NIGHT}

All outputs were overwritten as expected. The code took roughly the same amount of time in each circumstance ~7.5 minutes.

SUCCESS: Test of --skip-cal-test operation: night 20241113

export SPECPROD=daily; export NIGHT=20241113; desi_night_qa --skip-cal-steps --nproc=32 --recompute -p $SPECPROD -n $NIGHT -o ./nightqa/${NIGHT} &>> ./logs/nightqa-${SPECPROD}-${NIGHT}.log

Produced the expected outputs and didn't produce the cal outputs.

SUCCESS: Test of --skip-dark-steps operation: night 20241113

export SPECPROD=daily; export NIGHT=20241113; desi_night_qa --skip-dark-steps --nproc=32 --recompute -p $SPECPROD -n $NIGHT -o ./nightqa/${NIGHT} &>> ./logs/nightqa-${SPECPROD}-${NIGHT}_second.log

Reproduced the non-cal outputs, created the non-dark calibration outputs for the first time, and didn't produce the dark outputs.

SUCCESS: Test of --compute-missing-only operation: night 20241113

export SPECPROD=daily; export NIGHT=20241113; desi_night_qa --compute-missing-only --nproc=32 -p $SPECPROD -n $NIGHT -o ./nightqa/${NIGHT} &>> ./logs/nightqa-${SPECPROD}-${NIGHT}_third.log

Created the dark outputs, updated the HTML file, and nothing else.

@akremin akremin requested a review from sbailey November 25, 2024 21:45
@akremin akremin requested a review from araichoor November 25, 2024 21:49
@coveralls
Copy link

Coverage Status

coverage: 30.144% (-0.001%) from 30.145%
when pulling 9b6cdb0 on nightqa_v29
into 08d4acd on main.

@coveralls
Copy link

Coverage Status

coverage: 30.138% (-0.007%) from 30.145%
when pulling 9b6cdb0 on nightqa_v29
into 08d4acd on main.

Copy link
Contributor

@sbailey sbailey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good, thanks.

Please clarify: Are the --skip-dark-steps and --skip-cal-steps just convenience options instead of having to figure out which of a long list of --steps need to be dropped to achieve the same thing? Normally it is a red flag when options appear to be competing with each other. Thanks for documenting that --skip-BLAH-steps overrides --steps, but is there anything deeper going on here?

@akremin
Copy link
Member Author

akremin commented Nov 27, 2024

Please clarify: Are the --skip-dark-steps and --skip-cal-steps just convenience options instead of having to figure out which of a long list of --steps need to be dropped to achieve the same thing? Normally it is a red flag when options appear to be competing with each other. Thanks for documenting that --skip-BLAH-steps overrides --steps, but is there anything deeper going on here?

They are convenience options as you describe. The mechanism I used to implement them drops specific steps from the list of --steps. This saves the user from having to remember what all the steps are and also saves them from having to type out a long string of text into the arguments, since the default for --steps is a list of all steps.

I agree with you that conflicts are bad, though in this case the flags are very explicit. If someone defines their own list of --steps that includes e.g. morningdark but then also gives --skip-dark-steps, I would assume they copy-pasted a steps string from elsewhere and don't actually want to process the morning dark. That is why I chose to resolve the conflicts this way.

Please let me know if you would like further explanation in script logging or comments in the code, or if you just wanted clarification here.

@sbailey
Copy link
Contributor

sbailey commented Dec 2, 2024

Thanks for the additional explanation about the options. That approach is fine. OK with me to merge. Any objections, @araichoor ?

@araichoor
Copy link
Contributor

sorry for not providing earlier feedback; and also for the maybe complicated original code!

from my small test (described below), I think it looks good, i.e. ok to merge.

the lists of steps_tbd is what I expect -- noting that the --recompute option is accounted for further in code, when checking/calling each step.

for book-keeping, here s my small test:

  • I ve copied these lines in a small function, which returns the steps_tbd:

    desispec/bin/desi_night_qa

    Lines 121 to 157 in 9b6cdb0

    # AR steps to be done
    if not args.compute_missing_only:
    steps_tbd = args.steps.split(",")
    else:
    log.info("args.compute_missing_only set: will override args.steps={}".format(args.steps))
    steps_tbd = []
    for step in steps_all:
    ## recompute html since recomputing other steps
    if not os.path.isfile(outfns[step]) or step == 'html':
    steps_tbd.append(step)
    log.info(f"add {step} to steps_tbd, as args.compute_missing_only "
    + f"set and no {outfns[step]} present")
    if args.skip_dark_steps or args.skip_cal_steps:
    for step in ['dark', 'morningdark', 'badcol']:
    if step in steps_tbd:
    steps_tbd.remove(step)
    if args.skip_cal_steps:
    for step in ['ctedet', 'ctedetrowbyrow']:
    if step in steps_tbd:
    steps_tbd.remove(step)
    if len(steps_tbd) == 0:
    log.info("no steps to be done")
    else:
    log.info("steps to be done: {}".format(",".join(steps_tbd)))
    # AR existing files?
    for step in steps_tbd:
    fn = outfns[step]
    log.info("will create {}".format(fn))
    if os.path.isfile(fn):
    if args.recompute:
    log.warning("\texisting {} will be overwritten".format(fn))
    elif step == 'html':
    log.warning("\texisting {} will be overwritten even though args.recompute = False".format(fn))
    else:
    log.warning(f"\t{fn} already exists and args.recompute = False,"
    + f" so skipping this step")
  • I ve set steps_all = list(get_nightqa_outfns("", 0).keys()) and night = 20241201
  • then I check for all combinations of skip_dark_steps, skip_cal_steps, compute_missing_only, recompute the returned steps_tbd.

here s if I run it with outdir = "/global/cfs/cdirs/desi/spectro/redux/daily/nightqa/20241201", so with all files already existing:

# skip_dark_steps skip_cal_steps compute_missing_only recompute steps_tbd
True	True	True	True	html
True	True	True	False	html
True	True	False	True	html,sframesky,tileqa,skyzfiber,petalnz
True	True	False	False	html,sframesky,tileqa,skyzfiber,petalnz
True	False	True	True	html
True	False	True	False	html
True	False	False	True	html,ctedet,ctedetrowbyrow,sframesky,tileqa,skyzfiber,petalnz
True	False	False	False	html,ctedet,ctedetrowbyrow,sframesky,tileqa,skyzfiber,petalnz
False	True	True	True	html
False	True	True	False	html
False	True	False	True	html,sframesky,tileqa,skyzfiber,petalnz
False	True	False	False	html,sframesky,tileqa,skyzfiber,petalnz
False	False	True	True	html
False	False	True	False	html
False	False	False	True	html,dark,morningdark,badcol,ctedet,ctedetrowbyrow,sframesky,tileqa,skyzfiber,petalnz
False	False	False	False	html,dark,morningdark,badcol,ctedet,ctedetrowbyrow,sframesky,tileqa,skyzfiber,petalnz

and here if I run with outdir = "./", ie with no existing files:

# skip_dark_steps skip_cal_steps compute_missing_only recompute steps_tbd
True	True	True	True	html,sframesky,tileqa,skyzfiber,petalnz
True	True	True	False	html,sframesky,tileqa,skyzfiber,petalnz
True	True	False	True	html,sframesky,tileqa,skyzfiber,petalnz
True	True	False	False	html,sframesky,tileqa,skyzfiber,petalnz
True	False	True	True	html,ctedet,ctedetrowbyrow,sframesky,tileqa,skyzfiber,petalnz
True	False	True	False	html,ctedet,ctedetrowbyrow,sframesky,tileqa,skyzfiber,petalnz
True	False	False	True	html,ctedet,ctedetrowbyrow,sframesky,tileqa,skyzfiber,petalnz
True	False	False	False	html,ctedet,ctedetrowbyrow,sframesky,tileqa,skyzfiber,petalnz
False	True	True	True	html,sframesky,tileqa,skyzfiber,petalnz
False	True	True	False	html,sframesky,tileqa,skyzfiber,petalnz
False	True	False	True	html,sframesky,tileqa,skyzfiber,petalnz
False	True	False	False	html,sframesky,tileqa,skyzfiber,petalnz
False	False	True	True	html,dark,morningdark,badcol,ctedet,ctedetrowbyrow,sframesky,tileqa,skyzfiber,petalnz
False	False	True	False	html,dark,morningdark,badcol,ctedet,ctedetrowbyrow,sframesky,tileqa,skyzfiber,petalnz
False	False	False	True	html,dark,morningdark,badcol,ctedet,ctedetrowbyrow,sframesky,tileqa,skyzfiber,petalnz
False	False	False	False	html,dark,morningdark,badcol,ctedet,ctedetrowbyrow,sframesky,tileqa,skyzfiber,petalnz

@sbailey sbailey merged commit 3f14fe5 into main Dec 3, 2024
26 checks passed
@sbailey sbailey deleted the nightqa_v29 branch December 3, 2024 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants