
WIP JP-3610: Memory usage with Detector1 pipeline #8588

Closed
wants to merge 87 commits

Conversation

@penaguerrero (Contributor) commented Jun 21, 2024

Resolves JP-3610

Closes #8454

This PR substantially improves running time and memory usage for the Detector 1 pipeline. It is a workaround pending the detailed investigation required in the datamodels repo to permanently fix the issue.

Checklist for PR authors (skip items if you don't have permissions or they are not applicable)

  • added entry in CHANGES.rst within the relevant release section
  • updated or added relevant tests
  • updated relevant documentation
  • added relevant milestone
  • added relevant label(s)
  • ran regression tests, post a link to the Jenkins job below.
    How to run regression tests on a PR
  • All comments are resolved
  • Make sure the JIRA ticket is resolved properly

@melanieclarke (Collaborator) left a comment:

It looks like there are some unrelated changes to emicorr in here.

Also, it would be less disruptive to the code if we could still call your function in a with statement. Looking closer, I think that might be possible as written, since the datamodel already has __enter__ and __exit__ definitions. Can we try that?

model : datamodel
The datamodel object
"""
if not isinstance(input, str):

Collaborator:

Should we maybe check for a JwstDataModel here, instead of just not-a-string?

Contributor Author:

Yes, that's a good idea; I will implement this and the "with".

Contributor Author:

done

from stdatamodels.jwst.datamodels import dqflags
import numpy as np


def use_datamodel(input):

Collaborator:

What about updating JwstStep._datamodels_open and using it (or Step.open_model) instead of adding this utility function? That would have the added benefit of applying a similar fix (avoiding unnecessary model clones triggered by datamodels.open(Model)) to the calls that stpipe makes to open models to determine references.

Collaborator:

One other benefit to updating and using Step.open_model would be to allow the opening code to check if the step is part of a pipeline (where Step.parent will return the pipeline). This might be too "fancy" (and harder to make sense of) but the general idea would be to replace the following (found in many of the steps)

with datamodels.open(input) as input:

with

with self.open_model(input) as input:

And implement _datamodels_open in a way that (see the sketch below):

  • when Step.parent is None, mimic the old behavior (create a clone when a model is provided)
  • when Step.parent is not None, only open the model if a string (or path) is provided
  • optionally provide a model class (like RampModel) to use when opening the model (or to create a clone when the input is not an instance of the class)
  • if a model was not opened (or a clone made), return the model with a nullcontext so that the input is not closed when the step finishes

Dropping the old behavior (cloning the input) is worth considering because clones share data arrays (which are often updated in steps).
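
To make the bullet points concrete, here is a minimal sketch of what such a helper could look like, written as a standalone function that takes the step as an argument; the name open_model, the model_class parameter, and the use of contextlib.nullcontext are illustrative assumptions, not the PR's actual implementation.

from contextlib import nullcontext
from pathlib import Path

from stdatamodels.jwst import datamodels


def open_model(step, init, model_class=None):
    # Open a file, or clone when the step runs standalone (old behavior),
    # so the caller's input model is never modified in that case.
    if isinstance(init, (str, Path)) or step.parent is None:
        return model_class(init) if model_class is not None else datamodels.open(init)
    # Inside a pipeline but the wrong model type: convert (this copies).
    if model_class is not None and not isinstance(init, model_class):
        return model_class(init)
    # Inside a pipeline with an already-open model of the right type:
    # wrap it in nullcontext so a ``with`` block does not close it.
    return nullcontext(init)

A step body could then keep its with statement, e.g. with open_model(self, input, RampModel) as input_model, without closing an upstream model early.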

Contributor Author:

I implemented your suggestions, thank you!

Collaborator:

Thanks! Looking at _datamodels_open for this PR:
https://github.com/spacetelescope/jwst/pull/8588/files#diff-9a8d6de889fab1db67280357ed291190cbcf4fa320977c44c922db8cc78976a6R28-R33
the items described in the bullet points above are missing.

  • when Step.parent is None, mimic the old behavior (create a clone when a model is provided)
  • when Step.parent is not None, only open the model if a string (or path) is provided

I don't see any check of Step.parent. This means every step now modifies the input when provided a DataModel (even when run outside the pipeline).

  • optionally provide a model class (like RampModel) to use when opening the model (or to create a clone when the input is not an instance of the class)

I don't see any option for a model class, but given the limited use of this behavior (forcing a model type), it might be cleaner to just have this in detector 1.

  • if a model was not opened (or a clone made), return the model with a nullcontext so that the input is not closed when the step finishes

This one is quite important, as most steps use a with statement. If a nullcontext isn't used, the steps will now close the input model when they finish. Since this is the same as the output model, the next step in the pipeline will receive a closed model. This can cause things to crash for different model/step combinations, as the full contents of a model might not be read when the file is opened.

Contributor Author:

The function that does this is copy_datamodel(input, step_parent); it checks whether self.parent is None and creates a copy, otherwise it returns the input.

@penaguerrero (Contributor Author) commented Jul 8, 2024:

In other words, the procedure now is to use use_datamodel(input, model_class) to open the input and/or make sure the model is the desired class, and then to make a copy (or not), depending on whether the step is part of a pipeline, with copy_datamodel(input, step_parent).
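
For reference, a rough sketch of that two-helper pattern; the function names come from the discussion, but the bodies below are illustrative assumptions rather than the PR's exact code.

from stdatamodels.jwst import datamodels


def use_datamodel(init, model_class=None):
    # Open a filename, or pass through / convert an already-open model.
    if isinstance(init, str):
        return model_class(init) if model_class is not None else datamodels.open(init)
    if model_class is not None and not isinstance(init, model_class):
        return model_class(init)
    return init


def copy_datamodel(model, step_parent):
    # Copy only when the step runs standalone (no parent pipeline), so the
    # caller's input stays unchanged; within a pipeline, work in place.
    if step_parent is None:
        return model.copy()
    return model

Inside a step's process() this would then look roughly like input_model = use_datamodel(input, RampModel) followed by input_model = copy_datamodel(input_model, self.parent).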

result = charge_migration.charge_migration(input_model, signal_threshold)
result.meta.cal_step.charge_migration = 'COMPLETE'

input_model.close()

Collaborator:

When input is a model, this will close the input file. This can cause issues for certain files (if the input file is asdf and lazy loaded).

Contributor Author:

I think this is addressed by using the "with" suggested by Melanie; already in the works.

Collaborator:

Thanks! I see that some of the steps are updated to use with.

When the input is not cloned, the __exit__ of the with statement will close the file (if the model is the only one that has file_references to the input file). These references are used to track when the file can be safely closed. When a model is cloned, it increments the reference to the file, so that both the clone and the original model need to be freed before the file is closed. This cloning and file reference counting is what allowed these steps to use a with context without closing the file too early.

If we want to remove the cloning and keep the file references up-to-date, we'll either have to remove the with context usage or provide a way for a datamodel to be used in a with statement without requiring a clone. One option here might be to use a nullcontext when the model is not cloned.
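
As a small illustration of that last option, a datamodel can be used in a with statement without a clone by wrapping it in nullcontext; the helper name below is hypothetical.

from contextlib import nullcontext

from stdatamodels.jwst import datamodels


def _model_context(init):
    if isinstance(init, str):
        # The step opens (and therefore owns) the file; the model's
        # __exit__ closes it when the with block ends.
        return datamodels.open(init)
    # An already-open model is only wrapped, so leaving the with block
    # does not close it and the next step still receives a live model.
    return nullcontext(init)


# with _model_context(input) as input_model:
#     ...  # process input_model in place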

@@ -28,19 +29,21 @@ class ChargeMigrationStep(Step):
def process(self, input):

# Open the input data model
with datamodels.RampModel(input) as input_model:

Collaborator:

Any idea why this was using RampModel(input) instead of datamodels.open(input)? Are there times when this input might not already be a RampModel?

Collaborator:

There are several other steps modified in this PR that are similar (and the same questions apply).

Contributor Author:

I believe you can use a RampModel up to the ramp step, but if we use the use_datamodel function then it does not matter anymore since it uses datamodels.open()

Collaborator:

@braingram The calwebb_detector1 pipeline, as well as its individual steps, must have original 4-D data ramps to work with, which is a RampModel. Forcing the calwebb_detector1 pipeline module to load its original input into a RampModel (as opposed to using a generic open and letting datamodels decide the appropriate model based on the DATAMODL keyword) is necessary, because the "uncal" files created by level-1b processing are in the form of a Level1bModel. So we have to force it into a RampModel. Whether it's also necessary to do that for individual steps, like charge_migration, is debatable. In the normal flow, as part of the calwebb_detector1 pipeline, the input would already be in the form of a RampModel, thanks to calwebb_detector1 forcing that. But there's also the possibility of someone trying to run an individual step on an "uncal" file, outside of the pipeline. Hence it could be argued that again forcing it into a RampModel is safest.

Collaborator:

Thanks! So the steps (which may receive a Level1bModel or RampModel) could check the model input type (isinstance(model, RampModel)) and only convert it to a RampModel if it's not already one.
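
A minimal sketch of that check, assuming a variable input_model holds the step input (the variable name is illustrative):

from stdatamodels.jwst.datamodels import RampModel

# Convert only when needed, e.g. when a Level1bModel "uncal" file is run
# through a step directly; a model that is already a RampModel is used
# as-is, avoiding an unnecessary copy.
if not isinstance(input_model, RampModel):
    input_model = RampModel(input_model)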

Contributor Author:

this is what the function does now


codecov bot commented Jun 21, 2024

Codecov Report

Attention: Patch coverage is 58.58586% with 123 lines in your changes missing coverage. Please review.

Project coverage is 61.91%. Comparing base (eab0532) to head (efa083c).
Report is 2 commits behind head on main.

Files with missing lines               | Patch % | Lines
jwst/pipeline/calwebb_detector1.py     |   6.97% | 40 Missing ⚠️
jwst/rscd/rscd_step.py                 |  13.63% | 19 Missing ⚠️
jwst/reset/reset_step.py               |  15.00% | 17 Missing ⚠️
jwst/persistence/persistence_step.py   |  23.07% | 10 Missing ⚠️
jwst/emicorr/emicorr_step.py           |  25.00% |  9 Missing ⚠️
jwst/ipc/ipc_step.py                   |  33.33% |  6 Missing ⚠️
jwst/gain_scale/gain_scale_step.py     |  66.66% |  4 Missing ⚠️
jwst/saturation/saturation_step.py     |  73.33% |  4 Missing ⚠️
jwst/dark_current/dark_current_step.py |  70.00% |  3 Missing ⚠️
jwst/linearity/linearity_step.py       |  70.00% |  3 Missing ⚠️
... and 5 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8588      +/-   ##
==========================================
+ Coverage   61.86%   61.91%   +0.04%     
==========================================
  Files         377      377              
  Lines       38911    39009      +98     
==========================================
+ Hits        24071    24151      +80     
- Misses      14840    14858      +18     


@braingram (Collaborator) commented Jun 21, 2024:

The steps updated in this PR will now modify their inputs (rather than creating new models) when called with a DataModel. I am overall in favor of this type of change but want to bring up the point because previously I was told that the convention for the pipeline is to not modify the input.

For example, if I run a step (say, DQInitStep) on a model with the current main branch:

>> model = datamodels.open("some_file.fits")
>> step = DQInitStep()
>> result = step(model)
>> result is model
False
>> model.meta.cal_step.dq_init
None

Then result is not model and model.meta.cal_step.dq_init is None. When I attempted to run the example with this PR I ran into an error (more on that below), but I expect it should change the result to instead:

>> result is model
True
>> model.meta.cal_step.dq_init
"COMPLETE"

The error I ran into was AttributeError: No attribute 'dq'. I think that's because I provided it a FGS GUIDER1 file (the first random file I grabbed). I did not encounter an error on main.

Is there any mention in the documentation about if steps modify their inputs?

@tapastro removed this from the Build 11.1 milestone Sep 20, 2024
@penaguerrero changed the title from "JP-3610: Memory usage with Detector1 pipeline" to "WIP JP-3610: Memory usage with Detector1 pipeline" Sep 20, 2024
@zacharyburnett removed their request for review September 23, 2024 19:30
@tapastro (Contributor):

We should aim to keep the input unchanged after it is provided to a step or pipeline. Any improvements to memory usage that do not violate this principle should be split off from this PR, and this PR will be closed.

@tapastro closed this Sep 26, 2024

Successfully merging this pull request may close these issues.

Memory usage with Detector1 pipeline
8 participants