Reconsider resample allowed_memory and associated environment variable #8775

Open
stscijgbot-jp opened this issue Sep 11, 2024 · 0 comments

Issue JP-3742 was created on JIRA by Ned Molter:

The utility of the allowed_memory parameter to outlier detection and resample is not clear, and the existence of the parameter may lead to confusion.  The same is true of the environment variable DMODEL_ALLOWED_MEMORY, which serves as another way of setting that parameter.

The parameter is only used in one place, inside `ResampleData`, where the available memory is computed as

```python
(psutil.virtual_memory().available + psutil.swap_memory().total) * allowed_memory
```
That value is compared with the expected array shape of the output wcs, and if the array would be too large, a custom `OutputTooLargeError` is raised.
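
To make the check described above concrete, here is a minimal sketch of the comparison, not the actual `ResampleData` code. The function name `check_output_size`, the `bytes_per_pixel=4` default (a float32 output is assumed), and the exception's base class are all assumptions for illustration; the memory figures are passed in as plain byte counts so the logic can be exercised without `psutil`.

```python
import math


class OutputTooLargeError(Exception):
    """Stand-in for the pipeline's custom error (hypothetical definition)."""


def check_output_size(array_shape, available, swap_total, allowed_memory,
                      bytes_per_pixel=4):
    """Raise if one output array of `array_shape` exceeds the memory budget.

    `available` and `swap_total` are byte counts, e.g. the values of
    psutil.virtual_memory().available and psutil.swap_memory().total.
    `bytes_per_pixel=4` assumes a float32 output array (an assumption).
    """
    # Budget: the fraction of (available RAM + total swap) the step may use.
    budget = (available + swap_total) * allowed_memory
    # Requirement: size of a single array with the output WCS shape.
    required = math.prod(array_shape) * bytes_per_pixel
    if required > budget:
        raise OutputTooLargeError(
            f"output needs {required} bytes, budget is {budget:.0f} bytes"
        )
    return required, budget
```

Note that a check of this form only accounts for one array of the output shape, and only at the moment the check runs.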

There are multiple issues here, exposed in part by a conversation with Jesse Doggett about how this might be used in ops:
 * It was realized that this `OutputTooLargeError` is not currently one of the metrics ops uses to check whether a step ran out of memory. The only things checked are: exit status 137 from strun (killed by SIGKILL); “ValueError: assignment destination is read-only”; and “MemoryError”. This `OutputTooLargeError` has apparently never been encountered.
 * In operations, the available memory read by `psutil` is not very helpful. Each machine has 10 job slots, but the available memory is (very likely, but not yet tested) the total for the whole machine. Operations sets `allowed_memory` to 0.7 (using the `DMODEL_ALLOWED_MEMORY` environment variable), which is probably not a good idea for a machine running 10 jobs.
 * Checking `output_wcs.array_shape` is most likely insufficient to know the actual memory usage of the step. For example, testing has shown that when resample is called via outlier detection, the median computation is often more memory-hungry. The pixmap calculation may also take a lot of memory, as computing it involves a lot of array copying and its final size is 2x the input size, in float64s. And the context array has the same height and width as the output, but with a depth equal to the number of input images divided by 32 (but see also JP-3707).
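
The points in the list above can be put into rough numbers with a back-of-the-envelope sketch. Everything here is illustrative: the helper names are hypothetical, the float32 output and int32 context planes are assumptions, and the median computation in outlier detection is not modeled at all.

```python
import math

FLOAT32 = 4  # bytes; assumed dtype of the output array
FLOAT64 = 8  # bytes; the pixmap is described as float64
INT32 = 4    # bytes; assumed dtype of each context plane (32 bits per plane)


def per_slot_budget(machine_bytes, allowed_memory, n_slots):
    """Budget for one job if a machine-wide figure is shared by n_slots jobs."""
    return machine_bytes * allowed_memory / n_slots


def estimate_resample_bytes(input_shape, output_shape, n_inputs):
    """Rough byte counts for the arrays named in the list above."""
    in_pix = math.prod(input_shape)
    out_pix = math.prod(output_shape)
    return {
        # The only term the current check considers.
        "output": out_pix * FLOAT32,
        # Two float64 coordinates per input pixel.
        "pixmap": 2 * in_pix * FLOAT64,
        # One 32-bit plane per 32 input images, at the output height and width.
        "context": out_pix * math.ceil(n_inputs / 32) * INT32,
    }


if __name__ == "__main__":
    sizes = estimate_resample_bytes((2048, 2048), (4096, 4096), n_inputs=40)
    for name, nbytes in sizes.items():
        print(f"{name:8s} {nbytes / 2**20:8.1f} MiB")
```

Under these assumptions, the context array alone can exceed the output array once more than 32 images are combined, and a budget computed from the whole machine's memory overstates what a single job can use by roughly the number of job slots.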

Soliciting additional input from [Jesse Doggett](https://jira.stsci.edu/secure/ViewProfile.jspa?name=doggett) and anyone else on team coffee who might have opinions or could provide clarity on what to do here.