Deprecation fix: mktemp is deprecated #789

Draft: wants to merge 2 commits into master

Conversation

keflavich (Contributor)

@codecov-commenter

Codecov Report

Merging #789 (5b3ad61) into master (7452dce) will not change coverage.
The diff coverage is 0.00%.


@@           Coverage Diff           @@
##           master     #789   +/-   ##
=======================================
  Coverage   77.90%   77.90%           
=======================================
  Files          24       24           
  Lines        5826     5826           
=======================================
  Hits         4539     4539           
  Misses       1287     1287           
Impacted Files                          Coverage Δ
spectral_cube/dask_spectral_cube.py     85.64% <0.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@dhomeier (Contributor) left a comment

Reproduced the Windows test failure locally on macOS; it seems all the other CI jobs were simply skipping the test because they run without zarr.
mkstemp is apparently not quite a drop-in replacement for mktemp, in particular because it specifically protects its file against concurrent access. But what to_zarr really needs is a directory to put its data and metadata in, so TemporaryDirectory should do the job here. It only remains to be clarified just how temporary it is, i.e. when the cleanup can safely be done.
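
For reference, a minimal standalone sketch of the behavioural difference between the three tempfile calls (standard library only, not code from this PR):

    import os
    import tempfile

    # mktemp (deprecated): only returns a fresh name; nothing is created on
    # disk, so zarr is free to create a directory store at that path later.
    name = tempfile.mktemp()
    print(os.path.exists(name))        # False

    # mkstemp: atomically creates and opens a regular *file* (that is the
    # protection against concurrent access), so the path can never become
    # the directory that to_zarr wants to write.
    fd, filename = tempfile.mkstemp()
    print(os.path.isfile(filename))    # True
    os.close(fd)
    os.remove(filename)

    # TemporaryDirectory: creates a real directory that zarr can write into.
    with tempfile.TemporaryDirectory() as tmpdir:
        print(os.path.isdir(tmpdir))   # True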

Comment on lines +83 to 86
filenum, filename = tempfile.mkstemp()
with dask.config.set(**cube._scheduler_kwargs):
    cube._data.to_zarr(filename)
cube._data = da.from_zarr(filename)

Suggested change
filenum, filename = tempfile.mkstemp()
with dask.config.set(**cube._scheduler_kwargs):
    cube._data.to_zarr(filename)
cube._data = da.from_zarr(filename)
tmpdir = tempfile.TemporaryDirectory()
with dask.config.set(**cube._scheduler_kwargs):
    cube._data.to_zarr(tmpdir.name)
cube._data = da.from_zarr(tmpdir.name)
tmpdir.cleanup()

Without the .cleanup() the test raises a warning over the leftover directory; it's not quite clear to me whether the data saved in tmp_dir might still be accessed later...
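
To illustrate why the cleanup timing matters, a hedged sketch (assuming zarr is installed and nothing computes the graph in between): da.from_zarr only reads metadata up front, so removing the store before the first compute corrupts the result.

    import tempfile

    import dask.array as da

    tmpdir = tempfile.TemporaryDirectory()
    da.ones((4, 4), chunks=2).to_zarr(tmpdir.name)

    arr = da.from_zarr(tmpdir.name)  # lazy: only the metadata is read here
    tmpdir.cleanup()                 # removes the chunk files under `arr`
    print(arr.compute())             # no longer the saved data: zarr quietly
                                     # substitutes the fill value for missing chunks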

@keflavich keflavich marked this pull request as draft January 6, 2022 16:50
@keflavich (Contributor, Author)

Marked as 'draft', and possibly 'do-not-merge', because it appears the deprecated mktemp does something qualitatively different from mkstemp that will be hard to reproduce.

@dhomeier (Contributor) left a comment

The more I look at mktemp, the less I understand it.
In the current version it ends up creating a temporary directory which then holds the 0.0.0 and .zarray files, but all of them still exist after explicitly deleting cube, and even after closing the Python interpreter! Yet the open_files test does not complain and no warning is shown. It looks a bit like black magic to me, but it may simply be that mktemp only generates a name and never tracks what happens at that path afterwards, so there is nothing for it to clean up or warn about.
With just tempfile.TemporaryDirectory the directory stays around after del cube but is removed upon terminating the Python session; the open_files test passes, but raises a
tempfile.py:821: ResourceWarning: Implicitly cleaning up <TemporaryDirectory ...
With the latest suggestion and the custom __del__ method the directory is removed upon del cube, and the test passes without warnings.
So this would seem to solve everything, and there are similar use cases in some Astropy classes and NumPy's DataSource, but I am a bit concerned about the reservations many people seem to have about using __del__, e.g. in https://stackoverflow.com/a/2452895.

Still, if it works (we should give this a try on the CI). Alternatively one might simply set the tests to ignore the warning, but cleaning up properly seems the better way, even if we need to fall back on __del__...
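
For the ignore-the-warning route, a pytest filter would probably be enough; a hypothetical sketch (the test name is made up, only the warning message is taken from the traceback above):

    import pytest

    @pytest.mark.filterwarnings(
        "ignore:Implicitly cleaning up:ResourceWarning")
    def test_save_to_tmp_dir():
        ...  # exercise to_zarr / from_zarr via a temporary directory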

Comment on lines +83 to 86
filenum, filename = tempfile.mkstemp()
with dask.config.set(**cube._scheduler_kwargs):
    cube._data.to_zarr(filename)
cube._data = da.from_zarr(filename)

Suggested change
filenum, filename = tempfile.mkstemp()
with dask.config.set(**cube._scheduler_kwargs):
    cube._data.to_zarr(filename)
cube._data = da.from_zarr(filename)
cube._tmpdir = tempfile.TemporaryDirectory()
with dask.config.set(**cube._scheduler_kwargs):
    cube._data.to_zarr(cube._tmpdir.name)
cube._data = da.from_zarr(cube._tmpdir.name)

A slightly different option: this will enable the TemporaryDirectory to be cleaned up on deletion of the cube by setting up a custom teardown method for DaskSpectralCubeMixin:

    def __del__(self):
        # Clean up any `tempfile.TemporaryDirectory` created by `save_to_tmp_dir`.
        if hasattr(self, '_tmpdir'):
            self._tmpdir.cleanup()
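
If the reservations about __del__ turn out to be a blocker, the same teardown could be registered with weakref.finalize instead, which runs at garbage collection or interpreter exit without __del__'s pitfalls around reference cycles and interpreter shutdown. A self-contained sketch (the Cube class here is a hypothetical stand-in, not the real DaskSpectralCubeMixin):

    import tempfile
    import weakref

    class Cube:
        """Hypothetical stand-in for DaskSpectralCubeMixin."""

    cube = Cube()
    cube._tmpdir = tempfile.TemporaryDirectory()
    # Tie the directory's lifetime to the cube: the finalizer only holds a
    # reference to the TemporaryDirectory, not to the cube itself.
    weakref.finalize(cube, cube._tmpdir.cleanup)

    del cube  # the finalizer fires here; no ResourceWarning, no __del__ needed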
