Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change LocalClient to not explicitly store default storage directory #462

Merged
merged 2 commits into from
Aug 22, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 7 additions & 3 deletions HISTORY.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,17 +3,21 @@
## UNRELEASED

- Fixes typo in `FileCacheMode` where values were being filled by envvar `CLOUPATHLIB_FILE_CACHE_MODE` instead of `CLOUDPATHLIB_FILE_CACHE_MODE`. (PR [#424](https://github.com/drivendataorg/cloudpathlib/pull/424)
- Fix `CloudPath` cleanup via `CloudPath.__del__` when `Client` encounters an exception during initialization and does not create a `file_cache_mode` attribute. (Issue [#372](https://github.com/drivendataorg/cloudpathlib/issues/372), thanks to [@bryanwweber](https://github.com/bryanwweber))
- Fix `CloudPath` cleanup via `CloudPath.__del__` when `Client` encounters an exception during initialization and does not create a `file_cache_mode` attribute. (Issue [#372](https://github.com/drivendataorg/cloudpathlib/issues/372), thanks to [@bryanwweber](https://github.com/bryanwweber))
- Drop support for Python 3.7; pin minimal `boto3` version to Python 3.8+ versions. (PR [#407](https://github.com/drivendataorg/cloudpathlib/pull/407))
- fix: use native `exists()` method in `GSClient`. (PR [#420](https://github.com/drivendataorg/cloudpathlib/pull/420))
- Enhancement: lazy instantiation of default client (PR [#432](https://github.com/drivendataorg/cloudpathlib/issues/432), Issue [#428](https://github.com/drivendataorg/cloudpathlib/issues/428))
- Adds existence check before downloading in `download_to` (Issue [#430](https://github.com/drivendataorg/cloudpathlib/issues/430), PR [#432](https://github.com/drivendataorg/cloudpathlib/pull/432))
- Add env vars `CLOUDPATHLIB_FORCE_OVERWRITE_FROM_CLOUD` and `CLOUDPATHLIB_FORCE_OVERWRITE_TO_CLOUD`. (Issue [#393](https://github.com/drivendataorg/cloudpathlib/issues/393), PR [#437](https://github.com/drivendataorg/cloudpathlib/pull/437))
- Have `glob` for local clients match behavior of cloud versions for parity in testing; make `reset_default_storage_dir` work with default client. (Issue [#414](https://github.com/drivendataorg/cloudpathlib/issues/414), Issue [#415](https://github.com/drivendataorg/cloudpathlib/issues/415), [PR #436](https://github.com/drivendataorg/cloudpathlib/pull/436))
- Fixed `glob` for `cloudpathlib.local.LocalPath` and subclass implementations to match behavior of cloud versions for parity in testing. (Issue [#415](https://github.com/drivendataorg/cloudpathlib/issues/415), [PR #436](https://github.com/drivendataorg/cloudpathlib/pull/436))
- Changed how `cloudpathlib.local.LocalClient` and subclass implementations track the default local storage directory (used to simulate the cloud) used when no local storage directory is explicitly provided. ([PR #436](https://github.com/drivendataorg/cloudpathlib/pull/436), [PR #462](https://github.com/drivendataorg/cloudpathlib/pull/462))
- Changed `LocalClient` so that client instances using the default storage access the default local storage directory through the `get_default_storage_dir` rather than having an explicit reference to the path set at instantiation. This means that calling `get_default_storage_dir` will reset the local storage for all clients using the default local storage, whether the client has already been instantiated or is instantiated after resetting. This fixes unintuitive behavior where `reset_local_storage` did not reset local storage when using the default client. (Issue [#414](https://github.com/drivendataorg/cloudpathlib/issues/414))
- Added a new `local_storage_dir` property to `LocalClient`. This will return the current local storage directory used by that client instance.
by reference through the `get_default_ rather than with an explicit.

## v0.18.1 (2024-02-26)

- Fixed import error due to incompatible `google-cloud-storage` by not using `transfer_manager` if it is not available. ([Issue #408](https://github.com/drivendataorg/cloudpathlib/issues/408), [PR #410](https://github.com/drivendataorg/cloudpathlib/pull/410))
- Fixed import error due to incompatible `google-cloud-storage` by not using `transfer_manager` if it is not available. ([Issue #408](https://github.com/drivendataorg/cloudpathlib/issues/408), [PR #410](https://github.com/drivendataorg/cloudpathlib/pull/410))

Includes all changes from v0.18.0.

Expand Down
43 changes: 31 additions & 12 deletions cloudpathlib/local/localclient.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
import shutil
from tempfile import TemporaryDirectory
from time import sleep
from typing import Callable, Dict, Iterable, List, Optional, Tuple, Union
from typing import Callable, ClassVar, Dict, Iterable, List, Optional, Tuple, Union

from ..client import Client
from ..enums import FileCacheMode
Expand All @@ -17,7 +17,12 @@
"""Abstract client for accessing objects the local filesystem. Subclasses are as a monkeypatch
substitutes for normal Client subclasses when writing tests."""

_default_storage_temp_dir = None
# Class-level variable to tracks the default storage directory for this client class
# that is used if a client is instantiated without a directory being explicitly provided
_default_storage_temp_dir: ClassVar[Optional[TemporaryDirectory]] = None

# Instance-level variable that tracks the local storage directory for this client
_local_storage_dir: Optional[Union[str, os.PathLike]]

def __init__(
self,
Expand All @@ -28,10 +33,7 @@
content_type_method: Optional[Callable] = mimetypes.guess_type,
**kwargs,
):
# setup caching and local versions of file. use default temp dir if not provided
if local_storage_dir is None:
local_storage_dir = self.get_default_storage_dir()
self._local_storage_dir = Path(local_storage_dir)
self._local_storage_dir = local_storage_dir

super().__init__(
local_cache_dir=local_cache_dir,
Expand All @@ -41,28 +43,45 @@

@classmethod
def get_default_storage_dir(cls) -> Path:
"""Return the default storage directory for this client class. This is used if a client
is instantiated without a storage directory being explicitly provided. In this usage,
"storage" refers to the local storage that simulates the cloud.
"""
if cls._default_storage_temp_dir is None:
cls._default_storage_temp_dir = TemporaryDirectory()
_temp_dirs_to_clean.append(cls._default_storage_temp_dir)
return Path(cls._default_storage_temp_dir.name)

@classmethod
def reset_default_storage_dir(cls) -> Path:
"""Reset the default storage directly. This tears down and recreates the directory used by
default for this client class when instantiating a client without explicitly providing
a storage directory. In this usage, "storage" refers to the local storage that simulates
the cloud.
"""
cls._default_storage_temp_dir = None

# Also reset storage on default client so it uses the new temp dir
cls.get_default_client()._local_storage_dir = cls.get_default_storage_dir() # type: ignore

return cls.get_default_storage_dir()

@property
def local_storage_dir(self) -> Path:
"""The local directory where files are stored for this client. This storage directory is
the one that simulates the cloud. If no storage directory was provided on instantiating the
client, the default storage directory for this client class is used.
"""
if self._local_storage_dir is None:
# No explicit local storage was provided on instantiating the client.
# Use the default storage directory for this class.
return self.get_default_storage_dir()
return Path(self._local_storage_dir)

Check warning on line 75 in cloudpathlib/local/localclient.py

View check run for this annotation

Codecov / codecov/patch

cloudpathlib/local/localclient.py#L75

Added line #L75 was not covered by tests

def _cloud_path_to_local(self, cloud_path: "LocalPath") -> Path:
return self._local_storage_dir / cloud_path._no_prefix
return self.local_storage_dir / cloud_path._no_prefix

def _local_to_cloud_path(self, local_path: Union[str, os.PathLike]) -> "LocalPath":
local_path = Path(local_path)
cloud_prefix = self._cloud_meta.path_class.cloud_prefix
return self.CloudPath(
f"{cloud_prefix}{PurePosixPath(local_path.relative_to(self._local_storage_dir))}"
f"{cloud_prefix}{PurePosixPath(local_path.relative_to(self.local_storage_dir))}"
)

def _download_file(self, cloud_path: "LocalPath", local_path: Union[str, os.PathLike]) -> Path:
Expand Down
22 changes: 17 additions & 5 deletions tests/test_local.py
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,8 @@ def test_default_storage_dir(client_class, monkeypatch):
p1.write_text("hello")
assert p1.exists()
assert p1.read_text() == "hello"

# p2 uses a new client, but the simulated "cloud" should be the same
assert p2.exists()
assert p2.read_text() == "hello"

Expand All @@ -76,20 +78,30 @@ def test_reset_default_storage_dir(client_class, monkeypatch):
cloud_prefix = client_class._cloud_meta.path_class.cloud_prefix

p1 = client_class().CloudPath(f"{cloud_prefix}drive/file.txt")
client_class.reset_default_storage_dir()
p2 = client_class().CloudPath(f"{cloud_prefix}drive/file.txt")

assert not p1.exists()
assert not p2.exists()
Copy link
Member Author

@jayqi jayqi Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test had failed with my new implementation, but I think the old behavior is not desirable. In the old version of this test, we reset the storage directory before instantiating p2 and writing out p1, which means p2 uses a new default local storage but p1 still uses the old default local storage.


p1.write_text("hello")
assert p1.exists()
assert p1.read_text() == "hello"

client_class.reset_default_storage_dir()

# We've reset the default storage directory, so the file should be gone
assert not p1.exists()

# Also should be gone for p2, which uses a new client that is still using default storage dir
p2 = client_class().CloudPath(f"{cloud_prefix}drive/file.txt")
assert not p2.exists()

# clean up
client_class.reset_default_storage_dir()


def test_reset_default_storage_dir_with_default_client():
"""Test that reset_default_storage_dir resets the storage used by all clients that are using
the default storage directory, such as the default client.

Regression test for https://github.com/drivendataorg/cloudpathlib/issues/414
"""
# try default client instantiation
from cloudpathlib.local import LocalS3Path, LocalS3Client

Expand Down
Loading