pre-fetch should work without caching as well #647

Open
dmpetrov opened this issue Dec 1, 2024 · 1 comment

dmpetrov commented Dec 1, 2024

Prefetching and caching should be independent settings: prefetch should work even when caching is disabled.

See discussion in #635

@skshetry

I have a PR in #730 that uses a separate directory for prefetched files (by default, the .datachain/tmp directory that we already have). That directory is cleaned up once prefetching completes.

However, for PyTorch, automatic cleanup looks tricky. We cannot do it from the PytorchDataset instance itself, so we would need to either provide a custom DataLoader or provide an API to clean up the dataset.

I went with the latter in #730, which provides a PytorchDataset.close() API to clean up the cache.
This also avoids synchronization issues that may arise from DataLoader creating multiple processes.

Example usage:

from contextlib import closing

dataset = ds.to_pytorch(...)
with closing(dataset):  # calls dataset.close() when the block exits
  ...
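
To make the cleanup pattern above concrete, here is a minimal, self-contained sketch. It does not use datachain or PyTorch: PrefetchingDataset, its tmp_root parameter, and demo() are hypothetical stand-ins, and the real PytorchDataset in #730 may be implemented quite differently. It only illustrates how an explicit close() method pairs with contextlib.closing so that a temporary prefetch directory is removed when the block exits.

```python
import os
import shutil
import tempfile
from contextlib import closing


class PrefetchingDataset:
    """Hypothetical stand-in for a dataset that prefetches files to a temp dir."""

    def __init__(self, items, tmp_root=None):
        self.items = list(items)
        # A separate per-instance directory, independent of any persistent cache.
        self.tmp_dir = tempfile.mkdtemp(prefix="prefetch-", dir=tmp_root)

    def __iter__(self):
        for name, payload in self.items:
            path = os.path.join(self.tmp_dir, name)
            with open(path, "wb") as f:  # stands in for a real download
                f.write(payload)
            yield path

    def close(self):
        # Safe to call more than once: a missing directory is ignored.
        shutil.rmtree(self.tmp_dir, ignore_errors=True)


def demo():
    dataset = PrefetchingDataset([("a.bin", b"1"), ("b.bin", b"2")])
    with closing(dataset):
        paths = list(dataset)
        existed = all(os.path.exists(p) for p in paths)
    # Returns whether files existed inside the block, and whether
    # the temp directory still exists after the block.
    return existed, os.path.exists(dataset.tmp_dir)
```

Because cleanup is tied to an explicit close() call in the owning process rather than to object finalization, it does not depend on when, or in which worker process, the dataset object happens to be garbage collected.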

skshetry self-assigned this Dec 23, 2024