
data caching? #106

Open
7yl4r opened this issue Oct 11, 2022 · 3 comments

Comments

@7yl4r
Collaborator

7yl4r commented Oct 11, 2022

A nice feature to have would be retention of cached data, so that data does not need to be re-downloaded.

@ocefpaf
Collaborator

ocefpaf commented Oct 11, 2022

Based on the size of the data, I would imagine a disk cache, right?

@7yl4r
Collaborator Author

7yl4r commented Oct 11, 2022

Yes, I was imagining:

To save:

  • hash the input parameters
  • generate a filename of the form {param_hash}_{record_count}.p using the tempfile package
  • dump the results to that file with the pickle package

To fetch, before performing the query:

  • hash the input parameters
  • check for a matching filename
  • hit the API for record_count only
  • load the cached file if record_count is unchanged

I'm not certain that record_count is the only parameter to look at, however.
Records could be updated without changing the count of records.
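The save/fetch steps above could be sketched roughly like this. This is a minimal sketch, not a final design: the hashing scheme (sha256 over JSON-serialized parameters), the function names, and the use of the system temp directory via tempfile.gettempdir() are all assumptions for illustration.

```python
import hashlib
import json
import os
import pickle
import tempfile

# Assumed cache location: the system temp directory.
CACHE_DIR = tempfile.gettempdir()


def _param_hash(params):
    # Hash the input parameters deterministically (sorted keys).
    blob = json.dumps(params, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:16]


def save_results(params, records):
    # Dump results to a file named {param_hash}_{record_count}.p with pickle.
    path = os.path.join(CACHE_DIR, f"{_param_hash(params)}_{len(records)}.p")
    with open(path, "wb") as f:
        pickle.dump(records, f)
    return path


def load_cached(params, record_count):
    # Before performing the full query, check for a cache file whose
    # record_count (queried cheaply from the API) matches.
    path = os.path.join(CACHE_DIR, f"{_param_hash(params)}_{record_count}.p")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return None  # cache miss: fall through to the full query
```

As noted above, encoding record_count in the filename makes a stale cache invisible when records change in place without changing the count, so a last-modified timestamp from the API might be a safer key if one is available.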

@ocefpaf
Collaborator

ocefpaf commented Oct 11, 2022

I usually go to https://joblib.readthedocs.io/en/latest/auto_examples/memory_basic_usage.html for that kind of operation. If you choose not to save on disk, we can use functools.lru_cache from the standard library.
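For the in-memory route, the standard-library option looks like this. A minimal sketch: fetch_records and its return value are hypothetical stand-ins for the real API call, not part of this project's code. (The joblib.Memory.cache decorator linked above is the on-disk analogue, with the same decorate-the-fetch-function shape.)

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def fetch_records(query):
    # Hypothetical stand-in for an expensive API call; repeated calls
    # with the same query are served from the in-memory cache.
    return [query.upper()]
```

Note that lru_cache only helps within a single process; nothing survives a restart, which is why a disk cache fits the re-download use case better.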
