Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cache functionality #50

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Add cache functionality #50

wants to merge 5 commits into from

Conversation

pdvass
Copy link

@pdvass pdvass commented Mar 21, 2023

The proposed changes are based on Crossref's REST API documentation recommendation of caching the responses.

NOTE: The cache that unittest creates is persistent. To tackle that, I have written .bat and .sh files to run develop and test from setup.py, then delete the cache created. If needed, I can provide them too.

@pdvass pdvass marked this pull request as draft April 12, 2023 09:51
@pdvass pdvass marked this pull request as ready for review April 12, 2023 09:51
@fabiobatalha
Copy link
Owner

Hello @pdvass

Thanks for the commit.

In my point of view, what we would cache is the http requests made to the Crossref API.

So, I think any cache implementation should be in the HTTPRequest class, in specific the do_http_request method.

Besides that, I'm not sure about the benefits to include a caching implementation in this library, once it is mostly used for data harvesting, and the chances to reuse or benefit from a cache is almost null, but maybe I could be wrong. Even though, it is fine for me to include a cache layer in the terms described above.

@pdvass
Copy link
Author

pdvass commented Apr 13, 2023

Hello @fabiobatalha

Thank you for your feedback.

The main motive behind this PR is that I had to create a dataset that I didn't know from the beginning the size that it should be. My approach was to wait for the responses, convert them and then save them, but it was an expensive conversion for a bigger dataset. So, by caching I was able to save the responses and then add more, but not the same, if needed. Also, I find it easier to transfer it from device to device, to not rerun everything and save time.

As of the implementation details, this is how I managed to get it working, because I needed a way to choose a backend.

@@ -134,14 +136,18 @@ class Endpoint:

def __init__(
self,
backend=None,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

backend parameter should be the last one. Some users don't use named parameters and your commit will break their scripts.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought, since all the parameters had default values, if I gave backend also a default value it wouldn't break anything. But it is something that I can change it, so OK.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave a second thought, and I understood what you are saying. You're right. I will change it as soon as possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants