-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Command for fetching HathiTrust Images #694
Conversation
Note this currently relies on a feature branch of ppa-nlp
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #694 +/- ##
===========================================
- Coverage 95.12% 94.35% -0.78%
===========================================
Files 139 141 +2
Lines 7573 7815 +242
Branches 7 8 +1
===========================================
+ Hits 7204 7374 +170
- Misses 369 441 +72 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fantastic, and the script seems to be working really well.
I think there's a bit more work to do to polish it up and improve the output and reporting, but it seems like the basic functionality is pretty much working as we want it to.
It would be nice to add handling for SIGINT / ctrl-c so you can stop the script but have it end gracefully, finish downloading images for the current volume and report on what ws done, then stop. If you're interested in adding that, let me know, I should be able to find an example where I've done something similar.
|
||
# Fetch thumbnail if file does not exist | ||
page_thumbnail = thumbnail_dir / image_name | ||
if not page_thumbnail.is_file(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wondered about a volume-level check to skip an entire volume if the expected number of images are present, but glad to see you have this check.
Co-authored-by: Rebecca Sutton Koeser <[email protected]>
Co-authored-by: Rebecca Sutton Koeser <[email protected]>
Appears to be some issue where None is being return instead of 0
* Added keyboard interrupt handling * Modified progress bar * Added logging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, let's get this version merged !
elif response.status_code == 503: | ||
logger.debug("Received 503 status code. Throttling may have occurred") | ||
return success |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Have you run into any other error codes in your testing? It would be good to log other errors, but if they haven't come up yet then let's leave it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't actually run into any errors.
logger = logging.getLogger(__name__) | ||
|
||
|
||
class DownloadStats: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems complicated but I see the value of it for wrangling multiple counts and displaying them nicely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, this is pretty much just a wrapped tuple of Counters with a pretty print.
ref: #663
New command for fetching HathiTrust images ("full-size" and thumbnails).
Notes:
Questions: