Switch covers importing IA endpoint #7616
Comments
The short of this is that the endpoint used to display covers on IA searches seems substantially better than the current endpoint, though the resolution is lower on the proposed newer endpoint. I made a Google Colab that enables a very simple cover comparison between the current and proposed 'new' cover endpoints for around 25 books that were on the import list of IA items that appeared book-like but lacked a MARC record, and those that appeared book-like and had a MARC record: https://colab.research.google.com/drive/18AT-Hdu7j9dyrVaSR3VAEPTceC9qVpBp#scrollTo=yS1E0hh00KFT The script simply iterates through the list of The script takes about two minutes to run (for some reason...), but the results should be pretty easy to scroll through and analyze. Each image is labeled at the top with the endpoint it comes from. The 'new' endpoint seems to have either the same covers or 'better' ones. However, the covers on the proposed 'new' endpoint are usually lower resolution. I didn't resize any of the image output, so that it's easier to see what is fetched, even if it makes it a bit annoying to view. |
@cdrini @scottbarnes I think there's a deeper problem here than just switching the endpoints. My understanding It's possible that something has changed in the scanning process. I see archive.org items have a The
My feeling is the OL code is currently doing the right thing, but archive.org has data issues with how some title / covers are marked up.
If recent scanned items are inconsistent, or consistently showing title pages when we'd expect covers, we'll need to feed back to the scanning process. If recent scans are consistent and there's a different current way of picking the best 'title' image, we'll need to know what that is. |
Ah, I see what you are saying about the cloth cover issue, @hornc. For some reason I wasn't seeing the whole list, even though I created the thing. :) I will look at trying to get some more recent data for further discussion. |
That appears to no longer be the case; now it appears that Here is a comparison of the three endpoints, based on the most recent 100 importbot edit OCAIDs. It seems like |
@cdrini
This means for most modern style books we want AFAICT There used to be either a manual process or automated smarts to figure this out correctly, and it looks like it's no longer working. archive.org is probably going to have the same issue -- maybe there is another way to determine cloth bound and pick the correct and useful image? |
I'm not sure, but Let's roll with switching! The images are slightly smaller, but still huge :P |
@mekarpeles I am willing to work on this issue. Additionally, would it be a good practice to add some metadata (some sort of flag?) to covers where we have used the IA covers? Even going further to discern between those using the |
Thanks for offering to work on this, @Spaarsh. I went ahead and assigned this to you. As for whether we should store some additional metadata, the goal of being able to tell which cover import endpoint an IA import used is a good idea, though in this case I think we can just look at the import date if we need to, and compare it to the date of this change. |
Should I include the original endpoint as a fallback in case of a 404 like this?
|
That seems like a good idea. I say go for it. |
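The fallback agreed on above could look roughly like the following. This is only a sketch under assumptions, not the code that was proposed in the thread: `fetch` and `fetch_cover_with_fallback` are hypothetical names, and both URLs are passed in by the caller because the original endpoint is not spelled out in this discussion.

```python
from typing import Callable
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch(url: str) -> bytes:
    """Download a URL and return the raw response body."""
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def fetch_cover_with_fallback(
    new_url: str,
    old_url: str,
    fetcher: Callable[[str], bytes] = fetch,
) -> bytes:
    """Try the new cover endpoint first; on a 404, retry the original one.

    Hypothetical helper: the caller supplies both URLs. Any error other
    than a 404 is re-raised so that real outages are not hidden.
    """
    try:
        return fetcher(new_url)
    except HTTPError as err:
        if err.code != 404:
            raise
        return fetcher(old_url)
```

Passing the fetcher as a parameter keeps the fallback logic testable without network access.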
@cdrini @scottbarnes @mekarpeles while working on this issue, I came across this code where we are still using the old endpoint for getting the covers in the /plugins/upstream/models.py. Are there any specific reasons for not switching the endpoint here as well? openlibrary/openlibrary/plugins/upstream/models.py Lines 80 to 83 in 8fc3f86
|
@Spaarsh, I'm very sorry for how long it took to respond. It could very well be that it makes sense to update that as well, but for the purposes of this specific issue, let's keep the scope as currently defined. If you're interested, I think it would make sense to open a new issue to explore whether we want to change |
Switch the covers import endpoint to the form used in, e.g., https://archive.org/services/img/aldosfantastical0000frie/full/pct:600/0/default.jpg .
This appears to choose more accurately between cover and title page, avoiding blank covers but using them when necessary.
Some evaluation would be useful to see whether this switch is indeed an improvement.
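For illustration, building the proposed URL from an OCAID might look like the sketch below. The function name `ia_cover_url` and the constant `IA_IMG_SERVICE` are hypothetical, and the `full/pct:600/0/default.jpg` path is copied verbatim from the example URL above rather than taken from any documented contract.

```python
from urllib.parse import quote

# Base of the image service, taken from the example URL in this issue.
IA_IMG_SERVICE = "https://archive.org/services/img"


def ia_cover_url(ocaid: str, size: str = "pct:600") -> str:
    """Build the proposed cover URL for an archive.org item.

    Hypothetical helper: the default `size` segment simply mirrors the
    example URL; its exact meaning is not specified in this issue.
    """
    # Percent-encode the identifier so unexpected characters cannot
    # change the path structure.
    return f"{IA_IMG_SERVICE}/{quote(ocaid, safe='')}/full/{size}/0/default.jpg"
```

Calling `ia_cover_url("aldosfantastical0000frie")` reproduces the example URL given above.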
openlibrary/openlibrary/core/ia.py
Lines 117 to 123 in 2a36c54
Stakeholders
@scottbarnes @hornc