ames/harvesters/caltechdata.py fails at line 35 #14
Give the current main version a try if you haven’t yet. I may have fixed this already. I haven’t done a new release yet since I haven’t fully tested it.
On Oct 27, 2022, at 10:05 AM, R. S. Doiel wrote:
This came up after updating Feeds v1 to use the v0.9.0 release of ames. Harvesting from CaltechDATA is aborting. The full error message is:
Traceback (most recent call last):
File "/Sites/feeds.library.caltech.edu/harvest.py", line 265, in <module>
harvest_caltechdata('etc/CaltechDATA.json')
File "/Sites/feeds.library.caltech.edu/feeds/caltechdata.py", line 33, in harvest_caltechdata
get_caltechdata(collection_name, production=True)
File "/Users/datawork/Library/Python/3.10/lib/python/site-packages/ames/harvesters/caltechdata.py", line 35, in get_caltechdata
hits += response["hits"]["hits"]
KeyError: 'hits'
I believe the issue is that line 35 uses response["hits"]["hits"] before one or the other level of the structure has been initialized. If that is correct, my guess is that the hits += should fall back to zero. I will test this and then do a pull request when I have a fix ready.
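A minimal sketch of the kind of guarded fallback described above (not the actual ames code); the collect_hits name and the use of requests here are assumptions for the example:

```python
import requests

def collect_hits(url, params=None):
    """Accumulate search hits, falling back to an empty list when the
    response does not contain the expected 'hits' structure."""
    hits = []
    response = requests.get(url, params=params).json()
    # Guard both levels before indexing, so a missing 'hits' key
    # cannot raise the KeyError that aborts the harvest.
    hits += response.get("hits", {}).get("hits", [])
    return hits
```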
I've deployed the issue-14 branch in ames and that seems to work; I'm still tracking down other bugs. YMMV, so review my changes before accepting the pull request.
Does hits come from …
After looking closer at how things work, I think the problem may be in caltechdata_api. In terms of processing you should be able to retrieve a set of ids. I think this is done with get_metadata(), but if no ids are specified the URL that gets requested ends in a '/' as mentioned previously. That is an error for our Invenio-RDM deployment. What needs to be requested is probably the clean URL without the trailing '/' but with the paging parameters to get the chunks. This could be done by tweaking get_metadata() in caltechdata_api to handle that case, or it could be done as a specific function call. Not sure which is the correct approach.
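A minimal sketch of what that paged request could look like, assuming a standard Invenio-RDM /api/records search endpoint with size and page query parameters; the fetch_record_page name, base URL handling, and default page size are illustrative, not part of caltechdata_api:

```python
import requests

def fetch_record_page(base_url, page=1, size=250):
    """Request one page of records from an Invenio-RDM style search
    endpoint, using the clean URL (no trailing '/') plus paging
    parameters instead of a bare request ending in '/'."""
    url = f"{base_url.rstrip('/')}/api/records"
    response = requests.get(url, params={"page": page, "size": size})
    response.raise_for_status()
    return response.json()
```

Pages could then be walked until a response comes back with no hits, or by following whatever pagination links the deployment returns.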
I've taken a look at the code and it all should work. get_metadata just returns metadata from one record at a time, while get_caltechdata collects all the record ids and then loops over them. We could make this much more efficient by using an updated date parameter, but this should work for the moment. Both caltechdata_api and ames will have to be installed from the latest main branch to work. I didn't want to mess with datawork since it seems feeds at least partially ran this morning.
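As a rough illustration of that loop (assuming get_metadata takes a record id as its first argument; the harvest_all helper and record_ids list are hypothetical stand-ins, not the real get_caltechdata code):

```python
from caltechdata_api import get_metadata  # mentioned above; exact signature may differ

def harvest_all(record_ids):
    """Loop over record ids one at a time and collect the metadata
    that get_metadata returns for each record."""
    records = []
    for idv in record_ids:
        records.append(get_metadata(idv))
    return records
```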
I'll pull down the latest releases and give it a try. I think the code for harvesting CaltechDATA only uses get_metadata (at least that's where the errors started piling up). I'll make sure it calls get_caltechdata first (I'm guessing it does not). I think we'll then be pretty close to a working v1 feeds again.
Having stuck my nose in the codebase again, I am thinking about implementing the editors feeds in v1. That would let me close some tickets that date back to 2021. I am also reconsidering my decision to remove libdataset for dataset v2. I'm now thinking that I should go ahead and support it in v2, the reason being that it will allow a smoother transition to v2 of feeds if I don't have to worry about incompatible dataset collections. Cross compilation of libdataset requiring cgo remains an issue, though, especially now that I have an M1 Mac; macOS is becoming tricky anyway, so I suspect it'll get to the point where I need separate machines. I think AWS may be able to help there. I believe they (or maybe it was Azure) have a Mac server option. I'm going to explore that more, as I want to figure out how to have a new implementation of dataset running on new hardware at some point in 2023.
All the best,
Robert
What dates value should I use as "updated"? The data I get via ames doesn't seem to include one when I use get_caltechdata to populate CaltechDATA.ds in feeds.