-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
jmespath: new 500 error scenario? #368
Comments
The scope of this problem is not clear: the error will happen when...
|
Extra info from our Slack discussion: A: Trying to set all drugcentral values to an array of objects
Dylan recommended turning all drugcentral values into arrays by setting jmespath to drugcentral[].bioactivity[] query
I then tried doing B: this happens when querying with a chem ID too.
Compare with and without
C: If the document's `drugcentral` field is an object that lacks the `bioactivity` field, no error happens
Retrieves this document. Example query:
D: if I change up the query (set action_type with jmespath/filter) but still retrieve the problematic document, the error will still happen
This query doesn't retrieve the problematic document. I set
This query DOES retrieve the problematic document and errors out. I set
E: if I change up the uniprot_id but still retrieve the problematic document, the error will still happen
Uses
|
try this jmespath: (the target field here is "drugcentral", it converts it to an array and then edits the bioactivity field to remove noncompliant entries, finally it removes individual elements of drugcentral that do not have any bioactivity entries). this works for me combined with the jmespath_exclude_empty option note: |
Interesting. Did you use certain websites/resources to figure this out? If so, it'd really help me if you linked them (I know very little syntax >.< but I'm listing what I know for future documentation-writing). Even if there's a jmespath-built-in solution for this, I think it's still helpful to do the "next steps" I put in the opening issue (check the API documents, understand BioThings APIs' behavior for |
@colleenXu The main useful aspects from the spec: @ refers to the current object, merge can combine two objects, to_array is the correct way to convert anything to an array if it isn't already. I'll try to send a summary of my investigations into jmespath/exclude_empty soon. |
my understanding of jmespath behavior based on reading the code (someone else can correct me if I am wrong, code) jmespath format: jmespath exclude behavior (code):
my assumption of what the patch is (I don't have a local setup):
|
UpdateI reviewed a list of documents where drugcentral's value may be an array (from @ctrl-schaff, lab Slack). Here's my conclusions:
I also found some documents that would be good for testing.Documents where some drugcentral items have bioactivity, some don't
All drugcentral items have bioactivity (but contents slightly diff), can see how BTE behaves with them
|
@rjawesome and co... I tested the jmespath string and it didn't seem to work. I used the query from the opening post and made a small change to When I directly copied Rohan's jmespath string, it looked like jmespath didn't run on the response at all. The response had bioactivity objects with action_type fields and with uniprot IDs that weren't Direct-copy query
I then tried removing spaces that may be interfering with execution (just there for readability). If the space between query, plus saved response with error message
|
I saw the linked PR, and I'm wondering what the next steps are. If the next steps involve running queries on a dev server that uses that PR, maybe it'd be helpful to do that in a meeting together (so I can give feedback on responses/we can adjust queries immediately)? |
whoops, I sent the wrong query. This was the one that was working for me (the merge function is mapped on all the drugcentral elements) (command: |
Update! According to @DylanWelzel, MyChem has been updated to include #372. The other APIs haven't been updated yet. I tested the following documents/queries and all worked as-expected (no 500 errors, response data is correct)! So I think the PR fully addresses this problem.
(I also tested x-bte annotation/integration with BTE and things looked mostly good there. Noticed one issue which I think is more biothings/mychem.info#191, so I'll describe it there.) |
@DylanWelzel @ctrl-schaff (based on discussion in this Slack thread)
While writing queries and testing x-bte annotation in biothings/biothings_explorer#904, I found a query that returns an error:
{"code":500,"success":false,"error":"Internal Server Error","details":"bioactivity"}
. Similar queries worked fine (different uniprot ID used in body and jmespath parameter).click to see problematic query
Johnathan confirmed that this error could be reproduced locally, and a more specific error message was "KeyError: 'bioactivity'"
Based on our digging so far, I think this error occurs when this query meets all of this criteria:
drugcentral
's value is an array of objects, rather than an object. Johnathan found <50 more documents with this structure.drugcentral
objects have thebioactivity
field and others don't.jmespath_exclude_empty=true
. If you try taking it out, the query then returns without an error....but will keep the hits that didn't pass the criteria specified injmespath
(bioactivity
is[]
ornull
after jmespath processing). For BTE/retriever use, it's important that those non-matching hits are removed, and that's whatjmespath_exclude_empty=true
was supposed to do.OEYIOHPDSNJKLS-UHFFFAOYSA-N
hasbioactivity
[] or doesn't exist. So both jmespath criteria weren't met in a single bioactivity object, and we want to remove this hit entirely withjmespath_exclude_empty
.P05186
from extra info 5 as a positive control - the problematic doc will have a bioactivity object that meets the jmespath criteria - so it should be kept in the response.I think the next steps are to:
drugcentral
is an array. Is this data structured correctly (should it all be organized into 1 document)?jmespath_exclude_empty=true
behavior currently works and perhaps change it.drugcentral
objects that don't have thebioactivity
field? I don't know how to do this.filter=_exists_:drugcentral.bioactivity
doesn't work (it removes entire hits only if the entire document lacks the bioactivity field)The text was updated successfully, but these errors were encountered: