Articles id parsing issue #22
I too ran into this. The article titled "Premorbid IQ varies across different definitions of schizophrenia" returns .pubmed_id '17342225\n10435610\n1638332\n15474902\n14302768\n9403903\n16297601\n5009428\n6382590\n12597613\n3292568\n16221995\n10986554\n16946869\n1182406\n12414070\n16330717\n15066893\n16484093\n1931805\n10678506\n9223148\n16639153\n4752222\n10442433\n12379446'
This is due to how getContent parses the XML. Looking at @M0rtenB's example XML, the authors of "Premorbid IQ ..." seem to have included the PubMed IDs of all their citations. Most articles will only have a small article ID snippet (not an article ID for every citation), which will look like this:
We could probably change _extractPubMedID to use
@nleguillarme your example also uses citation article IDs
I too ran into this.
This fix avoids also returning the IDs of cited papers (they sit inside the ReferenceList element of the XML). Fixes gijswobben#22. An alternative XPath that could be used: path = ".//PubmedData/ArticleIdList/ArticleId[@idtype='pubmed']"
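To illustrate why scoping the XPath matters, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. Everything in it is an assumption made for illustration: the sample XML is hand-written (not real PubMed output), the IDs are placeholders, `join_matches` is a hypothetical stand-in for how getContent appears to join matches with newlines, and the broad path is only a guess at what an unscoped lookup looks like. The sample spells the attribute `IdType`, which I believe is the capitalization PubMed uses; attribute matching is case-sensitive, so adjust it to whatever your XML actually contains.

```python
import xml.etree.ElementTree as ET

# Hand-written sample (hypothetical IDs): one article with two cited references.
SAMPLE = """
<PubmedArticle>
  <MedlineCitation>
    <PMID>11111111</PMID>
  </MedlineCitation>
  <PubmedData>
    <ArticleIdList>
      <ArticleId IdType="pubmed">11111111</ArticleId>
      <ArticleId IdType="doi">10.0000/example</ArticleId>
    </ArticleIdList>
    <ReferenceList>
      <Reference>
        <ArticleIdList>
          <ArticleId IdType="pubmed">22222222</ArticleId>
        </ArticleIdList>
      </Reference>
      <Reference>
        <ArticleIdList>
          <ArticleId IdType="pubmed">33333333</ArticleId>
        </ArticleIdList>
      </Reference>
    </ReferenceList>
  </PubmedData>
</PubmedArticle>
""".strip()


def join_matches(root, path):
    """Hypothetical helper: join the text of every match with newlines."""
    return "\n".join(el.text for el in root.findall(path) if el.text)


root = ET.fromstring(SAMPLE)

# Unscoped: matches the article's own ID and every cited reference's ID.
broad = join_matches(root, ".//ArticleId[@IdType='pubmed']")

# Scoped: only the ArticleIdList that is a direct child of PubmedData.
narrow = join_matches(
    root, ".//PubmedData/ArticleIdList/ArticleId[@IdType='pubmed']"
)

print(repr(broad))
print(repr(narrow))
```

The scoped path skips the reference entries because their `ArticleIdList` elements are nested under `ReferenceList/Reference` rather than directly under `PubmedData`.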
@gijswobben @nleguillarme I made a pull request for this issue. It basically follows @mbullmanFHCRC's suggestions.
Multiple PMIDs are getting parsed. The other IDs are likely the PMIDs of articles cited by the article under consideration. Resolves gijswobben#22
While iterating over the articles returned by a PubMed query, I noticed that some article IDs are parsed incorrectly.
For instance:
Query: ((Haliaeetus leucocephalus[Title/Abstract])) AND ((prey[Title/Abstract]) OR (diet[Title/Abstract]))
Returns (when printing the first 10 results):
pubmed_id = '22822430\n18959310\n21310968\n21295371\n20439737'
abstract = ('Bald eagles (Haliaeetus leucocephalus) are recovering from severe population declines...
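As a caller-side stopgap, one could keep only the first line of the returned string. This is a sketch under a stated assumption: that the article's own ID comes first in the newline-joined value, which matches the examples in this thread but is not guaranteed. `first_pmid` is a hypothetical helper, not part of the library.

```python
def first_pmid(raw_id):
    """Return the first ID from a newline-joined pubmed_id string.

    Assumption: the article's own PMID precedes any cited-article PMIDs.
    """
    if not raw_id:
        return None
    lines = raw_id.strip().splitlines()
    return lines[0].strip() if lines else None


# Value taken from the output above.
cleaned = first_pmid("22822430\n18959310\n21310968\n21295371\n20439737")
print(cleaned)
```

A string that already holds a single ID passes through unchanged, so the helper is safe to apply to every result.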