PubMed EDirect Recipes

Notes

user@computer:~$ represents an example terminal prompt. The actual command and its arguments follow the $.

Replace name@xx.edu with your email address.

A \ at the end of a line, followed by > at the start of the next line, represents continued terminal input. Delete the > symbols before copying and pasting the scripts into a terminal.

Validate your own EDirect scripts and results, as these recipes may contain unintentional mistakes. A convenient check is to compare your EDirect results with the search results from the NCBI web interface: https://www.ncbi.nlm.nih.gov/.

PubMed EDirect

Search PubMed by Keyword and/or MeSH and Retrieve References

The EDirect function esearch queries PubMed. Before retrieving any of the results with efetch, it is a good idea to check that the result count (the <Count> element in the esearch output) is manageable, e.g., no more than several thousand records. See also the EDirect Query Translation Instructions for how to use the -debug option to view how PubMed interprets your query.

user@computer:~$ esearch -email name@xx.edu -db pubmed -query "hydrogel-based drug delivery"
<ENTREZ_DIRECT>
  <Db>pubmed</Db>
  <WebEnv>MCID...</WebEnv>
  <QueryKey>1</QueryKey>
  <Count>436</Count>
  <Step>1</Step>
  <Email>name@xx.edu</Email>
</ENTREZ_DIRECT>
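Since the count decides whether to continue, it can help to check it inside a script. The sketch below is a hypothetical helper (check_count and the 5000 limit are assumptions, not part of EDirect) that pulls the <Count> value out of the esearch summary with sed and returns a failure status if the count exceeds the chosen limit. It runs offline on the output shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read esearch's <ENTREZ_DIRECT> summary on stdin,
# print the <Count> value, and return non-zero if it exceeds a limit.
check_count() {
  local limit="$1"
  local count
  # Keep only the digits between <Count> and </Count>
  count=$(sed -n 's/.*<Count>\([0-9]*\)<\/Count>.*/\1/p')
  echo "$count"
  # Succeed only if a count was found and it is within the limit
  [ -n "$count" ] && [ "$count" -le "$limit" ]
}

# Offline demo using the esearch output shown above
summary='<ENTREZ_DIRECT>
  <Db>pubmed</Db>
  <WebEnv>MCID...</WebEnv>
  <QueryKey>1</QueryKey>
  <Count>436</Count>
  <Step>1</Step>
  <Email>name@xx.edu</Email>
</ENTREZ_DIRECT>'

if echo "$summary" | check_count 5000; then
  echo "count is manageable"
else
  echo "count too large; refine the query"
fi
```

With EDirect installed, the same helper would read live output, e.g. esearch -email name@xx.edu -db pubmed -query "hydrogel-based drug delivery" | check_count 5000.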

Once the esearch query looks appropriate, we can pipe its results into other EDirect functions. For example, the script below first uses esearch to query PubMed for "hydrogel-based drug delivery", then pipes (|) the results into efetch to retrieve the records in XML format. The efetch output is in turn piped to xtract, which extracts several bibliographic elements from each PubMed XML record into a tab-delimited table:

user@computer:~$ esearch -email name@xx.edu -db pubmed -query "hydrogel-based drug delivery" | \
> efetch -format xml | \
> xtract -pattern PubmedArticle -element MedlineCitation/PMID -first Author/LastName \
> Author/Initials ArticleTitle ISOAbbreviation PubDate/Year Volume Issue MedlinePgn
33424262	El-Masry	SM	Hydrogel-based matrices for controlled drug delivery of etamsylate: Prediction of in-vivo plasma profiles.	Saudi Pharm J	2020	28	12	1704-1718
33398321	Chen	W	Magnetically actuated intelligent hydrogel-based child-parent microrobots for targeted drug delivery.	J Mater Chem B	2021
33396629	Dehshahri	A	New Horizons in Hydrogels for Methotrexate Delivery.	Gels	2020	7	1
33387892	Amiri	M	Hydrogel beads-based nanocomposites in novel drug delivery platforms: Recent trends and developments.	Adv Colloid Interface Sci	2020	288	102316
33378390	Kloepping	KC	Triphenylphosphonium derivatives disrupt metabolism and inhibit melanoma growth in vivo when delivered via a thermosensitive hydrogel.	PLoS One	2020	15	12	e0244540
33359482	Agarwal	P	Structural characterization and developability assessment of sustained release hydrogels for rapid implementation during preclinical studies.	Eur J Pharm Sci	2021	158	105689
...
...

Tested on 2021.01.27, EDirect 14.4, total count was 436.
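In the table above, some rows have fewer columns than others (for example, 33398321 has no Volume, Issue, or MedlinePgn), because xtract leaves out elements that are missing from a record and the remaining values shift left. The sketch below assumes the table has been saved to a file (e.g. with > hydrogel.tsv, a hypothetical name; the demo uses a temporary file holding three rows copied from the output). It uses awk to list the PMID and column count of each row with fewer than the nine requested columns, and counts rows so they can be compared with the esearch Count. The row count may differ from Count if some records are not PubmedArticle elements (for example, book records). xtract also documents a -def option that inserts a placeholder for empty fields; check xtract -help for your EDirect version.

```shell
#!/usr/bin/env bash
# Nine columns were requested: PMID, LastName, Initials, ArticleTitle,
# ISOAbbreviation, Year, Volume, Issue, MedlinePgn.
expected_cols=9

# Print "PMID<TAB>number of columns" for every row that came back short
short_rows() {
  awk -F'\t' -v n="$1" 'NF < n { print $1 "\t" NF }' "$2"
}

# Offline demo: a temporary file with three rows copied from the output above
tsv=$(mktemp)
trap 'rm -f "$tsv"' EXIT
{
  printf '33424262\tEl-Masry\tSM\tHydrogel-based matrices for controlled drug delivery of etamsylate: Prediction of in-vivo plasma profiles.\tSaudi Pharm J\t2020\t28\t12\t1704-1718\n'
  printf '33398321\tChen\tW\tMagnetically actuated intelligent hydrogel-based child-parent microrobots for targeted drug delivery.\tJ Mater Chem B\t2021\n'
  printf '33387892\tAmiri\tM\tHydrogel beads-based nanocomposites in novel drug delivery platforms: Recent trends and developments.\tAdv Colloid Interface Sci\t2020\t288\t102316\n'
} > "$tsv"

# Rows with missing (and therefore shifted) fields
short_rows "$expected_cols" "$tsv"
# Number of rows, to compare with the <Count> reported by esearch
wc -l < "$tsv"
```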

To extract the DOIs as well, we can use the xtract -block option:

user@computer:~$ esearch -email name@xx.edu -db pubmed -query "hydrogel-based drug delivery" | \
> efetch -format xml | \
> xtract -pattern PubmedArticle -element MedlineCitation/PMID -first Author/LastName \
> Author/Initials ISOAbbreviation PubDate/Year Volume Issue MedlinePgn \
> -block ArticleId -if ArticleId@IdType -equals doi -doi ArticleId
33424262	El-Masry	SM	Saudi Pharm J	2020	28	12	1704-1718	https://doi.org/10.1016%2Fj.jsps.2020.10.016
33398321	Chen	W	J Mater Chem B	2021	https://doi.org/10.1039%2Fd0tb02384a
33396629	Dehshahri	A	Gels	2020	7	1	https://doi.org/10.3390%2Fgels7010002
33387892	Amiri	M	Adv Colloid Interface Sci	2020	288	102316	https://doi.org/10.1016%2Fj.cis.2020.102316
33378390	Kloepping	KC	PLoS One	2020	15	12	e0244540	https://doi.org/10.1371%2Fjournal.pone.0244540
33359482	Agarwal	P	Eur J Pharm Sci	2021	158	105689	https://doi.org/10.1016%2Fj.ejps.2020.105689
...
...

Tested on 2021.01.27, EDirect 14.4, total count was 436.

The last line of the script, which extracts the DOIs, does several things at once: -block ArticleId -if ArticleId@IdType -equals doi -doi ArticleId. A fragment of PubMed XML helps show what is happening:

...
...
<ArticleIdList>
        <ArticleId IdType="pubmed">17630804</ArticleId>
        <ArticleId IdType="doi">10.1021/jo071035l</ArticleId>
</ArticleIdList>
...
...

The -block option limits extraction to a particular part of the XML, in this case each ArticleId element. Within each block, -if ArticleId@IdType -equals doi is a condition: the @ refers to an attribute, so ArticleId@IdType is the IdType attribute of ArticleId, and only the ArticleId whose IdType equals "doi" is extracted. Finally, -doi is an xtract option that prefixes https://doi.org/ to the extracted DOI; as the output above shows, the / in each DOI appears percent-encoded as %2F. There is a more thorough explanation of -block and of extracting DOIs with it in the NLM Insider's Guide to Accessing NLM Data Part 4 and the Entrez Programming Utilities Help Manual.
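If you need bare DOIs rather than URLs (for example, to match against another dataset), the -doi output can be post-processed. A minimal sketch with awk, assuming the DOI URL is the last tab-separated column as in the output above; strip_doi_prefix is a hypothetical helper name. It decodes only %2F and leaves any other percent-escapes as they are. Rows without a DOI are passed through unchanged, since their last column does not start with the prefix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn the last column (an https://doi.org/ URL from
# xtract -doi) into a bare DOI. Only %2F is decoded back to "/".
strip_doi_prefix() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    {
      sub(/^https:\/\/doi\.org\//, "", $NF)
      gsub(/%2F/, "/", $NF)
      print
    }'
}

# Offline demo on one row of the table above ("%%" is printf for a literal "%")
printf '33424262\tEl-Masry\tSM\tSaudi Pharm J\t2020\t28\t12\t1704-1718\thttps://doi.org/10.1016%%2Fj.jsps.2020.10.016\n' |
  strip_doi_prefix
```

In a live pipeline, the helper would be appended after the xtract command with one more | strip_doi_prefix.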

We can also restrict search terms to particular PubMed fields. The script below searches for "ionic liquids" in the MeSH Terms field ([MESH]) and for "imidazolium" in all fields. The internal quotes are escaped with a backslash (\) so that they reach esearch as part of the query; quoting a phrase this way is sometimes necessary for PubMed to interpret it correctly.

user@computer:~$ esearch -email name@xx.edu -db pubmed -query "\"ionic liquids\"[MESH] AND imidazolium" | \
> efetch -format xml | \
> xtract -pattern PubmedArticle -element MedlineCitation/PMID -first Author/LastName \
> Author/Initials ISOAbbreviation PubDate/Year Volume Issue MedlinePgn \
> -block ArticleId -if ArticleId@IdType -equals doi -doi ArticleId
33396149	Hu	LX	Ecotoxicol Environ Saf	2021	208	111629	https://doi.org/10.1016%2Fj.ecoenv.2020.111629
33346267	Kaur	M	Phys Chem Chem Phys	2021	23	1	320-328	https://doi.org/10.1039%2Fd0cp04513f
33253998	Tashakkori	P	J Chromatogr A	2021	1635	461741	https://doi.org/10.1016%2Fj.chroma.2020.461741
33142384	Ren	YM	Zhonghua Lao Dong Wei Sheng Zhi Ye Bing Za Zhi	2020	38	10	767-769	https://doi.org/10.3760%2Fcma.j.cn121094-20191010-00483
33135708	Kumar	S	Phys Chem Chem Phys	2020	22	43	25255-25263	https://doi.org/10.1039%2Fd0cp04014b
32822985	Zuo	L	J Chromatogr A	2020	1628	461446	https://doi.org/10.1016%2Fj.chroma.2020.461446
32711338	Zunita	M	Bioresour Technol	2020	315	123864	https://doi.org/10.1016%2Fj.biortech.2020.123864
...
...

Tested on 2021.01.27, EDirect 14.4, total count was 1000.
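The backslash-escaped inner quotes can be avoided by wrapping the whole query in single quotes, inside which the shell passes double quotes through literally. The short check below (plain shell, no EDirect needed) shows that both spellings produce exactly the same string, so esearch should receive an identical query either way. Single quotes also stop the shell from expanding characters such as $, but a single quote itself cannot appear inside them.

```shell
#!/usr/bin/env bash
# Query as written in the esearch example above, with escaped inner quotes
escaped="\"ionic liquids\"[MESH] AND imidazolium"
# The same query wrapped in single quotes: no escaping needed
single='"ionic liquids"[MESH] AND imidazolium'

if [ "$escaped" = "$single" ]; then
  echo "identical query strings: $single"
fi
# With EDirect installed, the single-quoted form would be used as:
#   esearch -email name@xx.edu -db pubmed -query '"ionic liquids"[MESH] AND imidazolium'
```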

Calculate the Most Frequent Journal Titles For a PubMed Search

The script below uses esearch to query PubMed for "Artificial Intelligence" in the MeSH field ([MESH]) and "drug discovery" in all fields ([ALL]). The records are retrieved in XML format with efetch, and xtract then extracts the journal abbreviations (ISOAbbreviation). Finally, the xtract output is piped to the EDirect function sort-uniq-count-rank, which counts each distinct journal and sorts them from most to least frequent:

user@computer:~$ esearch -email name@xx.edu -db pubmed -query "\"Artificial Intelligence\"[MESH] AND \"drug discovery\"[ALL]" | \
> efetch -format xml | \
> xtract -pattern PubmedArticle -element ISOAbbreviation | \
> sort-uniq-count-rank
169	J Chem Inf Model
53	BMC Bioinformatics
49	PLoS One
40	Bioinformatics
39	Methods Mol Biol
33	Mol Pharm
32	Molecules
29	Sci Rep
28	Drug Discov Today
28	J Comput Aided Mol Des
24	J Med Chem
23	Expert Opin Drug Discov
23	Int J Mol Sci
19	Curr Top Med Chem
18	Mol Inform
17	Future Med Chem
16	Nucleic Acids Res
15	Nature
15	PLoS Comput Biol
14	IEEE/ACM Trans Comput Biol Bioinform
...
...
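To see what kind of operation sort-uniq-count-rank performs, or if it is unavailable, a similar tally can be built from standard Unix tools. The sketch below (count_rank is a hypothetical name; this is not EDirect's own implementation) counts identical lines, prints count<TAB>value, and orders by descending count with ties in alphabetical order, matching the shape of the output above. This version compares lines case-sensitively, in byte order (LC_ALL=C).

```shell
#!/usr/bin/env bash
# Approximate a frequency ranking with standard tools.
# Reads one value per line on stdin; prints "count<TAB>value", highest first.
# Steps: sort so duplicates are adjacent, count them with uniq -c,
# rewrite uniq's "<spaces><count> <value>" as count<TAB>value,
# then sort by count (numeric, descending) and by value to break ties.
count_rank() {
  local tab
  tab=$(printf '\t')
  LC_ALL=C sort |
    uniq -c |
    awk '{ c = $1; sub(/^ *[0-9]+ /, ""); print c "\t" $0 }' |
    LC_ALL=C sort -t "$tab" -k1,1nr -k2,2
}

# Offline demo with a few journal abbreviations
printf '%s\n' "PLoS One" "Nature" "PLoS One" "J Chem Inf Model" "PLoS One" "Nature" |
  count_rank
```

In the recipe above, count_rank could stand in for sort-uniq-count-rank at the end of the pipeline.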