Code wrapped into a UIMA component that maps gene names found in scientific literature to NCBI Gene human gene IDs.

JULIELab/gene-name-mapping

Gene Name Mapping

The code in this repository finds NCBI Gene IDs for gene name mentions that entity recognition software has detected in scientific literature. Each detected gene name is looked up among the NCBI Gene names and synonyms via a Lucene index and then disambiguated using GeneRIF sentences.

The code found here shares its foundations with GeNo [1] but has been

  1. adapted to the specific needs of project partners interested in clinically relevant gene mentions and
  2. updated with respect to the underlying databases, most importantly NCBI Gene itself.

To run the mapping code, you need a UIMA pipeline into which this component can be embedded. The simplest way is to assemble the pipeline from JCoRe components. The descriptor for the gene mapping component is located at gene-name-mapping-ae/src/main/resources/de/jules/ae/genemapping/desc/genemapping-ae.xml.

To build the project yourself (required for resource creation), install Maven 3 or later and run mvn clean package in the repository root.

To build the resources from scratch, run the script gene-name-mapping-resource-creation/update_resources_and_indices/metaScript.sh. Note, however, that it requires a number of downloaded source resources. To get those, run gene-name-mapping-resource-creation/update_resources_and_indices/downloadExternalResources.sh. Optionally, to set the paths to specific resource files, make the appropriate changes to gene-name-mapping-resource-creation/update_resources_and_indices/setCustomResourcePaths.sh, which is called from within metaScript.sh.
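
The build and resource creation steps above can be chained roughly as in the following sketch. It assumes that the commands are started from the repository root and that the scripts can be invoked with bash from there; neither is confirmed by this README, and the scripts may expect to be run from their own directory. The DRY_RUN variable and the run and create_resources helpers are hypothetical names introduced only for this illustration. By default the sketch only prints the commands it would execute.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper variable (not defined by the repository):
# 1 = only print the commands, anything else = execute them.
DRY_RUN="${DRY_RUN:-1}"
SCRIPT_DIR="gene-name-mapping-resource-creation/update_resources_and_indices"

# Print a command and, unless DRY_RUN=1, execute it.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# Full workflow; assumed to be started from the repository root.
create_resources() {
  # Build the project (required for resource creation).
  run mvn clean package
  # Fetch the external source resources that metaScript.sh needs.
  run bash "$SCRIPT_DIR/downloadExternalResources.sh"
  # Create the resources and indices from scratch.
  run bash "$SCRIPT_DIR/metaScript.sh"
}

create_resources
```

Running the sketch with DRY_RUN=0 would execute the three steps in order and stop at the first failure because of set -e. Custom resource paths, if needed, would still be set by editing setCustomResourcePaths.sh beforehand.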

[1] Wermter, J., Tomanek, K., & Hahn, U. (2009). High-performance gene name normalization with GeNo. Bioinformatics (Oxford, England), 25(6), 815–821. https://doi.org/10.1093/bioinformatics/btp071
