A significant amount of experimental information from Quantitative Trait Locus (QTL) studies is reported in heterogeneous tables within scientific articles. Briefly, a QTL is a genomic region that correlates with a trait of interest (phenotype). QTM is a command-line tool that retrieves and semantically annotates results obtained from QTL mapping experiments. It takes full-text articles from the Europe PMC repository as input and writes the extracted QTLs both to a relational database (SQLite; see the ER diagram) and to a text file (CSV).
Requirements:

- Oracle JDK 8 or OpenJDK 8
- Apache Maven 3.x
- SQLite 3.x
- Apache Solr 6.x with cores based on domain-specific vocabularies and ontologies (Solr cores):
  - Gene Ontology (GO)
  - Plant Trait Ontology (TO)
  - Phenotypic Quality Ontology (PATO)
  - Solanaceae Phenotype Ontology (SPTO)
  - STATistics Ontology (STATO)
  - Chemical Entities of Biological Interest (ChEBI)
- Access to full-text articles (in XML) from Europe PMC
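Before installing, a quick preflight check can confirm that the command-line tools above are on your `PATH` and show where Europe PMC serves an article's full-text XML. This is a minimal sketch, not part of QTM: the function names are hypothetical, it assumes Europe PMC's public REST endpoint `https://www.ebi.ac.uk/europepmc/webservices/rest/{PMCID}/fullTextXML`, and it does not check Solr or its cores.

```shell
# Hypothetical preflight helpers (not part of QTM).

# Report whether each named command is on PATH; return 1 if any is missing.
check_commands() {
  local status=0 cmd
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "ok: $cmd"
    else
      echo "not found on PATH: $cmd" >&2
      status=1
    fi
  done
  return "$status"
}

# Print the Europe PMC REST URL of an article's full-text XML.
# Assumption: the public /{PMCID}/fullTextXML endpoint.
europepmc_fulltext_url() {
  printf 'https://www.ebi.ac.uk/europepmc/webservices/rest/%s/fullTextXML\n' "$1"
}
```

For example, `check_commands java mvn sqlite3` checks the build and database tools, and `curl -fsS "$(europepmc_fulltext_url PMCID)" | head`, with `PMCID` replaced by a real identifier, should print the start of the article's XML when Europe PMC is reachable.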
Installation:

```shell
git clone https://github.com/PBR/QTM.git
cd QTM
mvn clean install
solr/install.sh
```
Usage:

```
./QTM -h

usage: QTM [-h] [-v] [-o OUTPUT] [-c CONFIG] [-V VERBOSE] FILE

Software to extract QTL data from full-text articles.

positional arguments:
  FILE                   input list of articles (PMCIDs, one per line)

named arguments:
  -h, --help             show this help message and exit
  -v, --version          show version and exit
  -o OUTPUT, --output OUTPUT
                         filename prefix for output in SQLite (.db) and text (.csv) files (default: qtl)
  -c CONFIG, --config CONFIG
                         config file (default: config.properties)
  -V VERBOSE, --verbose VERBOSE
                         verbosity console output: 0-7 for OFF, FATAL, ERROR, WARN, INFO, DEBUG, TRACE or ALL (default: 4 [INFO])
```
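The `FILE` argument is a plain-text list with one PMCID per line. The following is a minimal pre-run check, not part of QTM: the hypothetical `check_pmcids` function prints, with line numbers, every line that is not a bare PMCID (`PMC` followed by digits). It is deliberately strict, so blank lines, trailing spaces and Windows line endings are reported too; whether QTM itself tolerates them is an open assumption.

```shell
# Hypothetical helper (not part of QTM): validate an input list of PMCIDs.
# Returns 0 if every line is a bare PMCID, 1 if some are not, 2 if unreadable.
check_pmcids() {
  local file=$1
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 2
  fi
  # -v selects lines that do NOT match; -n prefixes each with its line number
  if grep -nEv '^PMC[0-9]+$' "$file"; then
    echo "$file: the lines above are not bare PMCIDs" >&2
    return 1
  fi
}
```

For example, `check_pmcids articles.txt && ./QTM -o myqtl -V 5 articles.txt` runs QTM only if the list is clean, writing `myqtl.db` and `myqtl.csv` with DEBUG-level console output.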
Example:

- input: `articles.txt` and `config.properties` files
- output: `qtl.csv` and `qtl.db` files
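After a run, a small sketch can confirm that both output files exist for a given `-o` prefix (default `qtl`) and, if the `sqlite3` command-line shell is installed, list the tables in the database. `check_qtm_outputs` is hypothetical and not part of QTM; the table names it prints come from QTM's own schema (see the ER diagram), which is not reproduced here.

```shell
# Hypothetical helper (not part of QTM): check the outputs of "./QTM -o PREFIX ...".
check_qtm_outputs() {
  local prefix=${1:-qtl}   # QTM's default output prefix
  local missing=0 ext
  for ext in db csv; do
    if [ -s "$prefix.$ext" ]; then
      echo "found $prefix.$ext"
    else
      echo "missing or empty: $prefix.$ext" >&2
      missing=1
    fi
  done
  # List the tables only when both files exist and the sqlite3 CLI is available
  if [ "$missing" -eq 0 ] && command -v sqlite3 >/dev/null 2>&1; then
    sqlite3 "$prefix.db" '.tables'
  fi
  return "$missing"
}
```

Running `check_qtm_outputs` with no argument checks `qtl.db` and `qtl.csv` in the current directory.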
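Note: If you do not have access to the Internet or to Europe PMC, you can still run QTM on the XML files stored in the `data` directory; uncomment the `cp` line below to copy them into the working directory first.

```shell
# cp data/*.xml .
./QTM articles.txt
```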