Trec-Analyzer is a tool for developing and testing solutions for the TREC Precision Medicine track. It is implemented as a service and can be easily deployed using docker-compose. For completeness, Trec-Analyzer has trec_eval built in.
By default, Trec-Analyzer supports Elasticsearch and Terrier with the BM25 and DFR strategies. For instructions on configuring indices or adding new ones, check the Deployment and Development parts.
The service exposes REST endpoints for:
- searching a given text query with a chosen algorithm and engine,
- searching a given TREC PM topic with a chosen algorithm and engine,
- performing search and evaluation for given TREC PM topics with a chosen algorithm and engine,
- performing search and evaluation for all TREC PM topics with a chosen algorithm and engine.
The endpoints are implemented in the rest.kt file. They can be easily explored using the Postman collection provided in the postman directory.
The system consists of trec-service and an Elasticsearch cluster. trec-service has Terrier and trec_eval built in. The necessary configurations are:
- for trec-service:
  - in docker-compose.yml:
    - environment variables: TREC_INIT_TERRIER, TREC_INIT_ES and TREC_SERVER (explained below)
    - link to volume with corpus
    - link to volume with terrier index
  - in trec-service/config/application.conf:
    - server and indices properties
- for Elasticsearch:
  - in docker-compose.yml:
    - environment variables: thread_pool.write.queue_size and ES_HEAP_SIZE (depending on machine capabilities)
    - link to volume with es index
The performance configuration is tuned for a machine with 32 CPUs and 128 GB of memory.
After the application starts, the variables TREC_INIT_TERRIER and TREC_INIT_ES are checked. If the first is set to 1, a new Terrier index is created from the data in /corpus. If the second is set to 1, a new Elasticsearch index is created from the data in /corpus. Thanks to the linked volumes, the indices persist across container builds.
The property init.corpusFiles states how many files are read from the corpus directory. Set it to one for reading all files. The fields chunkSize and workers for es and the field workers for terrier should be set according to the capabilities of the host machine.
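To give a feel for why chunkSize and workers depend on the host, here is a minimal Python sketch of chunked, parallel indexing. It assumes that chunkSize is the number of documents sent per bulk request and that workers is the number of parallel indexing threads; the README does not spell this out, so treat both readings as assumptions. The function index_chunk is a hypothetical stand-in and does not talk to Elasticsearch or Terrier.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, chunk_size):
    """Split items into consecutive lists of at most chunk_size elements."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def index_chunk(chunk):
    """Hypothetical stand-in for one bulk request; returns how many documents it received."""
    return len(chunk)


def index_all(documents, chunk_size, workers):
    """Process all chunks on a pool of `workers` threads and return the total document count."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(index_chunk, chunked(documents, chunk_size)))


if __name__ == "__main__":
    docs = [f"doc-{n}" for n in range(10)]
    print(chunked(docs, 4))
    print(index_all(docs, chunk_size=4, workers=2))
```

Under these assumptions, larger chunks mean fewer but heavier requests, and more workers mean more requests in flight at once, which is why both values would be bounded by the memory and CPU count of the host.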
After the indices are created, the API starts if TREC_SERVER is set to 1. By default, it runs on port 8001 on the host machine.
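The startup sequence described above can be summarised with a short Python sketch. It is only an illustration of the documented flag semantics, not the service's actual code (which is written in Kotlin); the function name planned_steps and the step descriptions are made up for this example.

```python
import os


def planned_steps(env):
    """Return, in order, the startup steps implied by the three flags in `env`."""
    steps = []
    if env.get("TREC_INIT_TERRIER") == "1":
        steps.append("create Terrier index from /corpus")
    if env.get("TREC_INIT_ES") == "1":
        steps.append("create Elasticsearch index from /corpus")
    # The API is started only after any requested indexing has finished.
    if env.get("TREC_SERVER") == "1":
        steps.append("start API on port 8001")
    return steps


if __name__ == "__main__":
    # Show what the current environment would trigger.
    for step in planned_steps(os.environ):
        print(step)
```

For example, setting only TREC_SERVER to 1 skips both indexing steps and starts the API against the indices already stored on the linked volumes.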
New indices can be added by extending IndexService with new implementations of Repository.
This project was made during Application of Information Technologies at Poznan University of Technology under the supervision of Prof. Czesław Jędrzejek, PhD, Eng., and Jakub Dutkiewicz, M.Eng.