Note: If you are not a CMU OAQA person, please refer to the general README for preparing the resource.
- Contact Zi Yang to obtain our local copies of the resources, as well as a UMLS account if you do not have one and do not plan to register for one. Uncompress the `.tgz` file (15G).
  - The `bioasq-internal-resources/index` directory has two Lucene indexes:
    - `bioasq-internal-resources/index/medline16n-lucene/` is for the Medline corpus
    - `bioasq-internal-resources/index/bioconcept-lucene/` is for the biomedical ontology dumps
  - The `bioasq-internal-resources/input` directory contains the test files and the original `4b-dev.json` development set
  - `bioasq-internal-resources/medline16n.db3` is the SQLite database that has the `pmid2abstract` table
- You need to generate the `4b-dev.json.auto.fulltext` file using `4b-dev.json` and `medline16n.db3`:
  - Install the Python `editdistance` package.
  - Download the Python script `bioasq-dev-fixer.py`.
  - Fix the formatting errors in the development file:
    `python bioasq-dev-fixer.py path_to_4b-dev.json path_to_medline16n.db3 4b-dev.json.auto.fulltext`
  - The resulting file should have an MD5 checksum of `db72a8fe3f1b3d605b9c39efdd21249d`.
- Now you can continue on to the Install section in the README.
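
Before moving on, it may help to confirm that the resources are in place. The following is a minimal sketch, not part of the project: the function names (`missing_paths`, `has_table`, `md5_of`, `check_setup`) are hypothetical, and it assumes the archive was extracted into a `bioasq-internal-resources` directory laid out as described above. It checks that the listed paths exist, that `medline16n.db3` contains a `pmid2abstract` table, and that the generated file matches the MD5 checksum quoted above.

```python
import hashlib
import sqlite3
from pathlib import Path

# Checksum quoted in the steps above for 4b-dev.json.auto.fulltext.
EXPECTED_MD5 = "db72a8fe3f1b3d605b9c39efdd21249d"

# Paths named in the steps above, relative to bioasq-internal-resources/.
EXPECTED_PATHS = [
    "index/medline16n-lucene",
    "index/bioconcept-lucene",
    "input/4b-dev.json",
    "medline16n.db3",
]


def missing_paths(root):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


def has_table(db_path, table):
    """Return True if the SQLite database contains a table with this name."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table,),
        ).fetchone()
    finally:
        conn.close()
    return row is not None


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_setup(root, fulltext_path):
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = [f"missing: {p}" for p in missing_paths(root)]
    db_path = Path(root) / "medline16n.db3"
    # Only open the database if it exists; sqlite3.connect would create it otherwise.
    if db_path.exists() and not has_table(db_path, "pmid2abstract"):
        problems.append("medline16n.db3 has no pmid2abstract table")
    fulltext = Path(fulltext_path)
    if not fulltext.exists():
        problems.append(f"missing: {fulltext_path}")
    elif md5_of(fulltext) != EXPECTED_MD5:
        problems.append(f"MD5 mismatch for {fulltext_path}")
    return problems
```

For example, `check_setup("bioasq-internal-resources", "4b-dev.json.auto.fulltext")` should return an empty list once every step above has been completed; adjust both paths to wherever you placed the files.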