
Internal Resource Preparation Instructions for OAQA Biomedical Question Answering (BioASQ) System

Note: If you are not a member of the CMU OAQA group, please refer to the general README to prepare the resources.

  1. Contact Zi Yang to obtain our local copies of the resources, as well as a UMLS account if you neither have one nor plan to register for one. Then uncompress the .tgz file (15 GB).
    • The bioasq-internal-resources/index directory contains two Lucene indexes:
      • bioasq-internal-resources/index/medline16n-lucene/ is the index for the Medline corpus.
      • bioasq-internal-resources/index/bioconcept-lucene/ is the index for the biomedical ontology dumps.
    • The bioasq-internal-resources/input directory contains the test files and the original 4b-dev.json development set.
    • bioasq-internal-resources/medline16n.db3 is the SQLite database containing the pmid2abstract table.
  2. Generate the 4b-dev.json.auto.fulltext file from 4b-dev.json and medline16n.db3:
    1. Install the Python editdistance package (e.g. pip install editdistance).
    2. Download the Python script bioasq-dev-fixer.py.
    3. Run the script to fix the formatting errors in the development file:
      python bioasq-dev-fixer.py path_to_4b-dev.json path_to_medline16n.db3 4b-dev.json.auto.fulltext
      
    4. The resulting file should have an MD5 checksum of db72a8fe3f1b3d605b9c39efdd21249d.
  3. You can now continue to the Install section of the README.
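The resource layout described in step 1 can be sanity-checked before running anything else. The following is a minimal Python sketch, not part of the OAQA tooling: it assumes the archive unpacks into a directory named bioasq-internal-resources with 4b-dev.json directly under input/, and the script name check_resources.py is hypothetical. It only tests that the listed paths exist and that medline16n.db3 contains a pmid2abstract table; it does not validate the contents of the Lucene indexes.

```python
import os
import sqlite3
import sys

# Paths named in step 1, relative to the uncompressed resource directory.
# ASSUMPTION: 4b-dev.json sits directly inside input/.
EXPECTED_PATHS = [
    os.path.join("index", "medline16n-lucene"),
    os.path.join("index", "bioconcept-lucene"),
    os.path.join("input", "4b-dev.json"),
    "medline16n.db3",
]


def missing_paths(root):
    """Return the expected paths (relative to root) that do not exist."""
    return [p for p in EXPECTED_PATHS if not os.path.exists(os.path.join(root, p))]


def has_table(db_path, table):
    """Return True if the SQLite file at db_path contains a table named `table`."""
    # Check first: sqlite3.connect() would silently create an empty database file.
    if not os.path.isfile(db_path):
        return False
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table,),
        ).fetchone()
    except sqlite3.DatabaseError:
        # The file exists but is not a valid SQLite database.
        return False
    finally:
        conn.close()
    return row is not None


if __name__ == "__main__":
    # Hypothetical usage: python check_resources.py [path/to/bioasq-internal-resources]
    root = sys.argv[1] if len(sys.argv) > 1 else "bioasq-internal-resources"
    problems = ["missing: " + os.path.join(root, p) for p in missing_paths(root)]
    db_path = os.path.join(root, "medline16n.db3")
    if not has_table(db_path, "pmid2abstract"):
        problems.append("no pmid2abstract table in " + db_path)
    print("\n".join(problems) if problems else "All expected resources found.")
```

If a path is reported missing, re-extracting the .tgz file is usually the quickest fix.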
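The checksum in step 2 can be verified with md5sum (GNU coreutils) or md5 (macOS). For a platform-independent check, here is a minimal Python sketch that uses only the standard library's hashlib; the script name check_md5.py is hypothetical, and the generated file is assumed to be in the current directory unless a path is given. The file is hashed in 1 MiB chunks so it never has to be held in memory all at once.

```python
import hashlib
import os
import sys

# Expected digest of 4b-dev.json.auto.fulltext, as given in step 2.
EXPECTED_MD5 = "db72a8fe3f1b3d605b9c39efdd21249d"


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Hypothetical usage: python check_md5.py [path/to/4b-dev.json.auto.fulltext]
    path = sys.argv[1] if len(sys.argv) > 1 else "4b-dev.json.auto.fulltext"
    if not os.path.isfile(path):
        print("File not found:", path)
    else:
        actual = md5_of(path)
        if actual == EXPECTED_MD5:
            print("MD5 matches.")
        else:
            print("MD5 mismatch: expected {}, got {}".format(EXPECTED_MD5, actual))
```

A mismatch suggests rerunning bioasq-dev-fixer.py against the original 4b-dev.json and medline16n.db3.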