Skip to content

Latest commit

 

History

History
16 lines (13 loc) · 1.12 KB

README.md

File metadata and controls

16 lines (13 loc) · 1.12 KB

BART-MIMIC-CXR

Use BART word embedder to do the abstractive summarization NLP task on radiology reports from MIMIC-CXR v2.0.0.

Use:

  • clone this repository
  • in this repository, mkdir data
  • from MIMIC-CXR download cxr-study-list.csv.gz and place it in the data folder
  • also from MIMIC-CXR, download mimic-cxr-reports.zip, unzip it, and place it in the data folder
  • run python preprocessing, which generates train.csv, validate.csv, and test.csv in the data folder
  • use the notebook/*.ipynb files to run the various BART models
    • the baseline model creates a summaries.csv file in data/ that contains both ground-truth and generated summaries
    • the fine-tuned and fune-tuned with truncation models create a .txt file, that must be converted to a .csv file by running python merge_txt_csv.py
  • run python rouge_score or python word_count to get the ROUGE scores and word count of the summaries respectively.

Note: You may need to go into the python file to set the proper path or filename, which is all configurable from variables at the top of the file.