NLP For Bleeding Detection

This repository contains the Python scripts used to produce the models evaluated in the 2018 University of Utah study on detecting bleeding with NLP. It is organized into three sections, one for each folder in the top level of the repository: machine learning code, rule-based code, and code for running McNemar's test to compare the performance of the RB and ED-DS models. The MachineLearning directory contains the scripts used to train the models, along with the trained models themselves. The RuleBased directory contains the script used to run ConText. That script relies on eHostess, a Python package we created to facilitate the annotation process. Among other things, eHostess provides a wrapper for ConText as implemented in the pyConTextNLP Python package.
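For readers unfamiliar with McNemar's test, the comparison can be sketched as below. This is a minimal illustration and not the script in this repository: the function name `mcnemar_test`, its arguments, and the use of the continuity-corrected chi-square form are all assumptions made for the example.

```python
import math


def mcnemar_test(gold, preds_a, preds_b):
    """Continuity-corrected McNemar's test for two classifiers on the same cases.

    Hypothetical helper for illustration; the repository's own script may differ.
    Returns (statistic, p_value).
    """
    # b: cases model A gets right and model B gets wrong.
    b = sum(1 for g, a, p in zip(gold, preds_a, preds_b) if a == g and p != g)
    # c: cases model A gets wrong and model B gets right.
    c = sum(1 for g, a, p in zip(gold, preds_a, preds_b) if a != g and p == g)
    if b + c == 0:
        # The models never disagree in correctness, so there is nothing to test.
        return 0.0, 1.0
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Toy labels, invented for the example.
gold = [1] * 10
preds_a = [1] * 10            # correct on every case
preds_b = [0] * 6 + [1] * 4   # wrong on the first six cases
statistic, p_value = mcnemar_test(gold, preds_a, preds_b)
```

Only the discordant cases (where exactly one model is correct) enter the statistic, which is why the test suits paired predictions on the same documents.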

A Note On Running the Code

A requirements file listing the main Python dependencies is included. Note also that when the TfidfVectorizer instance used in the SVM training script was serialized with pickle, it stored a reference to the tokenizer function rather than the function's definition. As a result, when the SVM model is deserialized, pickle expects to find a function called __main__.tokenize. This is not a problem as long as the model is unpickled in the SVM training script included in this repository. If the model is deserialized in another script, however, __main__.tokenize must be defined, either by copying the tokenize function from the SVM training script into the main script or by importing the tokenize.pyc module included in the TrainingScripts directory.
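The pickling behavior described above can be reproduced without scikit-learn. The sketch below is illustrative only: `VectorizerStandIn` is a hypothetical stand-in for TfidfVectorizer, and this `tokenize` body is invented for the example rather than copied from the training script. It assumes the code is run as a script, so that the function lives in `__main__`.

```python
import pickle
import sys


def tokenize(text):
    # Invented tokenizer for illustration; the real one is in the SVM training script.
    return text.lower().split()


class VectorizerStandIn:
    """Hypothetical stand-in for TfidfVectorizer that only holds a tokenizer."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer


payload = pickle.dumps(VectorizerStandIn(tokenize))

# pickle records the function's module and name, not its body,
# so nothing from the function body (such as "lower") is in the payload.
body_absent = b"lower" not in payload

# Unpickling works while tokenize is defined in the same module.
restored = pickle.loads(payload)
tokens = restored.tokenizer("Patient denies bleeding")

# Simulate unpickling from a script that does not define tokenize.
module = sys.modules[tokenize.__module__]
saved = module.tokenize
del module.tokenize
try:
    pickle.loads(payload)
    lookup_failed = False
except AttributeError:
    # pickle could not resolve the stored reference to tokenize.
    lookup_failed = True
finally:
    module.tokenize = saved
```

In a separate script, defining or importing a function named `tokenize` at the top level before calling `pickle.load` should let the lookup succeed, which is the workaround the note describes.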