
BioNER

The BioNER code is adapted from WeLT: Improving Biomedical Fine-tuned Pre-trained Language Models with Cost-sensitive Learning.

Installation

Dependencies

  • Python (>=3.6)
  • PyTorch (>=1.2.0)
  1. Clone this GitHub repository
  2. Navigate to the BioNER folder and install all necessary dependencies: python3 -m pip install -r requirements.txt
    Note: To install the appropriate torch version, follow the download instructions for your development environment.

Data Preparation

NER Datasets

Dataset sources:
  • NCBI-disease
  • BC5CDR-disease
  • BC5CDR-chem
These NER datasets are retrieved directly from BioBERT via this link.
  • BioRED-Dis
  • BioRED-Chem
We have extended the aforementioned NER datasets to include BioRED. To convert from BioC XML/JSON to CoNLL format, we used bconv and filtered the chemical and disease entities.
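
For illustration, here is a minimal sketch of this conversion step, assuming bconv's load/dump API; the file names are placeholders, and this is not the repository's exact conversion script.

```python
# Sketch of the BioC -> CoNLL conversion, assuming bconv's load/dump API.
# File names are placeholders; the repository additionally filters the
# loaded annotations down to chemical and disease entities.
import bconv

# Parse the BioC XML collection (BioC JSON is handled via fmt='bioc_json').
collection = bconv.load('BioRED.xml', fmt='bioc_xml')

# Write the collection out in CoNLL format for NER fine-tuning.
bconv.dump(collection, 'BioRED.conll', fmt='conll')
```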

Data & Evaluation Code Download
To directly download the NER datasets for fine-tuning models from scratch, use download.sh, or manually download them via this link into the main directory, then unzip datasets.zip and remove it with rm -r datasets.zip. The same instructions apply to the evaluation code.

Data Pre-processing
We adapted preprocessing.sh from BioBERT to include BioRED.

Reproducing the Paper's Results

We conducted the experiments on two different BERT models using the WeLT weighting scheme and compared WeLT against the corresponding traditional fine-tuning approaches (i.e., standard BioBERT fine-tuning). Below, we explain the WeLT fine-tuning approach, provide all the fine-tuned models on Hugging Face, and give an example of fine-tuning from scratch with WeLT as well as an example of predicting and evaluating disease entities.

1. Fine-tuning BERT Models

Our experimental work focused on BioBERT (a mixed-domain, continually pre-trained language model) and PubMedBERT (a domain-specific language model pre-trained from scratch); however, WeLT can be adapted to other transformers such as ELECTRA.

Model | Used version in HF 🤗
BioBERT | model_name_or_path
PubMedBERT | model_name_or_path
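
For example, a checkpoint can be loaded for token classification with the transformers library; the model identifier below is a commonly used public Hugging Face name and is only a placeholder for the model_name_or_path values above.

```python
# Sketch of loading a base checkpoint for NER fine-tuning; the identifier is
# an assumed public checkpoint name standing in for model_name_or_path.
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_name_or_path = "dmis-lab/biobert-base-cased-v1.1"  # or a PubMedBERT checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
# num_labels depends on the dataset's tagging scheme (e.g. 3 for B/I/O tags).
model = AutoModelForTokenClassification.from_pretrained(model_name_or_path, num_labels=3)
```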

1. WeLT fine-tuning

We have adapted BioBERT's run_ner.py to develop a cost-sensitive trainer in run_weight_scheme.py, which extends the Trainer class with WeightedLossTrainer and overrides the compute_loss function to include the WeLT weights in a weighted cross-entropy loss function.
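
A minimal sketch of this idea is shown below; it assumes the per-class WeLT weights have already been computed and is not the exact run_weight_scheme.py implementation.

```python
# Sketch of a cost-sensitive trainer: a Trainer subclass whose compute_loss
# applies per-class weights (e.g. WeLT weights) in the cross-entropy loss.
import torch
from transformers import Trainer

class WeightedLossTrainer(Trainer):
    def __init__(self, *args, class_weights=None, **kwargs):
        super().__init__(*args, **kwargs)
        # Tensor of shape (num_labels,) with one weight per NER label.
        self.class_weights = class_weights

    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits
        # Weighted cross-entropy: under-represented entity classes get larger weights.
        loss_fct = torch.nn.CrossEntropyLoss(weight=self.class_weights.to(logits.device))
        loss = loss_fct(logits.view(-1, logits.size(-1)), labels.view(-1))
        return (loss, outputs) if return_outputs else loss
```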

2. Building XML files

After fine-tuning the BERT models, we recognize chemical and disease entities via ner.py. The output files are written to the predicted path directory.
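
As an illustration only (not the repository's ner.py), a fine-tuned checkpoint can be applied to raw text with a transformers token-classification pipeline; the model path is a placeholder.

```python
# Illustration: tagging entity mentions with a fine-tuned checkpoint.
from transformers import pipeline

ner_tagger = pipeline(
    "token-classification",
    model="path/to/fine-tuned-welt-model",  # placeholder for a fine-tuned checkpoint
    aggregation_strategy="simple",          # merge word pieces into whole mentions
)

print(ner_tagger("Imatinib is used to treat chronic myeloid leukemia."))
```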

Evaluation
We have used the strict and approximate evaluation of BioCreative VII Track 2 (the NLM-CHEM track: Full-text Chemical Identification and Indexing in PubMed articles).
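
To clarify the distinction (a toy illustration, not the official NLM-CHEM evaluation code): strict matching requires exact span boundaries, while approximate matching credits any overlap with a gold mention.

```python
# Toy illustration of strict vs. approximate span matching; use the official
# evaluation code for reported results.
def strict_match(pred, gold):
    """Exact start/end offsets."""
    return pred == gold

def approximate_match(pred, gold):
    """Any character overlap between the predicted and gold spans."""
    (p_start, p_end), (g_start, g_end) = pred, gold
    return p_start < g_end and g_start < p_end

gold_span = (10, 25)
print(strict_match((10, 25), gold_span))       # True
print(approximate_match((12, 20), gold_span))  # True: overlaps, but not exact
print(strict_match((12, 20), gold_span))       # False
```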


Citation

The manuscript is in preparation (TBD)

Authors

Authors: Ghadeer Mobasher*, Pedro Ruas, Francisco M. Couto, Olga Krebs, Michael Gertz and Wolfgang Müller

Acknowledgment

Ghadeer Mobasher is part of the PoLiMeR-ITN (http://polimer-itn.eu/) and is supported by the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement PoLiMeR, No 81261
