NewsPH-NLI-ULMFiT

This repository accompanies the paper:

Exploiting News Article Structure for Automatic Corpus Generation of Entailment Datasets

Jan Christian Blaise Cruz, Jose Kristian Resabal, James Lin, Dan John Velasco, Charibeth Cheng

This repository contains a script to train and evaluate an AWD-LSTM model on the NewsPH-NLI dataset.

Requirements

fastai >= v2.0.15
NVIDIA GPU (all experiments were run in Colab with a Tesla T4)
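
Before running anything, it can help to confirm that the environment meets these requirements. The following is a minimal sketch, not part of this repository: the helper names (parse_version, meets_minimum, check_environment) are made up for illustration, and the version comparison only looks at the leading numeric parts of the version string.

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn a dotted version string such as '2.0.15' into a tuple of ints.

    Parsing stops at the first piece that does not start with a digit,
    so '2.0.15.dev0' becomes (2, 0, 15).
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum="2.0.15"):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment():
    """Return 'ok' or a short description of the first problem found."""
    try:
        installed = metadata.version("fastai")
    except metadata.PackageNotFoundError:
        return "fastai is not installed"
    if not meets_minimum(installed):
        return f"fastai {installed} is older than 2.0.15"
    import torch  # PyTorch is installed as a dependency of fastai
    if not torch.cuda.is_available():
        return "no CUDA GPU is visible to PyTorch"
    return "ok"


if __name__ == "__main__":
    print(check_environment())
```

Running the file prints a one-line status for the current environment.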

Reproducing Results

First, clone the repository and download the data:

# Clone this repository
git clone https://github.com/danjohnvelasco/NewsPH-NLI-ULMFiT

cd NewsPH-NLI-ULMFiT

# Install gdown
pip install gdown

# Create a new folder
mkdir data

# Download NewsPH-NLI Dataset (preprocessed)
gdown --id 1-qOfNQy-piiaz8BcDfnS-ILlYio5S_g2

# Unzip
unzip newsph-nli-preprocessed.zip -d data
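
As a quick sanity check that the archive unpacked as expected, the sketch below counts the rows in each split. It assumes the three files used by the training command later in this README (data/train.csv, data/valid.csv, data/test.csv) and assumes each file starts with a header row; the function names are hypothetical and not part of this repository.

```python
import csv
from pathlib import Path


def count_rows(csv_path):
    """Count data rows in a CSV file, excluding the first (header) line."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row (assumed to be present)
        return sum(1 for _ in reader)


def summarize_splits(data_dir="data", names=("train.csv", "valid.csv", "test.csv")):
    """Map each split file name to its row count, or None if it is missing."""
    summary = {}
    for name in names:
        path = Path(data_dir) / name
        summary[name] = count_rows(path) if path.is_file() else None
    return summary


if __name__ == "__main__":
    for name, rows in summarize_splits().items():
        print(name, "missing" if rows is None else f"{rows} rows")
```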

Download language model fine-tuned on NewsPH-NLI

# Make directory
mkdir models

# Download the fine-tuned language model
gdown --id 1-PI65kBGD0i2hE3KL5hjjCGDt_mMFKMs

# Unzip
unzip finetuned.zip -d models

After unzipping, you should see two files: lm_fintuned_enc.pth (encoder) and news_vocab.pkl (vocab).
Both are used later when fine-tuning the classifier.
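
The check for the two files can also be scripted. This is a small sketch under the assumption that both files sit directly inside the models folder (the archive could instead unpack into a subfolder, in which case the path needs adjusting); missing_files is a hypothetical helper, not something provided by this repository.

```python
from pathlib import Path

# File names as listed in this README (the encoder name is spelled
# 'fintuned' there, so it is kept as-is here).
EXPECTED = ("lm_fintuned_enc.pth", "news_vocab.pkl")


def missing_files(model_dir="models", expected=EXPECTED):
    """Return the expected file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Both files found.")
```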

Textual Entailment Task

Here, textual entailment is treated like any other classification task. To fine-tune the classifier, use the train.py script provided in this repository. For example, to fine-tune a Filipino AWD-LSTM model on the NewsPH-NLI dataset:

python train.py \
    --pretrained_path "lm_fintuned_enc" \
    --vocab_path "models/news_vocab.pkl" \
    --checkpoint "model" \
    --train_data data/train.csv \
    --valid_data data/valid.csv \
    --test_data data/test.csv \
    --do_train \
    --do_eval \
    --batch_size 128 \
    --weight_decay 0.1 \
    --seed 42 \
    --lr_max 10e-3 \
    --epochs "4;2;2;2;1" \
    --data_pct 1.0

This should give you the following results:

Valid Loss 0.2685 | Valid Accuracy 0.8911
Test Loss 0.2589 | Test Accuracy 0.8937
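
Two of the values in the command above are easy to misread. The sketch below shows how they evaluate, assuming that train.py splits --epochs on semicolons into one epoch count per training stage (as in ULMFiT-style gradual unfreezing). That interpretation is an assumption, not something confirmed here, so treat train.py as the authority. The parse_epochs helper is hypothetical.

```python
def parse_epochs(spec):
    """Split a string like '4;2;2;2;1' into a list of integer epoch counts.

    Assumption: each entry is the number of epochs for one training stage.
    """
    return [int(part) for part in spec.split(";") if part.strip()]


stages = parse_epochs("4;2;2;2;1")
total = sum(stages)

# '10e-3' is scientific notation for 10 * 10**-3, which is 0.01 (not 0.001).
lr_max = float("10e-3")

print(stages)  # [4, 2, 2, 2, 1]
print(total)   # 11
print(lr_max)  # 0.01
```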
