AutoBLC:

This is the GitHub repository for the paper: E. Koc, İ. Şanlı A. Koç, “Automatic Detection of Basic Level Categories,” submitted to ACM Transactions on Intelligent Systems and Technology. In this study, we aim to automatically identify basic level categories using classical machine learning methods, with features extracted from word embeddings including W2G, GPT-2, and BERT.

Extract Features from BERT and GPT-2

To extract features from BERT and GPT-2, go to bert_gpt2_extractor.py and run the scripts bert_embedding.py and gpt2_embedding.py on your corpus. This folder also contains two small corpora to help you get familiar with the code.
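For orientation, here is a minimal sketch of what extracting a fixed-size feature vector from BERT or GPT-2 can look like. It is not the code in bert_embedding.py or gpt2_embedding.py: the function names `mean_pool` and `embed`, the use of the Hugging Face `transformers` package, the default model name, and the choice of mean pooling over tokens are all assumptions made for illustration.

```python
import numpy as np


def mean_pool(hidden, mask):
    """Average token vectors over real (non-padding) tokens.

    hidden: (batch, tokens, dim) array of token embeddings.
    mask:   (batch, tokens) array with 1 for real tokens and 0 for padding.
    """
    hidden = np.asarray(hidden, dtype=float)
    mask = np.asarray(mask, dtype=float)[..., None]   # (batch, tokens, 1)
    counts = np.clip(mask.sum(axis=1), 1.0, None)     # avoid division by zero
    return (hidden * mask).sum(axis=1) / counts


def embed(texts, model_name="bert-base-uncased"):
    """Hypothetical helper: one pooled vector per input text."""
    # Imported lazily so mean_pool works without these packages installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    if tokenizer.pad_token is None:
        # GPT-2 ships without a padding token; reuse the end-of-text token.
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModel.from_pretrained(model_name).eval()

    enc = tokenizer(list(texts), return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state
    return mean_pool(hidden.numpy(), enc["attention_mask"].numpy())
```

Under these assumptions, `embed(["dog", "animal"])` would return one BERT vector per word, and `embed(["dog", "animal"], model_name="gpt2")` would do the same with GPT-2. The first call downloads the model weights.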

Feature Selection

We share both the raw and the selected features in this repo. You can use the feature_selection.ipynb notebook to select your own features based on the correlation between each feature and the target.
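As an illustration of correlation-based selection, the sketch below keeps the features whose absolute Pearson correlation with the target reaches a threshold. The function name `select_by_correlation`, the use of Pearson correlation, and the threshold value are assumptions; the notebook may use a different criterion or cutoff.

```python
import numpy as np


def select_by_correlation(X, y, threshold=0.1):
    """Return (kept column indices, per-column Pearson r with y)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)          # center each feature column
    yc = y - y.mean()                # center the target
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns have zero variance; treat their correlation as 0.
    r = np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    keep = np.flatnonzero(np.abs(r) >= threshold)
    return keep, r


# Toy data: column 0 tracks the target, column 1 is constant, column 2 is unrelated.
y = np.array([0, 1, 0, 1, 1, 0])
X = np.column_stack([2 * y + 1, np.ones(6), [1, 1, 2, 2, 3, 3]])
keep, r = select_by_correlation(X, y, threshold=0.5)
X_selected = X[:, keep]
```

Thresholding on the absolute value keeps strongly negatively correlated features as well as positively correlated ones.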

Models

You can run the code for all models using the models.ipynb notebook. We also provide the trained models for both datasets.
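The sketch below shows one common way to compare classical classifiers on a feature matrix with scikit-learn. The two classifiers, the 5-fold cross-validation, the helper name `compare_models`, and the synthetic data are assumptions for illustration only; they are not taken from models.ipynb and do not reproduce the paper's results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_models(X, y, cv=5):
    """Return mean cross-validated accuracy for each candidate classifier."""
    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "svm_rbf": SVC(kernel="rbf"),
    }
    scores = {}
    for name, clf in candidates.items():
        # Scale features inside the pipeline so each fold is scaled independently.
        pipe = make_pipeline(StandardScaler(), clf)
        scores[name] = cross_val_score(pipe, X, y, cv=cv).mean()
    return scores


# Synthetic stand-in for selected embedding features (two well-separated classes).
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 4)) + 3.0 * y[:, None]
scores = compare_models(X, y)
```

To try this on the shared features, replace the synthetic `X` and `y` with the selected feature matrix and its basic-level labels.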