Masking and Generation: An Unsupervised Method for Sarcasm Detection

Introduction

Existing approaches to sarcasm detection are mainly based on supervised learning, whose performance depends heavily on large amounts of labeled data or extra information. In real-world scenarios, however, obtaining abundant labeled data or extra information incurs high labor costs, and sufficient annotated data is simply unavailable in many low-resource conditions.

To alleviate this dilemma, we investigate sarcasm detection from an unsupervised perspective: we explore a masking and generation paradigm that extracts context incongruities for learning sarcastic expressions. Further, we use unsupervised contrastive learning based on standard dropout to improve the sentence representations.
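
To make the role of the mask rate concrete, here is a minimal sketch of rate-based token masking. It is only an illustration under stated assumptions: the function name `mask_tokens`, the `[MASK]` placeholder, and uniform random selection of positions are all hypothetical, and the repository's own masking strategy in `code/main.py` may choose tokens differently.

```python
import random

MASK_TOKEN = "[MASK]"  # assumed placeholder; the real pipeline may use another token


def mask_tokens(tokens, mask_rate, seed=None):
    """Replace round(mask_rate * len(tokens)) randomly chosen tokens with MASK_TOKEN.

    Uniform random selection is an assumption made for illustration only.
    """
    if not 0.0 <= mask_rate <= 1.0:
        raise ValueError("mask_rate must be between 0 and 1")
    rng = random.Random(seed)  # local generator so results are reproducible per seed
    n_mask = round(mask_rate * len(tokens))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    return [MASK_TOKEN if i in positions else tok for i, tok in enumerate(tokens)]


if __name__ == "__main__":
    sentence = "oh great another monday morning meeting".split()
    print(mask_tokens(sentence, mask_rate=0.5, seed=0))
```

A generation model would then fill the masked positions, and the filled-in sentence can be compared with the original to look for incongruity.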

Experimental results on six perceived sarcasm detection benchmark datasets show that our approach outperforms baselines. Our unsupervised method also achieves performance comparable to supervised methods on the intended sarcasm dataset.

Usage

1. Installation

git clone https://github.com/DevoAllen/Mask-Generation.git
cd Mask-Generation
pip install -r requirements.txt

2. Preparing datasets

Please download datasets and organize them as follows:

  datasets
  └── IAC1
    └── train.csv
    └── test.csv
  └── IAC2
    └── ...
  └── riloff
    └── ...
    ...  
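
Before running the pipeline, it can help to confirm that the layout above is in place. The following is a small sketch, not part of the repository: the helper name `missing_dataset_files` is hypothetical, and it assumes that every dataset folder holds the same `train.csv` and `test.csv` files shown for IAC1.

```python
from pathlib import Path

# Files shown for IAC1 above; assumed (not confirmed) to apply to every dataset.
EXPECTED_FILES = ("train.csv", "test.csv")


def missing_dataset_files(root, dataset_names):
    """Return the expected CSV paths that do not exist under `root`."""
    root = Path(root)
    missing = []
    for name in dataset_names:
        for filename in EXPECTED_FILES:
            path = root / name / filename
            if not path.is_file():
                missing.append(path)
    return missing


if __name__ == "__main__":
    # Dataset names taken from the tree above; extend the list as needed.
    for path in missing_dataset_files("datasets", ["IAC1", "IAC2", "riloff"]):
        print(f"missing: {path}")
```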

3. Running

bash run.sh

After you specify the dataset and mask rate, the script executes three steps:

  • Mask and generation (by code/main.py);
    • The results of this step will be placed in the following path (using the IAC1 dataset with a masking rate of 0.5 as an example):
    output
    └── IAC1
      └── test_aug_0.5
          └── all-sarc-not-sarc.csv
          └── O4A_mask.csv
          └── O4B_mask.csv
          └── sim_0_sentences.csv
          └── sim_1_sentences.csv
    
  • Sentence embedding (by code/main.py);
    • The results of this step will be placed in the following path:
    output
    └── IAC1
      └── test_aug_0.5
          └── similarity_scores
              └── simcse-0.5-sim_scores.csv
          └── ...
    
    • For convenience, we have open-sourced our own SimCSE models trained on the corresponding datasets; you will need to download them for this step.
  • Results computation (by code/statistic.py).
    • The results are placed in:
    output
    └── IAC1
      └── test_aug_0.5
          └── results
              └── 0.5-results.csv
          └── similarity_scores
              └── simcse-0.5-sim_scores.csv
          └── ...
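
The output locations listed above follow a regular pattern, so they can be derived from the dataset name and mask rate. The sketch below is not part of the repository: the helper `expected_outputs` is hypothetical, and it assumes the `test_aug_<rate>` naming shown for IAC1 at 0.5 carries over to other datasets and rates.

```python
from pathlib import Path


def expected_outputs(dataset, mask_rate, root="output"):
    """Build the output paths for one run, following the IAC1 / 0.5 example.

    Generalizing the pattern to other datasets and rates is an assumption.
    """
    run_dir = Path(root) / dataset / f"test_aug_{mask_rate}"
    return {
        "run_dir": run_dir,
        "similarity_scores": run_dir / "similarity_scores" / f"simcse-{mask_rate}-sim_scores.csv",
        "results": run_dir / "results" / f"{mask_rate}-results.csv",
    }


if __name__ == "__main__":
    for label, path in expected_outputs("IAC1", 0.5).items():
        status = "found" if path.exists() else "not found"
        print(f"{label}: {path} ({status})")
```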
    

Acknowledgement

This repo benefits from SimCSE and senticnet. Thanks for their wonderful work!

Citations

If you find our project helpful, please star our repo and cite our paper as follows:

@inproceedings{10.1145/3477495.3531825,
author = {Wang, Rui and Wang, Qianlong and Liang, Bin and Chen, Yi and Wen, Zhiyuan and Qin, Bing and Xu, Ruifeng},
title = {Masking and Generation: An Unsupervised Method for Sarcasm Detection},
year = {2022},
isbn = {9781450387323},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3477495.3531825},
doi = {10.1145/3477495.3531825},
pages = {2172–2177},
numpages = {6},
series = {SIGIR '22}
}
