Note: This repository was archived by the owner on Jan 12, 2024, and is now read-only.

Here we explore cross-domain transferability of adversarial examples in NLP.

jyaacoub/Cross-Domain-Attacks-NLP

Poster:

For a visual overview of the project, see the poster here.

File Structure:

  • media/ - relevant figures and results go here.

  • src/

    • common/
      • constants.py - constant values used throughout the application.
    • models/
      • adversarial_model.py - handles the trained adversarial (substitute) model; this will include multiple models for testing.
      • black_box_model.py - contains the black-box model for which we will generate adversarial examples.
    • utils/ - any relevant utility functions/classes go here.
    • data/ - any relevant datasets go here.
  • run.py - the entry point from which we run the application and tests.
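As a rough sketch (the actual file's contents are not shown in this README), constants.py might centralize values like the model checkpoints and attack recipes that appear in the results tables below:

```python
# src/common/constants.py - hypothetical sketch; these names are drawn from
# the results tables in this README, not from the actual file.
SUBSTITUTE_MODEL = "textattack/roberta-base-imdb"
TARGET_MODEL = "textattack/roberta-base-rotten-tomatoes"
ATTACK_RECIPES = [
    "A2TYoo2021",
    "BAEGarg2019",
    "DeepWordBugGao2018",
    "PWWSRen2019",
    "TextBuggerLi2018",
    "TextFoolerJin2019",
]
NUM_TEST_SENTENCES = 100  # sentences sampled per attack benchmark
```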

Testing Procedure

This section shows what models we will be testing as we build up to answering our research question.

0. Same Domain, Same Task:

This has already been shown to work in many previous papers, so no additional experiments are needed here.

1. Similar Domain, Same Task:

Here we replicate how Datta S. 2022 tested similar domains, examining how adversarial examples for one category of Amazon reviews transfer to another category (e.g., from baby items to books). Our similar domains are movie reviews, but from IMDB and from Rotten Tomatoes.

2. Similar Domain, Different Task:

Building off of 1, we now test how adversarial examples transfer across different tasks (e.g., sentiment polarity vs. subjectivity classification).

3. Different Domain, Same Task:

Now we test how adversarial examples transfer across different domains on the same task (e.g., polarity of Amazon reviews vs. polarity of tweets; tweets might not be different enough...).

4. Different Domain, Different Task:

Finally we arrive at the ultimate black-box condition. If transfer succeeds even here, it tells us a lot about where adversarial examples in NLP come from: it suggests they are a result of low-level features of text, and that models across domains have very similar decision boundaries.

Attacks

The attacks we will use are listed in the table below. To benchmark them, each attack was run on 100 sentences from rotten_tomatoes with roberta-base-imdb as the target model.

| Attack | Time (s) | Original accuracy (%) | Perturbed accuracy (%) |
| --- | --- | --- | --- |
| A2TYoo2021 | 77.9 | 90 | 66 |
| BAEGarg2019 | 250.5 | 90 | 34 |
| DeepWordBugGao2018 | 79.4 | 90 | 11 |
| PWWSRen2019 | 335.7 | 90 | 5 |
| TextBuggerLi2018 | 135 | 90 | 27 |
| TextFoolerJin2019 | 240.2 | 90 | 0 |
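A useful way to read this table is through the attack success rate: the fraction of originally correct predictions each attack flips. A minimal sketch, with the numbers copied from the table above:

```python
# Benchmark numbers from the table above (100 rotten_tomatoes sentences,
# roberta-base-imdb as the target): (original accuracy %, perturbed accuracy %).
benchmarks = {
    "A2TYoo2021": (90, 66),
    "BAEGarg2019": (90, 34),
    "DeepWordBugGao2018": (90, 11),
    "PWWSRen2019": (90, 5),
    "TextBuggerLi2018": (90, 27),
    "TextFoolerJin2019": (90, 0),
}

def attack_success_rate(original_acc, perturbed_acc):
    """Percentage of originally correct predictions the attack flips."""
    return 100 * (original_acc - perturbed_acc) / original_acc

for name, (orig, pert) in benchmarks.items():
    print(f"{name}: {attack_success_rate(orig, pert):.1f}% success")
```

By this measure TextFoolerJin2019 is the strongest attack (100% success) and A2TYoo2021 the weakest, though also among the fastest.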

Results

Setup

| Domain | Substitute model | Target model |
| --- | --- | --- |
| similar_domain_same_task | textattack/roberta-base-imdb | textattack/roberta-base-rotten-tomatoes |
| similar_domain_different_task | cardiffnlp/twitter-roberta-base-irony | cardiffnlp/twitter-roberta-base-sentiment |
| different_domain_same_task | cardiffnlp/twitter-roberta-base-sentiment | textattack/roberta-base-rotten-tomatoes |
| different_domain_different_task | cardiffnlp/twitter-roberta-base-irony | textattack/roberta-base-rotten-tomatoes |
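The protocol behind these pairings can be sketched with toy stand-in classifiers: craft adversarial examples against the substitute model, then measure the target model's accuracy on them. Everything below (the keyword "models" and the synonym table) is a hypothetical illustration; the real experiments use the RoBERTa checkpoints listed above.

```python
# Toy sketch of the transfer-attack protocol: attack the substitute model,
# then replay the adversarial examples against the target model.
POSITIVE = {"good", "great", "enjoyable"}
NEGATIVE = {"bad", "awful", "boring"}
SYNONYMS = {"good": "fine", "great": "decent", "bad": "poor"}  # hypothetical

def substitute_model(text):
    """Keyword-count sentiment classifier (1 = positive) we have full access to."""
    words = text.lower().split()
    return int(sum(w in POSITIVE for w in words) > sum(w in NEGATIVE for w in words))

def target_model(text):
    """Black-box stand-in; it knows one extra negative word, so its boundary differs."""
    words = text.lower().split()
    return int(sum(w in POSITIVE for w in words) > sum(w in NEGATIVE | {"poor"} for w in words))

def word_swap_attack(text, model):
    """Greedily swap one word for a synonym until the model's prediction flips."""
    original = model(text)
    words = text.split()
    for i, w in enumerate(words):
        if w.lower() in SYNONYMS:
            candidate = " ".join(words[:i] + [SYNONYMS[w.lower()]] + words[i + 1:])
            if model(candidate) != original:
                return candidate  # successful adversarial example
    return text  # attack failed; keep the clean text

examples = [("a good movie", 1), ("a bad movie", 0)]
adversarial = [(word_swap_attack(t, substitute_model), y) for t, y in examples]
clean_acc = sum(target_model(t) == y for t, y in examples) / len(examples)
transfer_acc = sum(target_model(t) == y for t, y in adversarial) / len(adversarial)
print(clean_acc, transfer_acc)  # target accuracy before vs. after transferred attacks
```

The gap between clean accuracy and accuracy on transferred adversarial examples is exactly what the results tables below report, just with real models and attacks.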

Attacks

| Domain | Original accuracy (%) | A2TYoo2021 | BAEGarg2019 | DeepWordBugGao2018 | PWWSRen2019 | TextBuggerLi2018 | TextFoolerJin2019 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| similar_domain_same_task | 88.30 | 80.60 | 62.20 | 61.10 | 69.40 | 70.70 | 67.10 |
| similar_domain_different_task | 70.50 | 68.70 | 65.20 | 67.10 | 66.80 | 68.30 | 66.70 |
| different_domain_same_task | 88.30 | 85.00 | 72.90 | 77.40 | 76.60 | 81.00 | 79.70 |
| different_domain_different_task | 88.30 | 86.10 | 82.20 | 80.70 | 82.90 | 84.80 | 82.20 |
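One way to summarize this table is the mean accuracy drop per setting (original accuracy minus attacked accuracy, averaged over the six attacks); a larger drop means adversarial examples transferred more effectively. A sketch with the numbers copied from the table:

```python
# (original accuracy %, accuracy % under each of the six attacks), per setting.
results = {
    "similar_domain_same_task": (88.30, [80.60, 62.20, 61.10, 69.40, 70.70, 67.10]),
    "similar_domain_different_task": (70.50, [68.70, 65.20, 67.10, 66.80, 68.30, 66.70]),
    "different_domain_same_task": (88.30, [85.00, 72.90, 77.40, 76.60, 81.00, 79.70]),
    "different_domain_different_task": (88.30, [86.10, 82.20, 80.70, 82.90, 84.80, 82.20]),
}

def mean_drop(original, attacked):
    """Average accuracy loss (percentage points) across all attacks."""
    return original - sum(attacked) / len(attacked)

for domain, (orig, accs) in results.items():
    print(f"{domain}: {mean_drop(orig, accs):.2f} point drop")
```

The drops shrink as domain and task diverge, which matches the expectation that transfer is hardest in the full black-box setting.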

Reverse attacks

| Domain | Original accuracy (%) | A2TYoo2021 | BAEGarg2019 | DeepWordBugGao2018 | PWWSRen2019 | TextBuggerLi2018 | TextFoolerJin2019 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| similar_domain_same_task | 95.00 | 83.20 | 81.30 | 81.90 | 94.80 | 86.80 | 88.30 |
| similar_domain_different_task | 73.46 | 73.34 | 69.00 | 71.55 | 71.04 | 71.30 | 71.81 |
| different_domain_same_task | 70.50 | 69.00 | 60.70 | 61.50 | 61.00 | 64.70 | 64.30 |
| different_domain_different_task | 84.69 | 80.86 | 80.86 | 78.31 | 81.25 | 83.03 | 79.71 |

References
