This is the implementation of *Adversarial Neural Pruning with Latent Vulnerability Suppression* (ICML 2020).
Authors: Divyam Madaan, Jinwoo Shin, Sung Ju Hwang
Despite the remarkable performance of deep neural networks on various computer vision tasks, they are known to be susceptible to adversarial perturbations, which makes it challenging to deploy them in real-world safety-critical applications. In this paper, we conjecture that the leading cause of adversarial vulnerability is the distortion in the latent feature space, and provide methods to suppress it effectively. Specifically, we define a vulnerability for each latent feature and then propose a new loss for adversarial learning, the Vulnerability Suppression (VS) loss, which aims to minimize the feature-level vulnerability during training. We further propose a Bayesian framework to prune features with high vulnerability, reducing both the vulnerability and the loss on adversarial examples. We validate our Adversarial Neural Pruning with Vulnerability Suppression (ANP-VS) method on multiple benchmark datasets, on which it not only obtains state-of-the-art adversarial robustness but also improves performance on clean examples, using only a fraction of the parameters of the full network. Further qualitative analysis suggests that these improvements come from the suppression of feature-level vulnerability.
Contributions of this work
- We hypothesize that the distortion in the latent feature space is the leading cause of DNNs' susceptibility to adversarial attacks, and formally define the vulnerability of each latent feature as the expected distortion of that feature with respect to input perturbations (see the sketch after this list).
- Based on this finding, we introduce a novel defense mechanism, Adversarial Neural Pruning with Vulnerability Suppression (ANP-VS), to mitigate feature-level vulnerability. The resulting framework learns a Bayesian pruning (dropout) mask that prunes out vulnerable features while preserving robust ones, by minimizing the adversarial loss together with the Vulnerability Suppression (VS) loss.
- We experimentally validate our proposed method on the MNIST, CIFAR-10, and CIFAR-100 datasets, on which it achieves state-of-the-art robustness with a substantial reduction in memory and computation, and provide qualitative analysis suggesting that the improvement in robustness comes from the suppression of feature-level vulnerability.
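As a reference, here is a minimal NumPy sketch of the two quantities described above. The function names and the exact aggregation into the VS loss are illustrative assumptions; the repository itself computes these on TensorFlow activations during training.

```python
import numpy as np

def feature_vulnerability(z_clean, z_adv):
    """Vulnerability of each latent feature: the expected (here, batch-averaged)
    absolute distortion of that feature under input perturbation.
    z_clean, z_adv: latent features of clean / adversarial inputs, shape (batch, features)."""
    return np.mean(np.abs(z_clean - z_adv), axis=0)

def vulnerability_suppression_loss(z_clean, z_adv):
    """VS loss (illustrative): an aggregate of the feature-level vulnerabilities,
    added to the adversarial training objective to suppress latent distortion."""
    return np.mean(feature_vulnerability(z_clean, z_adv))
```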
- Python 3.5
- Tensorflow 1.14.0
- CUDA 10.0
- cudnn 7.6.5
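One way to set up a matching environment (the use of conda and the environment name are assumptions; any setup providing the versions above will work):

$ conda create -n anp-vs python=3.5
$ conda activate anp-vs
$ pip install tensorflow-gpu==1.14.0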
- Train base model (MNIST)
$ python -m ANP_VS.src.experiments.run --net lenet_conv --mode base --data mnist
- Train ANP-VS (MNIST)
$ python -m ANP_VS.src.experiments.run --net lenet_conv --mode bbd --eps=0.3 --step_size=0.01 --adv_train=True --pgd_steps=20 --vulnerability=True --data mnist --n_epochs 200 --beta_weight=4 --lambda_weight=0.001
- Train base model (CIFAR-10 / CIFAR-100)
$ python -m ANP_VS.src.experiments.run --net vgg16 --mode base --data cifar10
$ python -m ANP_VS.src.experiments.run --net vgg16 --mode base --data cifar100
- Train ANP-VS (CIFAR-10 / CIFAR-100)
# CIFAR-10
$ python -m ANP_VS.src.experiments.run --net vgg16 --mode bbd --eps=0.03 --step_size=0.007 --adv_train=True --pgd_steps=10 --vulnerability=True --data cifar10 --n_epochs 200 --beta_weight=2 --lambda_weight=0.0001
# CIFAR-100
$ python -m ANP_VS.src.experiments.run --net vgg16 --mode bbd --eps=0.03 --step_size=0.007 --adv_train=True --pgd_steps=10 --vulnerability=True --data cifar100 --n_epochs 200 --beta_weight=1 --lambda_weight=0.1
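The --beta_weight and --lambda_weight flags in the training commands above plausibly weight the Bayesian pruning regularizer and the VS loss in the overall objective. The following is a hedged TensorFlow 1.x sketch of how such an objective could be assembled; the function, the variable names, and the exact flag-to-term mapping are assumptions, not the repository's code.

```python
import tensorflow as tf

def anp_vs_objective(adv_logits, labels, pruning_reg, vs_loss,
                     beta_weight=2.0, lambda_weight=1e-4):
    """Hypothetical combined objective: adversarial cross-entropy on perturbed inputs,
    plus a Bayesian pruning (sparsity/KL) regularizer (assumed to be scaled by
    --beta_weight), plus the VS loss (assumed to be scaled by --lambda_weight)."""
    adv_xent = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels,
                                                       logits=adv_logits))
    return adv_xent + beta_weight * pruning_reg + lambda_weight * vs_loss
```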
- Evaluate robustness under white-box PGD attacks
# MNIST
$ python -m ANP_VS.src.experiments.run --net lenet_conv --mode bbd --eval_mode attack --white_box=True --attack_source base --eps 0.3 --step_size 0.01 --pgd_steps=40 --vulnerability_type=expected --data mnist --adv_train=True
# CIFAR-10
$ python -m ANP_VS.src.experiments.run --net vgg16 --mode bbd --eval_mode attack --white_box=True --attack_source base --eps 0.03 --step_size 0.007 --pgd_steps=40 --vulnerability_type=expected --data cifar10 --adv_train=True
# CIFAR-100
$ python -m ANP_VS.src.experiments.run --net vgg16 --mode bbd --eval_mode attack --white_box=True --attack_source base --eps 0.03 --step_size 0.007 --pgd_steps=40 --vulnerability_type=expected --data cifar100 --adv_train=True
For black-box attacks, first train a base model with adversarial training to serve as the attack source, then set --white_box=False.
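The robustness evaluation above uses an L∞ PGD attack with the listed --eps, --step_size, and --pgd_steps values. Below is a standard graph-mode TensorFlow 1.x PGD sketch for reference; it is not the repository's own attack code, and `model_fn` is a hypothetical function that maps images to logits while reusing the model's weights. In the black-box setting, the adversarial examples are generated against the separately trained source model and then evaluated on the target model.

```python
import tensorflow as tf

def pgd_attack(model_fn, x, y, eps=0.03, step_size=0.007, steps=40):
    """L_inf projected gradient descent: start from a random point in the eps-ball
    around x, take `steps` signed-gradient steps of size `step_size`, and project
    back onto the eps-ball and the valid pixel range after each step."""
    x_adv = tf.clip_by_value(x + tf.random_uniform(tf.shape(x), -eps, eps), 0.0, 1.0)
    for _ in range(steps):
        logits = model_fn(x_adv)
        loss = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
        grad = tf.gradients(loss, x_adv)[0]
        x_adv = x_adv + step_size * tf.sign(grad)
        x_adv = tf.clip_by_value(x_adv, x - eps, x + eps)  # project onto the eps-ball
        x_adv = tf.clip_by_value(x_adv, 0.0, 1.0)          # keep a valid image range
        x_adv = tf.stop_gradient(x_adv)                    # do not backprop through attack steps
    return x_adv
```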
- Evaluate the trained models on the test set
# MNIST
$ python -m ANP_VS.src.experiments.run --net lenet_conv --mode bbd --eval_mode test --data mnist --adv_train=True
# CIFAR-10
$ python -m ANP_VS.src.experiments.run --net vgg16 --mode bbd --eval_mode test --data cifar10 --adv_train=True
# CIFAR-100
$ python -m ANP_VS.src.experiments.run --net vgg16 --mode bbd --eval_mode test --data cifar100 --adv_train=True
Results from the main paper on MNIST, CIFAR-10, and CIFAR-100 (top to bottom), averaged over five independent runs. ANP-VS is our method; 🡑 (🡓) indicates that higher (lower) is better.
MNIST:

Model | Clean Acc. (%, 🡑) | White-box Adv. Acc. (%, 🡑) | Black-box Adv. Acc. (%, 🡑) | White-box Vulnerability (🡓) | Black-box Vulnerability (🡓) | Memory (%, 🡓) | FLOPs reduction (×, 🡑) | Sparsity (%, 🡑) |
---|---|---|---|---|---|---|---|---|
Standard | 99.29 | 0.00 | 8.02 | 0.129 | 0.113 | 100.0 | 1.00 | 0.00 |
BP | 99.34 | 0.00 | 12.99 | 0.091 | 0.078 | 4.14 | 9.68 | 83.48 |
AT | 99.14 | 88.03 | 94.18 | 0.045 | 0.040 | 100.0 | 1.00 | 0.00 |
AT BNN | 99.16 | 88.44 | 94.87 | 0.364 | 0.199 | 200.0 | 0.50 | 0.00 |
Pretrained AT | 99.18 | 88.26 | 94.49 | 0.412 | 0.381 | 100.0 | 1.00 | 0.00 |
ADMM | 99.01 | 88.47 | 94.61 | 0.041 | 0.038 | 100.0 | 1.00 | 80.00 |
TRADES | 99.07 | 89.67 | 95.04 | 0.037 | 0.033 | 100.0 | 1.00 | 0.00 |
ANP-VS | 99.05 | 91.31 | 95.43 | 0.017 | 0.015 | 6.81 | 10.57 | 84.16 |
CIFAR-10:

Model | Clean Acc. (%, 🡑) | White-box Adv. Acc. (%, 🡑) | Black-box Adv. Acc. (%, 🡑) | White-box Vulnerability (🡓) | Black-box Vulnerability (🡓) | Memory (%, 🡓) | FLOPs reduction (×, 🡑) | Sparsity (%, 🡑) |
---|---|---|---|---|---|---|---|---|
Standard | 92.76 | 13.79 | 41.65 | 0.077 | 0.065 | 100.0 | 1.00 | 0.00 |
BP | 92.91 | 14.30 | 42.88 | 0.037 | 0.033 | 12.41 | 2.34 | 75.92 |
AT | 87.50 | 49.85 | 63.70 | 0.050 | 0.047 | 100.0 | 1.00 | 0.00 |
AT BNN | 86.69 | 51.87 | 64.92 | 0.267 | 0.238 | 200.0 | 0.50 | 0.00 |
Pretrained AT | 87.50 | 52.25 | 66.10 | 0.041 | 0.036 | 100.0 | 1.00 | 0.00 |
ADMM | 78.15 | 47.37 | 62.15 | 0.034 | 0.030 | 100.0 | 1.00 | 75.00 |
TRADES | 80.33 | 52.08 | 64.80 | 0.045 | 0.042 | 100.0 | 1.00 | 0.00 |
ANP-VS | 88.18 | 56.21 | 71.44 | 0.019 | 0.016 | 12.27 | 2.41 | 76.53 |
CIFAR-100:

Model | Clean Acc. (%, 🡑) | White-box Adv. Acc. (%, 🡑) | Black-box Adv. Acc. (%, 🡑) | White-box Vulnerability (🡓) | Black-box Vulnerability (🡓) | Memory (%, 🡓) | FLOPs reduction (×, 🡑) | Sparsity (%, 🡑) |
---|---|---|---|---|---|---|---|---|
Standard | 67.44 | 2.81 | 14.94 | 0.143 | 0.119 | 100.0 | 1.00 | 0.00 |
BP | 69.40 | 3.12 | 16.39 | 0.067 | 0.059 | 18.59 | 1.95 | 63.48 |
AT | 57.79 | 19.07 | 32.47 | 0.079 | 0.071 | 100.0 | 1.00 | 0.00 |
AT BNN | 53.75 | 19.40 | 30.38 | 0.446 | 0.385 | 200.0 | 0.50 | 0.00 |
Pretrained AT | 57.14 | 19.86 | 35.42 | 0.071 | 0.065 | 100.0 | 1.00 | 0.00 |
ADMM | 52.52 | 19.65 | 31.30 | 0.060 | 0.056 | 100.0 | 1.00 | 75.00 |
TRADES | 56.70 | 21.21 | 32.81 | 0.065 | 0.060 | 100.0 | 1.00 | 0.00 |
ANP-VS | 59.15 | 22.35 | 37.01 | 0.035 | 0.030 | 16.74 | 2.02 | 66.80 |
We'd love to accept your contributions to this project. Please feel free to open an issue, or submit a pull request as necessary. If you have implementations of this repository in other ML frameworks, please reach out so we may highlight them here.
The code is built upon OpenXAIProject/Variational_Dropouts.
If you find the provided code useful, please cite our work:
@inproceedings{madaan2020adversarial,
  title={Adversarial Neural Pruning with Latent Vulnerability Suppression},
  author={Divyam Madaan and Jinwoo Shin and Sung Ju Hwang},
  booktitle={ICML},
  year={2020}
}