This document provides instructions for reproducing the experiments and results from our paper.
$ git clone https://github.com/evandowning/deepreflect.git
$ cd deepreflect/
$ git checkout tags/v0.0.1
# "ACFG" features are inspired by, but not identical to, ACFG features. For that reason we call them "ABB" features in our paper; for simplicity, the code still calls them "ACFG" features. This will be corrected in a future version of the code.
# "ACFG Plus" are features used by DeepReflect. See paper for details.
# For autoencoders, see ../README.md
# For VGG19
$ time python model.py --shap True --kernel 4 --strides 1 acfg \
--train ./models/malicious_plus_benign_joint/train.txt \
--test ./models/malicious_plus_benign_joint/test.txt \
--valid ./models/malicious_plus_benign_joint/valid.txt \
--model ./models/malicious_plus_benign_joint/vgg19_half_joint.h5 \
--map ./models/malicious_plus_benign_joint/final_map.txt \
--vgg19-half True &> ./models/final_binaries_unipacker_bndb_acfg/vgg19_output.txt
# Get SHAP highlights
$ time python explain_shap.py acfg --train ./models/malicious_plus_benign_joint/train.txt \
--test ./models/malicious_plus_benign_joint/test.txt \
--valid ./models/malicious_plus_benign_joint/valid.txt \
--data test \
--model ./models/malicious_plus_benign_joint/vgg19_half_joint.h5 \
--map ./models/malicious_plus_benign_joint/final_map.txt \
--joint True \
--output ./shap/ 2> error.txt
See README.md
(dr) $ python cluster_select.py --split 5 \
--num 10 \
--input pca_hdbscan_output.txt \
--output cluster_select_output.txt
(dr) $ time python find_singleton.py --input pca_hdbscan_output.txt > find_singleton_stdout.txt
(dr) $ cd emerging-threat/
# This assumes RoIs have already been extracted
(dr) $ cp -r ../autoencoder_roi/ .
(dr) $ time ./run.sh &> run_output.txt
- Ground-truth ROC curves
(dr) $ time ./rbot.sh &> rbot_final_stdout_stderr.txt
(dr) $ time ./pegasus.sh &> pegasus_final_stdout_stderr.txt
(dr) $ time ./carbanak.sh &> carbanak_final_stdout_stderr.txt
(dr) $ cd ./grader/
(dr) $ time ./rbot.sh &> rbot_final_stdout_stderr.txt
(dr) $ time ./pegasus.sh &> pegasus_final_stdout_stderr.txt
(dr) $ time ./carbanak.sh &> carbanak_final_stdout_stderr.txt
(dr) $ time ./combined.sh &> combined_final_stdout_stderr.txt
- Identify the ideal threshold to use from the ground-truth samples
- See combined_final_stdout_stderr.txt
- See
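For readers who want to sanity-check a threshold choice by hand, here is a minimal sketch of one common approach: pick the score threshold that maximizes Youden's J statistic (TPR minus FPR) over the ground-truth samples. This is an illustrative assumption, not necessarily the criterion the grader scripts use. The function name `best_threshold` and the toy scores and labels are hypothetical and do not come from the repository.

```python
import numpy as np


def best_threshold(scores, labels):
    """Return (threshold, tpr, fpr) maximizing Youden's J = TPR - FPR.

    scores: per-sample scores, where a higher score means "more likely positive".
    labels: ground truth, 1 for positive and 0 for negative.
    NOTE: hypothetical helper for illustration; not part of the repository.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_pos = int(labels.sum())
    n_neg = int(len(labels) - n_pos)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")

    best = None
    # Each distinct score is a candidate threshold; predict positive when score >= t.
    for t in np.unique(scores):
        pred = scores >= t
        tpr = np.sum(pred & (labels == 1)) / n_pos
        fpr = np.sum(pred & (labels == 0)) / n_neg
        j = tpr - fpr
        if best is None or j > best[0]:
            best = (j, float(t), float(tpr), float(fpr))
    return best[1:]


# Toy data, made up for illustration only.
scores = [0.1, 0.2, 0.3, 0.4, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 0, 1]
threshold, tpr, fpr = best_threshold(scores, labels)
print(threshold, tpr, fpr)  # 0.4 1.0 0.25
```

Other criteria (for example, fixing a maximum acceptable FPR) are equally reasonable; the threshold reported in combined_final_stdout_stderr.txt remains the authoritative value for reproducing the paper.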
- Cluster diversity
(dr) $ python cluster_contents.py --input pca_hdbscan_output.txt \
--output cluster_contents.png
- Function highlight percentage
(dr) $ time python function_coverage.py --functions /data/malicious_unipacker_bndb_function/ \
--fn autoencoder_roi/train_fn.npy \
--addr autoencoder_roi/train_addr.npy > function_coverage_stdout.txt
- Distribution of cluster sizes
(dr) $ python cluster_distribution.py --input pca_hdbscan_output.txt \
--output cluster_distribution.png
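To make clear what a "distribution of cluster sizes" measures, the sketch below counts how many clusters exist at each size, given one cluster label per clustered item. It does not parse pca_hdbscan_output.txt, whose format is not described here; the in-memory label list, the function name `cluster_size_distribution`, and the assumption that noise points carry the label -1 (the convention of the hdbscan library) are all illustrative.

```python
from collections import Counter


def cluster_size_distribution(labels, noise_label=-1):
    """Map cluster size -> number of clusters of that size.

    labels: one cluster label per item.
    noise_label: label treated as "unclustered" and skipped
                 (assumed to be -1, as in the hdbscan library).
    NOTE: hypothetical helper for illustration; not part of the repository.
    """
    # Size of each cluster, ignoring noise points.
    sizes = Counter(label for label in labels if label != noise_label)
    # Histogram over those sizes, sorted by size for readability.
    return dict(sorted(Counter(sizes.values()).items()))


# Toy labels, made up for illustration only.
labels = [0, 0, 0, 1, 1, 2, 2, -1, 3, 3, 3]
distribution = cluster_size_distribution(labels)
print(distribution)  # {2: 2, 3: 2}
```

In the toy data, clusters 1 and 2 have two members each and clusters 0 and 3 have three each, so the histogram has two clusters of size 2 and two of size 3.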
- See the sorting/ folder in the dataset
- See the malware-gt/ folder in the dataset
- See the malware-gt/ folder in the dataset