Generalizing intrusion detection for heterogeneous networks: A stacked-unsupervised federated learning approach
This repository accompanies our paper, which describes a stacked-unsupervised federated learning (FL) approach for generalizing a flow-based network intrusion detection system (NIDS) in a cross-silo configuration. The proposed approach combines a deep autoencoder with an Energy-based Flow Classifier (EFC) in an ensemble learning task.
Our approach outperforms traditional local learning and naive cross-evaluation (training in one network context and testing on data from another network). Notably, it also performs well when the data silos are non-iid. Together with an informative feature in an ensemble architecture for unsupervised learning, these results suggest that the proposed FL-based NIDS is a feasible approach for generalization between heterogeneous networks.
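For readers new to cross-silo FL: each silo trains on its own data, and a server combines the resulting model parameters into a global model. The sketch below illustrates one common combination rule, federated averaging (FedAvg), in plain Python. It is an illustration only and is not taken from server.py; the function name `federated_average`, the flat parameter lists, and the example values are all hypothetical.

```python
from typing import List, Sequence, Tuple


def federated_average(updates: Sequence[Tuple[List[float], int]]) -> List[float]:
    """Average flat parameter vectors, weighting each silo by its sample count.

    Each update is (parameters, number_of_local_samples).
    Hypothetical helper for illustration; not part of this repository.
    """
    if not updates:
        raise ValueError("need at least one client update")
    total_samples = sum(n for _, n in updates)
    if total_samples <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(updates[0][0])
    averaged = [0.0] * dim
    for params, n in updates:
        if len(params) != dim:
            raise ValueError("all clients must send vectors of the same length")
        for i, value in enumerate(params):
            # A silo with more samples contributes proportionally more.
            averaged[i] += value * n / total_samples
    return averaged


# Two hypothetical silos with different amounts of local data.
silo_a = ([1.0, 2.0], 100)
silo_b = ([3.0, 4.0], 300)
global_params = federated_average([silo_a, silo_b])
print(global_params)
```

Weighting by sample count keeps a small silo from pulling the global model as strongly as a large one, which is one reason the choice of aggregation rule matters when the silos are non-iid.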
- Install the requirements to reproduce this work:
- Tested with Python 3.9.11
$ python -m venv venv
$ source venv/bin/activate
(venv) $ pip install --upgrade pip
(venv) $ pip install Cython
(venv) $ pip install -r requirements.txt
- Choose one of the experiments: the full datasets* (run_full.sh), the reduced datasets (run_reduced.sh), or the sampled datasets (run_sampled.sh). For instance, to run the reduced datasets:
(venv) $ chmod +x run_reduced.sh
(venv) $ ./run_reduced.sh
* The full datasets are not part of this repository; see the instructions inside the full_datasets folder on how to download them.
- To simulate other federated learning aggregation strategies, change server.py according to the Flower documentation.
- To remove the EFC as part of the autoencoder, remove the --with-EFC argument from the shell script files.
- Select between the "just benign" or the "benign and attack" threshold for the autoencoder by editing client.py: assigning test_eval to the distance_calc method uses both thresholds, while assigning it to the comparison of losses to threshold_benign uses the benign-only threshold.
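The two threshold options above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in client.py: `losses` and `threshold_benign` follow the names used above, while `threshold_attack` and both function names are hypothetical. The nearest-threshold rule is only one plausible reading of what a distance-based method such as distance_calc could do.

```python
from typing import List


def classify_benign_only(losses: List[float], threshold_benign: float) -> List[int]:
    """Benign-only case: flag a flow as attack (1) when its reconstruction
    loss exceeds the benign threshold, otherwise benign (0)."""
    return [1 if loss > threshold_benign else 0 for loss in losses]


def classify_nearest_threshold(
    losses: List[float], threshold_benign: float, threshold_attack: float
) -> List[int]:
    """Two-threshold case (assumed interpretation): label each flow by the
    threshold its loss is closer to. Ties are labeled benign."""
    return [
        1 if abs(loss - threshold_attack) < abs(loss - threshold_benign) else 0
        for loss in losses
    ]


# Hypothetical reconstruction losses and thresholds.
losses = [0.02, 0.30, 0.90]
benign_only_labels = classify_benign_only(losses, threshold_benign=0.10)
both_labels = classify_nearest_threshold(
    losses, threshold_benign=0.10, threshold_attack=0.80
)
print(benign_only_labels)
print(both_labels)
```

In this toy example the flow with loss 0.30 is flagged under the benign-only rule but not under the nearest-threshold rule, which shows how the choice can shift the balance between false positives and missed attacks.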
.
├── baselines.py --> calculate the baselines over sampled datasets
├── baselines_reduced.py --> calculate the baselines over reduced datasets
├── client.py --> the source code for federated learning clients
├── error_analysis --> data used for error analysis
├── Error Analysis.ipynb --> notebook with error analysis
├── full_datasets --> reference for downloading the full datasets
├── README.md --> this README
├── reduced_datasets --> the reduced datasets (*.csv.gz)
├── requirements.txt --> requirements of libraries and specific versions
├── run_full.sh --> to execute the proposed method over full datasets
├── run_reduced.sh --> to execute the proposed method over reduced datasets
├── run_sampled.sh --> to execute the proposed method over sampled datasets
├── sampled_datasets --> the sampled datasets (*.csv.gz)
├── server.py --> the source code for federated learning server
└── utils
    ├── generate_reduced_datasets.py --> generate reduced datasets
    ├── load_data.py --> code for loading datasets
    └── model.py --> code for autoencoder
@article{10.1016/j.cose.2023.103106,
title = {Generalizing intrusion detection for heterogeneous networks: A stacked-unsupervised federated learning approach},
journal = {Computers \& Security},
pages = {103106},
year = {2023},
issn = {0167-4048},
doi = {https://doi.org/10.1016/j.cose.2023.103106},
url = {https://www.sciencedirect.com/science/article/pii/S0167404823000160},
author = {Gustavo {de Carvalho Bertoli} and Lourenço Alves {Pereira Junior} and Osamu Saotome and Aldri Luiz {dos Santos}},
keywords = {Network Intrusion Detection, Generalization, Unsupervised Learning, Federated Learning, Network flows}
}