FLMMS is a Docker-based federated learning framework for simulating multi-machine training.
Conda is recommended for managing the Python environment. The authors' environment is set up as follows:
```bash
conda create -n flmms python=3.9 -y
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch -y
conda install numpy tqdm loguru GPUtil -y
```
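Optionally, you can sanity-check the environment before continuing (`torch.cuda.is_available()` returns `False` on CPU-only machines):

```python
import torch
import torchvision

print(torch.__version__)          # expected: 1.12.0
print(torchvision.__version__)    # expected: 0.13.0
print(torch.cuda.is_available())  # True if the CUDA 11.3 build can see a GPU
```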
In the `setup_utils` directory, run the following command to prepare the Docker environment:

```bash
./setup.bash
```
In the `FLMMS` directory, run the following command to launch the simulator:

```bash
python launch.py
```
Experiment configurations are specified in `FLMMS/configs/config.py`.

In the `GlobalConfig` class, you can specify:
- `expt_name`: The name of the experiment. Default is `None`, which names the experiment results folder after the launch time, e.g. `240101114514`.
- `data_path`: The path to the dataset. Default is `data/`.
- `results_path`: The path where experiment results and node running logs are saved. Default is `results/`.
- `random_seed`: The random seed. Default is `42`.
- `monitor_server_log`: Whether to monitor the server's log in the terminal after launching the simulator. Default is `True`. If set to `False`, you can monitor the server's log in the `results` folder. In addition, the logs of all clients can be monitored in the `results` folder.
- `log_level`: The log level. Default is `INFO`.
- `dataloader_workers`: The number of workers for the dataloader. Default is `4`.
- `device`: The device to use, either `cuda` or `cpu`. Default is `cpu`.
- `cuda_device`: Available when `device` is `cuda`. Default is `[0, 1, 2, 3]`, which means the server uses GPU 0, client 1 uses GPU 1, client 2 uses GPU 2, and client 3 uses GPU 3.
- `num_client`: The number of clients. Default is `3`. As the server is also a node, the total number of nodes is `num_client + 1`.
- `data_distribution`: The distribution of the dataset. See the code in `FLMMS/datasets/datatool.py` for details.
- `enable_prepare_dataset`: Whether to prepare the dataset. Default is `True`. It is recommended to set it to `True` whenever you change `num_client` or `data_distribution`, and to `False` when the next experiment reuses the same `num_client` and `data_distribution`.
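For orientation, here is a minimal sketch of the defaults described above, written as plain class attributes. The actual class in `FLMMS/configs/config.py` may be structured differently, and the `data_distribution` value below is a placeholder:

```python
class GlobalConfig:
    expt_name = None            # None -> results folder named after launch time, e.g. 240101114514
    data_path = "data/"
    results_path = "results/"
    random_seed = 42
    monitor_server_log = True   # False -> read the server log from the results folder instead
    log_level = "INFO"
    dataloader_workers = 4
    device = "cpu"              # "cuda" or "cpu"
    cuda_device = [0, 1, 2, 3]  # with device == "cuda": server -> GPU 0, client i -> GPU i
    num_client = 3              # the server is also a node, so there are num_client + 1 nodes
    data_distribution = None    # placeholder; see FLMMS/datasets/datatool.py for supported values
    enable_prepare_dataset = True
```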
In the `ModelConfig` class, you can specify:
- `optimizer`: The optimizer.
- `scheduler`: The scheduler. Default is the dict `{"name": "StepLR", "param": {"step_size": 1, "gamma": 0.5}}`, which specifies the `StepLR` scheduler with `step_size=1` and `gamma=0.5` (see the sketch after this list).
- `lr`: The learning rate.
- `min_lr`: The minimum learning rate.
- `batchsize`: The batch size.
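The `scheduler` dict maps directly onto PyTorch's learning-rate schedulers. Below is a sketch of how such a dict can be resolved by name; this is an illustration, not necessarily how FLMMS performs the lookup:

```python
import torch

model = torch.nn.Linear(10, 2)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

scheduler_cfg = {"name": "StepLR", "param": {"step_size": 1, "gamma": 0.5}}

# Look up the scheduler class by name and unpack its parameters.
scheduler_cls = getattr(torch.optim.lr_scheduler, scheduler_cfg["name"])
scheduler = scheduler_cls(optimizer, **scheduler_cfg["param"])

optimizer.step()
scheduler.step()  # with step_size=1, gamma=0.5: lr goes from 0.1 to 0.05
```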
In the `ExptGroupConfig1` class, each parameter is a list of values. The simulator will run the experiment once for each combination of the parameters. For example, when you specify:

```python
iteration = [100, 1000, 10000]
algo = [{"name": "None", "param": {}}, {"name": "FedAvg", "param": {"K": 5}}]
```
the simulator will run the experiment 6 times, with the following configurations:

```
iteration=100,   algo=None
iteration=100,   algo=FedAvg
iteration=1000,  algo=None
iteration=1000,  algo=FedAvg
iteration=10000, algo=None
iteration=10000, algo=FedAvg
```
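This expansion is a plain Cartesian product over the list-valued parameters, so 3 iteration values × 2 algorithms gives 6 runs. A standalone illustration with `itertools.product` (not the framework's actual scheduling code):

```python
from itertools import product

iteration = [100, 1000, 10000]
algo = [{"name": "None", "param": {}}, {"name": "FedAvg", "param": {"K": 5}}]

# 3 x 2 = 6 runs, enumerated in the order listed above.
for it, a in product(iteration, algo):
    print(f"iteration={it}, algo={a['name']}")
```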
There are two default experiment group classes, `ExptGroupConfig1` and `ExptGroupConfig2`, in the `config.py` file. You can also define your own experiment configurations.
You can specify the following parameters:
- `group_name`: The name of the experiment group.
- `dataset`: The dataset to use. Choose from `MNIST`, `FashionMNIST`, `CIFAR10`.
- `net`: The neural network model to use. Choose from `LeNet5`, `AlexNet`.
- `iteration`: The number of iterations.
- `algo`: The federated learning algorithm to use. Choose from `{"name": "None", "param": {}}` and `{"name": "FedAvg", "param": {"K": 5}}`.
- `log_freq`: The frequency of logging.
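As an example of defining your own group, here is a hypothetical `MyExptGroupConfig` built from the parameters above. The class name and the values are illustrative, and the exact way groups are registered in `config.py` may differ:

```python
class MyExptGroupConfig:
    group_name = "mnist_lenet5_sweep"  # hypothetical group name
    dataset = ["MNIST"]                # choose from MNIST, FashionMNIST, CIFAR10
    net = ["LeNet5"]                   # choose from LeNet5, AlexNet
    iteration = [100, 1000]
    algo = [                           # the two supported algo configs
        {"name": "None", "param": {}},
        {"name": "FedAvg", "param": {"K": 5}},
    ]
    log_freq = [10]                    # hypothetical logging frequency
```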
This project was inspired by the following projects: