Initial Changes for GEMS + SPATIAL #14

Merged (13 commits, Nov 8, 2023)
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -6,8 +6,8 @@ repos:
name: black-format-test

- repo: https://github.com/pycqa/flake8
-    rev: 4.0.1
+    rev: 5.0.4
hooks:
- id: flake8
args: ['--ignore=E,F403,F405,F541,F841,W', '--select=E9,F,W6', '--per-file-ignores=__init__.py:F401']
name: flake8-test
18 changes: 18 additions & 0 deletions benchmarks/gems_master_model/benchmark_amoebanet_gems_master.py
@@ -1,3 +1,21 @@
# Copyright 2023, The Ohio State University. All rights reserved.
# The MPI4DL software package is developed by the team members of
# The Ohio State University's Network-Based Computing Laboratory (NBCL),
# headed by Professor Dhabaleswar K. (DK) Panda.
#
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import torch
import torch.distributed as dist
import torchvision.transforms as transforms
27 changes: 19 additions & 8 deletions benchmarks/gems_master_model/benchmark_resnet_gems_master.py
@@ -1,3 +1,21 @@
# Copyright 2023, The Ohio State University. All rights reserved.
# The MPI4DL software package is developed by the team members of
# The Ohio State University's Network-Based Computing Laboratory (NBCL),
# headed by Professor Dhabaleswar K. (DK) Panda.
#
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import torch
import torch.distributed as dist
import torchvision.transforms as transforms
@@ -12,6 +30,7 @@
from torchgems.mp_pipeline import model_generator
from torchgems.gems_master import train_model_master
import torchgems.comm as gems_comm
from torchgems.utils import get_depth

parser_obj = parser.get_parser()
args = parser_obj.parse_args()
@@ -74,14 +93,6 @@ def init_processes(backend="mpi"):
ENABLE_ASYNC = True
resnet_n = 12


def get_depth(version, n):
if version == 1:
return n * 6 + 2
elif version == 2:
return n * 9 + 2


###############################################################################
mpi_comm = gems_comm.MPIComm(split_size=mp_size, ENABLE_MASTER=True)
rank = mpi_comm.rank
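For reference, the helper deleted in this hunk now lives in `torchgems.utils` (note the `from torchgems.utils import get_depth` added above). Assuming it keeps the same body shown in the diff, it can be used standalone:

```python
def get_depth(version, n):
    # ResNet depth formula: version 1 stacks 6n + 2 layers,
    # version 2 (pre-activation) stacks 9n + 2 layers.
    if version == 1:
        return n * 6 + 2
    elif version == 2:
        return n * 9 + 2


# With resnet_n = 12 as in the benchmark, version 1 gives a ResNet-74.
print(get_depth(1, 12))  # 74
```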
70 changes: 70 additions & 0 deletions benchmarks/gems_master_with_spatial_parallelism/README.md
@@ -0,0 +1,70 @@
# GEMS-MASTER + SP

GEMS improves performance by utilizing memory efficiently, whereas SP is used to train on high-resolution images. GEMS+SP enables training on high-resolution images and enhances performance, since GEMS allows training the model with a batch size larger than the maximum batch size otherwise feasible.


## Run GEMS-MASTER + SP:

#### Generic command:
```bash
$MV2_HOME/bin/mpirun_rsh --export-all -np ${np} --hostfile ${HOSTFILE} MV2_USE_GDRCOPY=0 MV2_ENABLE_AFFINITY=0 MV2_USE_CUDA=1 LD_PRELOAD=$MV2_HOME/lib/libmpi.so python ${gems_sp_model_script} --split-size ${split_size} --batch-size ${batch_size} --times ${times}

```
#### Examples

- Example: run the AmoebaNet MASTER+SP model for a 1024 × 1024 image size with a model split size of 5 (i.e. the number of partitions for MP), a model replication factor of η = 2, and a batch size of 1 for each model replica (i.e. effective batch size (EBS) = η × BS = 2).

```bash
$MV2_HOME/bin/mpirun_rsh --export-all -np ${np} --hostfile ${HOSTFILE} MV2_USE_GDRCOPY=0 MV2_ENABLE_AFFINITY=0 MV2_USE_CUDA=1 LD_PRELOAD=$MV2_HOME/lib/libmpi.so python benchmarks/gems_master_with_spatial_parallelism/benchmark_amoebanet_gems_master_with_sp.py --split-size 5 --batch-size 1 --image-size 1024 --times 2

```
- Similarly, we can run the benchmark for the ResNet MASTER model.
Below is an example of running the ResNet MASTER+SP model for a 2048 × 2048 image size with a model split size of 5 (i.e. the number of partitions for MP), a model replication factor of η = 4, and a batch size of 1 for each model replica (i.e. effective batch size (EBS) = η × BS = 4).
```bash
$MV2_HOME/bin/mpirun_rsh --export-all -np $np --hostfile ${HOSTFILE} MV2_USE_GDRCOPY=0 MV2_ENABLE_AFFINITY=0 MV2_USE_CUDA=1 LD_PRELOAD=$MV2_HOME/lib/libmpi.so python benchmarks/gems_master_model/benchmark_resnet_gems_master_with_sp.py --split-size 5 --image-size 2048 --batch-size 1 --times 4

```
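The replication arithmetic in the two examples above can be sketched with a small hypothetical helper (not part of the benchmark scripts): EBS = η × BS, where η is the model replication factor and BS is the per-replica batch size.

```python
def effective_batch_size(replication_factor: int, per_replica_batch_size: int) -> int:
    # EBS = eta * BS: eta model replicas each process
    # per_replica_batch_size samples per training step.
    return replication_factor * per_replica_batch_size


# AmoebaNet example: eta = 2, BS = 1 -> EBS = 2
print(effective_batch_size(2, 1))  # 2
# ResNet example: eta = 4, BS = 1 -> EBS = 4
print(effective_batch_size(4, 1))  # 4
```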
Below are the available configuration options:

<pre>
usage: benchmark_amoebanet_sp.py [-h] [-v] [--batch-size BATCH_SIZE] [--parts PARTS] [--split-size SPLIT_SIZE] [--num-spatial-parts NUM_SPATIAL_PARTS]
[--spatial-size SPATIAL_SIZE] [--times TIMES] [--image-size IMAGE_SIZE] [--num-epochs NUM_EPOCHS] [--num-layers NUM_LAYERS]
[--num-filters NUM_FILTERS] [--balance BALANCE] [--halo-D2] [--fused-layers FUSED_LAYERS] [--local-DP LOCAL_DP] [--slice-method SLICE_METHOD]
[--app APP] [--datapath DATAPATH]

SP-MP-DP Configuration Script

optional arguments:
-h, --help show this help message and exit
-v, --verbose Prints performance numbers or logs (default: False)
--batch-size BATCH_SIZE
input batch size (default: 32)
--parts PARTS Number of parts for MP (default: 1)
--split-size SPLIT_SIZE
Number of processes for MP (default: 2)
--num-spatial-parts NUM_SPATIAL_PARTS
Number of partitions in spatial parallelism (default: 4)
--spatial-size SPATIAL_SIZE
Number of splits for spatial parallelism (default: 1)
--times TIMES Number of times to repeat MASTER 1: 2 replications, 2: 4 replications (default: 1)
--image-size IMAGE_SIZE
Image size for synthetic benchmark (default: 32)
--num-epochs NUM_EPOCHS
Number of epochs (default: 1)
--num-layers NUM_LAYERS
Number of layers in amoebanet (default: 18)
--num-filters NUM_FILTERS
Number of filters in amoebanet (default: 416)
--balance BALANCE length of the list equals the number of partitions, and its sum should equal the number of layers (default: None)
--halo-D2 Enable design2 (do halo exchange on a few convs) for spatial conv. (default: False)
--fused-layers FUSED_LAYERS
When the D2 design is enabled for halo exchange, number of blocks to fuse in the ResNet model (default: 1)
--local-DP LOCAL_DP LBANN integration of SP with MP. MP can apply data parallelism. 1: only one GPU for a given split, 2: two GPUs for a given split (uses DP)
(default: 1)
--slice-method SLICE_METHOD
Slice method (square, vertical, and horizontal) in Spatial parallelism (default: square)
--app APP Application type (1: medical, 2: cifar, 3: synthetic) in Spatial parallelism (default: 3)
--datapath DATAPATH Local dataset path (default: ./train)
</pre>
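A minimal sketch of how such a parser can be built with `argparse`, mirroring a few of the options above (the real parser lives inside `torchgems`; option names and defaults are taken from the listing, everything else is illustrative):

```python
import argparse


def get_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of a subset of the options listed above.
    p = argparse.ArgumentParser(description="SP-MP-DP Configuration Script")
    p.add_argument("--batch-size", type=int, default=32, help="input batch size")
    p.add_argument("--split-size", type=int, default=2, help="number of processes for MP")
    p.add_argument("--num-spatial-parts", type=int, default=4,
                   help="number of partitions in spatial parallelism")
    p.add_argument("--times", type=int, default=1, help="GEMS-specific replication parameter")
    p.add_argument("--image-size", type=int, default=32,
                   help="image size for the synthetic benchmark")
    return p


# Parse the flags from the AmoebaNet example above.
args = get_parser().parse_args(
    ["--split-size", "5", "--batch-size", "1", "--times", "2", "--image-size", "1024"]
)
print(args.split_size, args.batch_size, args.times, args.image_size)  # 5 1 2 1024
```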

*Note: "--times" is a GEMS-specific parameter, and parameters such as "--num-spatial-parts", "--slice-method", and "--halo-D2" are not required by GEMS alone.*