ASTRA-Sim

What is this repository for?

This is the ASTRA-sim distributed Deep Learning Training simulator, developed in collaboration between Georgia Tech, Facebook and Intel.

An overview is presented here:

The full description of the tool and its strengths can be found in the paper below:

Saeed Rashidi, Srinivas Sridharan, Sudarshan Srinivasan, and Tushar Krishna, "ASTRA-SIM: Enabling SW/HW Co-Design Exploration for Distributed DL Training Platforms," in Proc. of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Apr 2020.

ASTRA-SIM tutorials can be found here: https://astra-sim.github.io/

BibTeX

@inproceedings{astrasim,
    author    = {Saeed Rashidi and
                 Srinivas Sridharan and
                 Sudarshan Srinivasan and
                 Tushar Krishna},
    title     = {{ASTRA-SIM: Enabling SW/HW Co-Design Exploration for Distributed DL Training Platforms}},
    booktitle = {{IEEE} International Symposium on Performance Analysis of Systems
                 and Software, {ISPASS} 2020, Boston, MA, USA, August 22-26, 2020},
    publisher = {{IEEE}},
    year      = {2020},
}

Setup Instructions

# Clone the repository
$ git clone https://github.com/astra-sim/astra-sim.git

# Clone the submodules
$ cd astra-sim
$ git submodule init
$ git submodule update

Instructions for compiling & running Garnet2.0 as the network simulator

Run ./build/astra_garnet/build.sh -c to compile astra-sim and integrate it with gem5 (the -l flag cleans the build). This produces a binary in which garnet is integrated with astra-sim. The garnet (gem5) backend is hosted at https://github.com/georgia-tech-synergy-lab/gem5_astra .
Run an example from the examples/ directory with garnet as the backend, e.g. examples/run_allreduce.sh -n garnet. This command runs a single all-reduce collective on a torus topology.
The results of the example runs are written to examples/results/.

Instructions for compiling & running analytical backend as the network simulator

Run ./build/astra_analytical/build.sh -c to compile astra-sim and integrate it with the analytical backend (the -l flag cleans the build). This produces a binary in which the analytical backend is integrated with astra-sim. Please refer to this page for more details about compilation. The analytical backend is hosted at https://github.com/astra-sim/analytical .
Run an example from the examples/ directory with the analytical model as the backend, e.g. examples/run_allreduce.sh -n analytical. This command runs a single all-reduce collective on a torus topology.
The results of the example runs are written to examples/results/.
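
The build-and-run sequence is the same for the garnet and analytical backends, so it can be wrapped in a single helper. The sketch below is hypothetical: the functions run and run_backend and the DRY_RUN switch are not part of ASTRA-sim. It assumes it is executed from the repository root and uses only the script paths and flags described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ASTRA-sim): build against one backend,
# run the all-reduce example, then list the results directory.
# Assumes the current directory is the astra-sim repository root.

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

run_backend() {
  local backend="$1" build
  case "$backend" in
    garnet)     build=./build/astra_garnet/build.sh ;;
    analytical) build=./build/astra_analytical/build.sh ;;
    *) echo "unknown backend: $backend" >&2; return 1 ;;
  esac
  run "$build" -c || return 1                               # compile and integrate
  run examples/run_allreduce.sh -n "$backend" || return 1   # single all-reduce on a torus
  run ls examples/results/                                  # where example results land
}
```

For instance, DRY_RUN=1 run_backend analytical prints the three commands without executing them, while run_backend garnet runs them. Passing -l instead of -c to the same build script cleans the build.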

Instructions for compiling & running NS3 as the network simulator

Coming Soon!

NOTE: The delays reported on screen at the end of the simulation (regardless of the backend used) are in cycles (by default, one cycle is 1 nanosecond), while the delays in the csv files are in microseconds.
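
Because the two outputs use different units, converting before comparing avoids an off-by-1000 mistake. A minimal sketch, assuming the default 1 ns cycle unless another cycle period (in nanoseconds) is passed; cycles_to_us is a hypothetical helper, not part of ASTRA-sim:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert an on-screen delay (in cycles) to microseconds
# so it can be compared with the csv values. The optional second argument is
# the cycle period in nanoseconds; it defaults to 1 ns, the default noted above.
cycles_to_us() {
  local cycles="$1" cycle_ns="${2:-1}"
  # cycles * ns-per-cycle = nanoseconds; divide by 1000 to get microseconds.
  awk -v c="$cycles" -v t="$cycle_ns" 'BEGIN { printf "%.3f\n", c * t / 1000 }'
}
```

For example, cycles_to_us 1234567 prints 1234.567, the value to look for in the csv files.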

ASTRA-SIM Binary Command Line Options

When running the binary (regardless of the backend used), the following options may be passed to it (see the example scripts):

--network-configuration (required): Path to the network input file.

--system-configuration (required): Path to the system input file.

--workload-configuration (required): Path to the workload input file.

--path (required): Directory where the results are dumped.

--run-name (required): Name of the current run.

--num-passes (required): Number of training passes to simulate.

--total-stat-rows (required): Total number of runs that write to the same csv files (see run_multi.sh in the examples/ directory). This is useful when multiple runs should share the same csv files. Set it to 1 if only one run is executed.

--stat-row (required): The row (position) at which this run writes its stats in the csv stat files (see run_multi.sh in the examples/ directory). This is useful when multiple runs should share the same csv files. Set it to 0 if only one run is executed.

--compute-scale (optional): Scales all compute times (reported in the workload input file) by this factor. The default value is 1.

--comm-scale (optional): Scales all communication sizes (reported in the workload input file) by this factor. The default value is 1.
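
The options above can be combined as in the hypothetical sketch below, which only prints a command line. BINARY, astra_cmd, and all input paths are placeholders rather than names from ASTRA-sim, and the --option=value syntax is an assumption (compare with the example scripts in examples/). The range check encodes the rule that a run's --stat-row must be one of the --total-stat-rows rows, so a single run uses 1 and 0.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print the command line for run number ROW out of TOTAL
# runs that share the same csv stat files (as run_multi.sh does).
# BINARY and every input path below are placeholders, not names from ASTRA-sim.
# The --option=value form is an assumption; check the example scripts.
BINARY="${BINARY:-./path/to/astra_sim_binary}"

astra_cmd() {
  local row="$1" total="$2"
  # --stat-row must address one of the --total-stat-rows rows (0-based).
  if (( row < 0 || row >= total )); then
    echo "stat-row $row out of range for $total row(s)" >&2
    return 1
  fi
  local args=(
    --network-configuration=path/to/network_input    # placeholder
    --system-configuration=path/to/system_input      # placeholder
    --workload-configuration=path/to/workload_input  # placeholder
    --path=examples/results/
    --run-name="run_${row}"
    --num-passes=1
    --total-stat-rows="$total"
    --stat-row="$row"
    --compute-scale=1  # optional, default 1
    --comm-scale=1     # optional, default 1
  )
  echo "$BINARY ${args[*]}"
}
```

For a single run, astra_cmd 0 1 yields --total-stat-rows=1 --stat-row=0. For three runs sharing one set of csv files, for i in 0 1 2; do astra_cmd "$i" 3; done prints one command per row.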

NOTE: The garnet+astra-sim binary also allows all of the network input options to be overridden by command-line options.

Input Files to ASTRA-sim

Workload: inputs/workload/
- see inputs/workload/README.md
- see scripts/workload_generator/README.md for instructions on using an automated script to generate workload input files.
System: inputs/system/
- see inputs/system/README.md
Network:
- inputs/network/garnet (for garnet backend inputs)
  - see inputs/network/garnet/README.md
- inputs/network/analytical (for analytical backend inputs)
  - see inputs/network/analytical/README.md

Contact

Please email Saeed Rashidi ([email protected]) or Srinivas Sridharan ([email protected]) or Tushar Krishna ([email protected]) if you have any questions.

Core Developers

Saeed Rashidi (Georgia Tech)
Srinivas Sridharan (Facebook)

Additional Contributors

Jiayi Huang (University of California, Santa Barbara)
Apurve Chawde (Georgia Tech)
Santosh Kumar Elangoven (Georgia Tech)
William Won (Georgia Tech)
Tushar Krishna (Georgia Tech)
Greg Steinbrecher (Facebook)
