Skip to content

dzy22/calculon

 
 

Repository files navigation

DOI

Calculon - Co-design for large scale parallel applications

Running

Run Calculon like this:

$> PYTHONPATH=. ./bin/ <args>

Calculon is a hierarchical command line. To see the commands it accepts, use --help or -h:

$> PYTHONPATH=. ./bin/ -h

You can also see how to use any command specifically by using --help or -h on the command:

$> PYTHONPATH=. ./bin/ llm -h

LLM Example

Run a single calculation for LLM (~1 sec):

$> PYTHONPATH=. ./bin/ llm models/megatron-1T.json examples/3072_t4_p64_d12_mbs4_full.json systems/a100_80g.json -

Run a system execution optimizer for LLM (~1 min):

$> PYTHONPATH=. ./bin/ llm-optimal-execution models/turing-530B.json 5128 2520 float16 systems/a100_80g.json output.json -m

opt_exe.json will contain the optimal way to run Turing-530B across 5128 A100 GPUs.

To store results from all successful runs from the same experiment, run a special system optimizer (~1 min):

$> PYTHONPATH=. ./bin/ llm-all-executions models/turing-530B.json 5128 2520 float16 systems/a100_80g.json all_output.csv

Testing and validation (optional)

To make sure that the current build is working, use

$> make test

To validate Calculon performance modeling against Megatron run on NVIDIA's Selene A100-based supercomputer with results published in "Sequence parallelism" paper, use

$> PYTHONPATH=. ./bin/calculon llm-validation

Publications

  • Calculon: A Methodology and Tool for High-Level Co-Design of Systems and Large Language Models
    Mikhail Isaev, Nic McDonald, Larry Dennison, Richard Vuduc
    Paper

  • Scaling Infrastructure to Support Multi-Trillion Parameter LLM Training
    Mikhail Isaev, Nic McDonald, Richard Vuduc
    Paper

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 98.7%
  • Other 1.3%