addition of MI300 availability; fix warnings for two cross-references
Signed-off-by: Karl W. Schulz <[email protected]>
koomie committed Jan 9, 2025
1 parent 6fec89c commit bde043e
Showing 3 changed files with 22 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/hardware.md
@@ -27,6 +27,7 @@ Each compute server consists of two [AMD EPYC&trade;](https://www.amd.com/en/pro
| [AMD MI100](https://www.amd.com/en/products/accelerators/instinct/mi100.html) | 11.5 TFLOPs | 32GB | 1.2 TB/s | 2 X EPYC 7V13 64-core | 512 GB |
| [AMD MI210](https://www.amd.com/en/products/accelerators/instinct/mi200/mi210.html) | 45.3 TFLOPs | 64GB | 1.6 TB/s | 2 X EPYC 7V13 64-core | 512 GB |
| [AMD MI250](https://www.amd.com/en/products/accelerators/instinct/mi200/mi250.html) | 45.3 TFLOPs (per GCD) | 64GB (per GCD) | 1.6 TB/s (per GCD) | 2 X EPYC 7763 64-Core | 1.5 TB |
| [AMD MI300X](https://www.amd.com/en/products/accelerators/instinct/mi300/mi300x.html) | 81.7 TFLOPs | 192GB | 5.3 TB/s | 2 X EPYC 9684X 96-Core | 2.3 TB |
```

Note that one AMD MI250 accelerator provides two Graphics Compute Dies (GCDs), which the programmer can use as two separate GPUs.
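
As a quick illustration (a sketch assuming a node in the `mi2508x` partition with a standard ROCm installation), each GCD is exposed as its own device:

```bash
# On an MI250 node, the 4 physical accelerators appear as 8 GPUs (one per GCD);
# rocm-smi enumerates each GCD as a separate device.
rocm-smi

# Individual GCDs can be selected like any other GPU, e.g. restrict a process
# to the first two GCDs:
export HIP_VISIBLE_DEVICES=0,1
```
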
20 changes: 20 additions & 0 deletions docs/jobs.md
@@ -13,10 +13,29 @@ Multiple partitions (or queues) are available for users to choose from and each
| `mi1008x` | 24 hours | 5 | 0.8X | 8 x MI100 accelerators per node. |
| `mi2104x` | 24 hours | 16 | 1.0X | 4 x MI210 accelerators per node. |
| `mi2508x` | 12 hours | 10 | 1.7X | 4 x MI250 accelerators (8 GPUs) per node. |
| `mi3008x` | 4 hours | 1 | 2.0X | 8 x MI300X accelerators per node. |
| `mi3008x_long` | 8 hours | 1 | 2.0X | 8 x MI300X accelerators per node. |
```

Note that special requests extending beyond the above queue limits may be accommodated on a case-by-case basis. You must have an active accounting allocation in order to submit jobs, and the resource manager will track the combined number of **node** hours consumed by each job and deduct [total node hours] * [charge multiplier] from your available balance. For example, a 3-hour job on two `mi2508x` nodes is charged 2 x 3 x 1.7 = 10.2 node hours.


## Offload Architecture Options

Since multiple generations of Instinct&trade; accelerators are available across the cluster, users building their own [HIP](https://rocm.docs.amd.com/projects/HIP/en/latest/) applications should specify the correct target offload architecture during compilation, based on the desired GPU type. The following table highlights the offload architecture types and the compilation options that map to the available SLURM partitions.

```{table} Table 2: Offload architecture settings for local HIP compilation
:widths: 25 25 50
Partition Name | GPU Type | ROCm Offload Architecture Compile Flag
---------------|-----------|-----------------------
devel | MI210 x 4 | `--offload-arch=gfx90a`
mi2104x | MI210 x 4 | `--offload-arch=gfx90a`
mi2508x | MI250 x 8 | `--offload-arch=gfx90a`
mi3008x | MI300 x 8 | `--offload-arch=gfx942`
mi3008x_long | MI300 x 8 | `--offload-arch=gfx942`
mi1008x | MI100 x 8 | `--offload-arch=gfx908`
```
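
For example, a minimal local HIP compilation targeting the MI300X nodes might look like the following sketch (the `module` name and source file are assumptions; the offload flags come from the table above):

```bash
# Load the ROCm toolchain (module name assumed; check `module avail` on the cluster).
module load rocm

# Build for MI300X (gfx942), e.g. for jobs submitted to the mi3008x partitions.
hipcc --offload-arch=gfx942 -o saxpy saxpy.cpp

# A single fat binary can target multiple architectures if the same executable
# will run on different partitions.
hipcc --offload-arch=gfx90a --offload-arch=gfx942 -o saxpy saxpy.cpp
```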

## Batch job submission

Example SLURM batch job submission scripts are available on the login node at `/opt/ohpc/pub/examples/slurm`. A basic starting job script for MPI-based applications, named `job.mpi`, is available in this directory and is shown below for reference:
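
A hedged sketch of such a batch script requesting one of the new MI300X nodes is shown here for orientation (this is not the actual `job.mpi` contents; the job name, task counts, launcher options, and binary are placeholders):

```bash
#!/bin/bash
#SBATCH -J mi300x_test      # job name (placeholder)
#SBATCH -p mi3008x          # partition from the table above
#SBATCH -N 1                # number of nodes
#SBATCH -n 8                # total MPI tasks (illustrative: one per GPU)
#SBATCH -t 04:00:00         # wall time (maximum allowed on mi3008x)

# Launch the MPI application; launcher options and binary name are placeholders.
mpirun -np 8 ./my_gpu_app
```
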
@@ -162,6 +181,7 @@ The table below highlights several of the more common user-facing SLURM commands
| scontrol | view or modify a job configuration |
```
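
A few usage examples of these common user-facing SLURM commands (the job ID is illustrative):

```bash
squeue -u $USER            # list your queued and running jobs
sbatch job.mpi             # submit a batch script
scontrol show job 123456   # view the configuration of a specific job
scancel 123456             # cancel a job if necessary
```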

(jupyter)=
## Jupyter

Users can run Jupyter Notebooks on the HPC Fund compute nodes by making a copy
1 change: 1 addition & 0 deletions docs/software.md
@@ -30,6 +30,7 @@ The Lmod system provides a flexible mechanism to manage your local software envi
The `module help` command can also be run locally on the system to get more information on available Lmod options and sub-commands.
```
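
As a brief illustration of common Lmod usage (module names are assumptions; use `module avail` to see what is actually installed):

```bash
module avail           # list modules available on the system
module load rocm       # add a module to your environment (name assumed)
module list            # show currently loaded modules
module unload rocm     # remove a module from your environment
```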

(python-environment)=
## Python Environment

A base Python installation, which includes a handful of common packages (e.g., `numpy`, `pandas`), is available on the HPC Fund cluster. If additional packages are needed, users can customize their environment by installing packages with a user-level install, creating a Python virtual environment to install packages into, or loading a module for a specific package (e.g., `pytorch`, `tensorflow`). Examples of each method are given below.
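
A minimal sketch of the three approaches (package and module names are assumptions):

```bash
# 1. User-level install into ~/.local
pip install --user scipy

# 2. Dedicated Python virtual environment
python3 -m venv ~/venvs/myproject
source ~/venvs/myproject/bin/activate
pip install scipy

# 3. Site-provided module for a specific package (name assumed)
module load pytorch
```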
