update hardware/queue description after upgrade
Signed-off-by: Karl W. Schulz <[email protected]>
koomie committed Aug 8, 2024
1 parent a071a4a commit f83fc33
Showing 2 changed files with 34 additions and 8 deletions.
27 changes: 26 additions & 1 deletion docs/hardware.md
@@ -4,7 +4,32 @@ The HPC Fund Research Cloud consists of 40 high performance computing (HPC) serv

## Compute servers

Each compute server consists of two [AMD EPYC&trade;](https://www.amd.com/en/processors/epyc-server-cpu-family) 7V13 64-core processors with access to 512 GB of main memory. High-speed user network connectivity for inter-node communication is accommodated by a [ConnectX-6](https://nvdam.widen.net/s/5j7xtzqfxd/connectx-6-infiniband-datasheet-1987500-r2) MT28908 Infiniband host channel adapter providing a maximum port speed of 200 Gb/s. For accelerated analysis, each node also includes multiple [AMD MI100](https://www.amd.com/en/products/server-accelerators/instinct-mi100) GPU accelerators (in either 4 or 8 GPU/node configurations). Each individual accelerator has 32 GB of high bandwidth memory (HBM) and a peak double-precision (FP64) performance of 11.5 TFLOPS, interconnected via PCI Express.
Each compute server consists of two [AMD EPYC&trade;](https://www.amd.com/en/processors/epyc-server-cpu-family) processors with access to 512 GB (or more) of main memory. High-speed user network connectivity for inter-node communication is accommodated by a [ConnectX-6](https://nvdam.widen.net/s/5j7xtzqfxd/connectx-6-infiniband-datasheet-1987500-r2) MT28908 Infiniband host channel adapter providing a maximum port speed of 200 Gb/s. For accelerated analysis, each node also includes one or more [AMD Instinct&trade;](https://www.amd.com/en/products/accelerators/instinct.html) accelerators. Multiple generations of accelerators are available within the system; their key characteristics are summarized in the table below:
<!-- * [AMD MI100 Accelerator](https://www.amd.com/en/products/accelerators/instinct/mi100.html)
  * Peak double-precision (FP64) performance of 11.5 TFLOPs
  * 32 GB of high bandwidth memory (HBM2e)
  * Peak GPU memory bandwidth 1.2 TB/s
  * Form factor: PCIe Add-in Card
* [AMD MI210 Accelerator](https://www.amd.com/en/products/accelerators/instinct/mi200/mi210.html)
  * Peak double-precision (FP64) performance of 45.3 TFLOPs
  * 64 GB of high bandwidth memory (HBM2e)
  * Peak GPU memory bandwidth 1.6 TB/s
  * Form factor: PCIe Add-in Card
* [AMD MI250 Accelerator](https://www.amd.com/en/products/accelerators/instinct/mi200/mi250.html)
  * Peak double-precision (FP64) performance of 43.3 TFLOPs (per GCD)
  * 64 GB of high bandwidth memory (HBM2e) (per GCD)
  * Peak GPU memory bandwidth 1.6 TB/s (per GCD)
  * Form factor: OAM Module -->

```{table} Table 1: Hardware Overview of Available Node Types
| Accelerator | Peak FP64 | HBM Capacity | HBM Peak B/W | Host CPU | Host Memory |
| --------- | :------: | :---------: | :---------------: | :------------------------------------------: | :---: |
| [AMD MI100](https://www.amd.com/en/products/accelerators/instinct/mi100.html) | 11.5 TFLOPs | 32 GB | 1.2 TB/s | 2 x EPYC 7V13 64-core | 512 GB |
| [AMD MI210](https://www.amd.com/en/products/accelerators/instinct/mi200/mi210.html) | 45.3 TFLOPs | 64 GB | 1.6 TB/s | 2 x EPYC 7V13 64-core | 512 GB |
| [AMD MI250](https://www.amd.com/en/products/accelerators/instinct/mi200/mi250.html) | 45.3 TFLOPs (per GCD) | 64 GB (per GCD) | 1.6 TB/s (per GCD) | 2 x EPYC 7763 64-core | 1.5 TB |
```
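
One way to confirm which of these node types a job has landed on is to query the host directly. A minimal sketch, assuming a ROCm environment and standard InfiniBand utilities are available on the compute node (these commands are illustrative and not part of the site documentation):

```bash
# Host CPU model (EPYC 7V13 or 7763 per the table above)
lscpu | grep "Model name"

# Installed AMD Instinct accelerators
rocminfo | grep -i "Marketing Name" | sort -u

# InfiniBand adapter type and link rate
ibstat | grep -E "CA type|Rate"
```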

Note that one AMD MI250 accelerator provides two Graphics Compute Dies (GCDs), which the programmer can use as two separate GPUs.
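
For example, a node with four MI250 cards enumerates eight logical GPUs. Below is a minimal sketch of inspecting the devices and restricting a process to a single GCD, assuming a ROCm environment; `./my_hip_app` is a hypothetical binary used only for illustration:

```bash
# List the logical GPUs visible to the ROCm runtime
rocm-smi

# Restrict a single process to the first GCD only
export HIP_VISIBLE_DEVICES=0
./my_hip_app   # hypothetical application binary
```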

## File systems

15 changes: 8 additions & 7 deletions docs/jobs.md
@@ -2,19 +2,20 @@

The HPC Fund Research Cloud runs the [SLURM](https://slurm.schedmd.com/overview.html) workload resource manager in order to organize job scheduling across the cluster. In order to access back-end compute resources, users must submit jobs to SLURM (either interactive or batch) and the underlying scheduler will manage execution of all jobs using a [multi-factor](https://slurm.schedmd.com/priority_multifactor.html) priority algorithm.

Multiple partitions (or queues) are available for users to choose from and each job submission is associated with a particular partition request. The table below summarizes available production queues and runlimits currently available:
Multiple partitions (or queues) are available for users to choose from, and each job submission is associated with a particular partition request. Note that partition names are mostly organized around the type of accelerator hardware installed in the hosts. The table below summarizes the currently available production queues, their hardware configurations, allocation charge rates, and runtime limits:


```{table} Table 1: Available SLURM queues
:name: table-queues
| Queue | Max Time | Max Node(s) | Charge Multiplier | Configuration |
| --------- | :------: | :---------: | :---------------: | :------------------------------------------: |
| `devel` | 30 min. | 1 | 1X | Targeting short development needs (4xMI100). |
| `mi1004x` | 24 hours | 16 | 1X | 4 x MI100 accelerators per node. |
| `mi1008x` | 24 hours | 10 | 1.7X | 8 x MI100 accelerators per node. |
| `devel` | 30 min. | 1 | 1.0X | Targeting short development needs (4xMI210). |
| `mi1008x` | 24 hours | 5 | 0.8X | 8 x MI100 accelerators per node. |
| `mi2104x` | 24 hours | 16 | 1.0X | 4 x MI210 accelerators per node. |
| `mi2508x` | 12 hours | 10 | 1.7X | 4 x MI250 accelerators (8 GPUs) per node. |
```
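
To cross-check the table above against the scheduler's live configuration, the partitions and their limits can be queried directly with `sinfo` (a minimal sketch using standard SLURM output format specifiers):

```bash
# Partition name, time limit, node count, and generic resources (GPUs)
sinfo -o "%P %l %D %G"
```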

Note that special requests that extend beyond the above queue limits may potentially be accommodated on a case-by-case basis.
Note that special requests that extend beyond the above queue limits may potentially be accommodated on a case-by-case basis. You must have an active accounting allocation in order to submit jobs; the resource manager tracks the number of **node** hours consumed by each job and deducts [total node hours]*[charge multiplier] from your available balance.
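
As a worked example using the charge multipliers above: a job that runs for 1.5 hours on 2 nodes of the `mi2508x` partition consumes 2 x 1.5 = 3.0 node hours, which is deducted as 3.0 x 1.7 = 5.1 node hours from the allocation. Recent usage can be reviewed with SLURM's accounting tools (a minimal sketch; exact accounting policies are site-specific):

```bash
# Elapsed time, node count, and partition for your recent jobs
sacct -X --format=JobID,Partition,NNodes,Elapsed,State
```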

## Batch job submission

@@ -28,7 +29,7 @@ Example SLURM batch job submission scripts are available on the login node at `/
#SBATCH -N 2 # Total number of nodes requested
#SBATCH -n 8 # Total number of mpi tasks requested
#SBATCH -t 01:30:00 # Run time (hh:mm:ss) - 1.5 hours
#SBATCH -p mi1004x # Desired partition
#SBATCH -p mi2104x # Desired partition
# Launch an MPI-based executable
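
The launch command itself is elided in this hunk. A minimal sketch of what typically follows the directives above, assuming a hypothetical MPI executable named `./my_mpi_app` (the repository's actual example may differ):

```bash
# Launch 8 MPI ranks across the 2 allocated nodes
mpirun -np 8 ./my_mpi_app
```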
@@ -102,7 +103,7 @@ If your application is only configured for single GPU acceleration, you can stil
#SBATCH -N 1 # Total number of nodes requested
#SBATCH -n 4 # Total number of mpi tasks requested
#SBATCH -t 01:30:00 # Run time (hh:mm:ss) - 1.5 hours
#SBATCH -p mi1004x # Desired partition
#SBATCH -p mi2104x # Desired partition
binary=./hipinfo
args=""
