p2pBandwidthLatencyTest memory bandwidth low on H200 #311

ltm920716 · 2024-12-06T13:22:42Z

Hi,
I ran p2pBandwidthLatencyTest on my H200 server:

nvidia-smi
Fri Dec  6 20:14:45 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H200                    On  |   00000000:18:00.0 Off |                    0 |
| N/A   30C    P0            115W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H200                    On  |   00000000:2A:00.0 Off |                    0 |
| N/A   32C    P0            115W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA H200                    On  |   00000000:3A:00.0 Off |                    0 |
| N/A   34C    P0            118W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA H200                    On  |   00000000:5D:00.0 Off |                    0 |
| N/A   30C    P0            111W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA H200                    On  |   00000000:9A:00.0 Off |                    0 |
| N/A   31C    P0            112W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA H200                    On  |   00000000:AB:00.0 Off |                    0 |
| N/A   34C    P0            114W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA H200                    On  |   00000000:BA:00.0 Off |                    0 |
| N/A   32C    P0            117W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA H200                    On  |   00000000:DB:00.0 Off |                    0 |
| N/A   31C    P0            114W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0

./p2pBandwidthLatencyTest
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1      2      3      4      5      6      7
     0 3021.15  37.33  36.93  37.59  37.57  37.61  37.59  36.63
     1  37.31 3068.43  37.72  37.63  37.05  36.87  37.30  37.53
     2  36.86  37.04 3064.10  37.59  37.34  36.97  37.92  37.41
     3  37.12  37.70  37.01 3051.94  37.55  37.62  37.83  37.17
     4  36.99  36.89  36.92  36.87 3062.22  36.61  37.09  36.69
     5  37.25  37.12  37.30  37.11  36.53 3068.05  36.90  37.20
     6  36.97  37.44  37.36  36.90  37.05  36.97 3056.98  37.58
     7  36.25  37.37  36.73  37.55  36.67  36.79  37.59 3064.29
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
   D\D     0      1      2      3      4      5      6      7
     0 3039.51 363.41 374.74 375.56 375.22 374.96 373.70 375.08
     1 375.16 3084.52 375.14 375.45 361.05 375.76 375.84 375.08
     2 361.90 392.00 3065.98 374.78 375.43 375.94 375.41 375.59
     3 361.31 375.03 392.98 3057.36 375.36 375.08 374.44 374.87
     4 360.82 375.13 375.99 392.56 3068.24 374.95 374.98 375.23
     5 375.15 375.52 376.02 375.78 373.37 3074.46 374.85 375.48
     6 374.99 374.96 375.42 374.32 374.98 375.26 3081.28 375.43
     7 374.12 376.08 375.28 375.90 376.36 376.05 375.09 3059.79
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1      2      3      4      5      6      7
     0 3143.96  52.20  52.01  52.52  52.17  52.50  52.15  52.23
     1  52.63 3160.66  52.99  52.88  52.34  51.85  52.89  51.54
     2  52.18  51.96 3157.46  52.63  52.29  51.53  52.56  52.19
     3  52.71  52.91  52.60 3149.11  52.12  52.17  52.26  52.10
     4  52.34  52.37  52.64  51.93 3155.67  43.86  44.30  44.58
     5  52.47  52.01  51.86  51.99  44.54 3160.46  44.55  44.17
     6  52.15  52.43  52.68  51.79  45.02  44.93 3163.56  44.44
     7  52.07  52.30  52.44  51.60  44.85  44.43  44.75 3157.06

The measured GPU memory bandwidth (the diagonal of the matrices) is at most 3143.96 GB/s. 3143.96 GB/s / 4.8 TB/s ≈ 65%, but isn't the common effective ratio about 75%?
https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#effective-bandwidth-calculation
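
For reference, the ratio above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the 4.8 TB/s peak is the figure quoted in this post, the measured values are the GPU 0 diagonal entries copied from the matrices above, and the `efficiency` helper is a name made up for illustration.

```python
# Peak HBM bandwidth quoted in this post (4.8 TB/s), expressed in GB/s.
PEAK_GBPS = 4800.0

# Same-device (diagonal) entries for GPU 0, copied from the matrices above.
measured_gbps = {
    "unidirectional, P2P disabled": 3021.15,
    "unidirectional, P2P enabled": 3039.51,
    "bidirectional, P2P disabled": 3143.96,
}


def efficiency(measured, peak=PEAK_GBPS):
    """Return measured bandwidth as a fraction of the assumed peak."""
    return measured / peak


for label, value in measured_gbps.items():
    print(f"{label}: {value:.2f} GB/s = {efficiency(value):.1%} of peak")
```

All three diagonal values land between roughly 63% and 65.5% of the quoted peak, so the gap to 75% is consistent across the test modes rather than specific to one matrix.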

I need help: how can I improve the measured bandwidth, or is something wrong?
