p2pBandwidthLatencyTest memory bandwidth low on H200 #311

Open
ltm920716 opened this issue Dec 6, 2024 · 0 comments
Hi,
I ran p2pBandwidthLatencyTest on my H200 server. The environment and results are below.

nvidia-smi
Fri Dec  6 20:14:45 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H200                    On  |   00000000:18:00.0 Off |                    0 |
| N/A   30C    P0            115W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H200                    On  |   00000000:2A:00.0 Off |                    0 |
| N/A   32C    P0            115W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA H200                    On  |   00000000:3A:00.0 Off |                    0 |
| N/A   34C    P0            118W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA H200                    On  |   00000000:5D:00.0 Off |                    0 |
| N/A   30C    P0            111W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA H200                    On  |   00000000:9A:00.0 Off |                    0 |
| N/A   31C    P0            112W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA H200                    On  |   00000000:AB:00.0 Off |                    0 |
| N/A   34C    P0            114W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA H200                    On  |   00000000:BA:00.0 Off |                    0 |
| N/A   32C    P0            117W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA H200                    On  |   00000000:DB:00.0 Off |                    0 |
| N/A   31C    P0            114W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
./p2pBandwidthLatencyTest
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1      2      3      4      5      6      7
     0 3021.15  37.33  36.93  37.59  37.57  37.61  37.59  36.63
     1  37.31 3068.43  37.72  37.63  37.05  36.87  37.30  37.53
     2  36.86  37.04 3064.10  37.59  37.34  36.97  37.92  37.41
     3  37.12  37.70  37.01 3051.94  37.55  37.62  37.83  37.17
     4  36.99  36.89  36.92  36.87 3062.22  36.61  37.09  36.69
     5  37.25  37.12  37.30  37.11  36.53 3068.05  36.90  37.20
     6  36.97  37.44  37.36  36.90  37.05  36.97 3056.98  37.58
     7  36.25  37.37  36.73  37.55  36.67  36.79  37.59 3064.29
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
   D\D     0      1      2      3      4      5      6      7
     0 3039.51 363.41 374.74 375.56 375.22 374.96 373.70 375.08
     1 375.16 3084.52 375.14 375.45 361.05 375.76 375.84 375.08
     2 361.90 392.00 3065.98 374.78 375.43 375.94 375.41 375.59
     3 361.31 375.03 392.98 3057.36 375.36 375.08 374.44 374.87
     4 360.82 375.13 375.99 392.56 3068.24 374.95 374.98 375.23
     5 375.15 375.52 376.02 375.78 373.37 3074.46 374.85 375.48
     6 374.99 374.96 375.42 374.32 374.98 375.26 3081.28 375.43
     7 374.12 376.08 375.28 375.90 376.36 376.05 375.09 3059.79
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0      1      2      3      4      5      6      7
     0 3143.96  52.20  52.01  52.52  52.17  52.50  52.15  52.23
     1  52.63 3160.66  52.99  52.88  52.34  51.85  52.89  51.54
     2  52.18  51.96 3157.46  52.63  52.29  51.53  52.56  52.19
     3  52.71  52.91  52.60 3149.11  52.12  52.17  52.26  52.10
     4  52.34  52.37  52.64  51.93 3155.67  43.86  44.30  44.58
     5  52.47  52.01  51.86  51.99  44.54 3160.46  44.55  44.17
     6  52.15  52.43  52.68  51.79  45.02  44.93 3163.56  44.44
     7  52.07  52.30  52.44  51.60  44.85  44.43  44.75 3157.06
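To make the matrices above easier to compare, here is a minimal Python sketch that parses one pasted matrix and summarizes its diagonal (same-device) and off-diagonal (device-to-device) entries. The helper names `parse_matrix` and `summarize` are hypothetical, not part of the CUDA sample, and the input is simply the "Bidirectional P2P=Disabled" matrix copied from the output above.

```python
# Raw string because the header contains a backslash ("D\D").
BIDIR_P2P_DISABLED = r"""
   D\D     0      1      2      3      4      5      6      7
     0 3143.96  52.20  52.01  52.52  52.17  52.50  52.15  52.23
     1  52.63 3160.66  52.99  52.88  52.34  51.85  52.89  51.54
     2  52.18  51.96 3157.46  52.63  52.29  51.53  52.56  52.19
     3  52.71  52.91  52.60 3149.11  52.12  52.17  52.26  52.10
     4  52.34  52.37  52.64  51.93 3155.67  43.86  44.30  44.58
     5  52.47  52.01  51.86  51.99  44.54 3160.46  44.55  44.17
     6  52.15  52.43  52.68  51.79  45.02  44.93 3163.56  44.44
     7  52.07  52.30  52.44  51.60  44.85  44.43  44.75 3157.06
"""


def parse_matrix(text):
    """Parse a pasted bandwidth matrix into a list of rows of floats (GB/s)."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        # Data rows start with the GPU index; the "D\D" header row does not.
        if not fields or not fields[0].isdigit():
            continue
        rows.append([float(x) for x in fields[1:]])
    return rows


def summarize(matrix):
    """Return min/max of the diagonal and of the off-diagonal entries."""
    n = len(matrix)
    diag = [matrix[i][i] for i in range(n)]
    off = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    return {
        "diag_min": min(diag),
        "diag_max": max(diag),
        "off_min": min(off),
        "off_max": max(off),
    }


matrix = parse_matrix(BIDIR_P2P_DISABLED)
summary = summarize(matrix)
for key, value in summary.items():
    print(f"{key}: {value:.2f} GB/s")
```

For this matrix the same-device entries all fall between 3143.96 and 3163.56 GB/s, while the device-to-device entries range from 43.86 to 52.99 GB/s, with the lower values appearing among GPUs 4 to 7.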

The measured GPU memory bandwidth is 3143.96 GB/s (GPU 0 in the bidirectional matrix), and 3143.96 GB/s / 4.8 TB/s ≈ 65% of the theoretical peak. Isn't the typical effective ratio about 75%?
https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#effective-bandwidth-calculation

I need help: how can I improve the measured bandwidth, or is something wrong with my setup?
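For reference, the ratio quoted above can be reproduced with a short Python sketch. It assumes the 4.8 TB/s peak stated in this post and uses the GPU 0 same-device values from the three matrices; the function name `efficiency` is hypothetical. The linked Best Practices Guide defines effective bandwidth as (bytes read + bytes written) / 10^9 / time, so whether the sample's same-device figure already counts both the read and the write is worth confirming in its source before comparing against the peak.

```python
# Theoretical peak memory bandwidth quoted in the post (4.8 TB/s), in GB/s.
PEAK_GBPS = 4800.0


def efficiency(measured_gbps, peak_gbps=PEAK_GBPS):
    """Return measured bandwidth as a fraction of the theoretical peak."""
    return measured_gbps / peak_gbps


# Same-device (diagonal) entries for GPU 0, copied from the output above.
samples = {
    "unidirectional, P2P disabled": 3021.15,
    "unidirectional, P2P enabled": 3039.51,
    "bidirectional, P2P disabled": 3143.96,
}

ratios = {name: efficiency(value) for name, value in samples.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1%} of peak")
```

With these inputs the three ratios come out between roughly 63% and 65.5% of the quoted peak.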
