Hi, I ran p2pBandwidthLatencyTest on my H200 server.
```
$ nvidia-smi
Fri Dec  6 20:14:45 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H200                    On  |   00000000:18:00.0 Off |                    0 |
| N/A   30C    P0            115W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H200                    On  |   00000000:2A:00.0 Off |                    0 |
| N/A   32C    P0            115W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA H200                    On  |   00000000:3A:00.0 Off |                    0 |
| N/A   34C    P0            118W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA H200                    On  |   00000000:5D:00.0 Off |                    0 |
| N/A   30C    P0            111W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA H200                    On  |   00000000:9A:00.0 Off |                    0 |
| N/A   31C    P0            112W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA H200                    On  |   00000000:AB:00.0 Off |                    0 |
| N/A   34C    P0            114W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA H200                    On  |   00000000:BA:00.0 Off |                    0 |
| N/A   32C    P0            117W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA H200                    On  |   00000000:DB:00.0 Off |                    0 |
| N/A   31C    P0            114W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```
```
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
```
```
$ ./p2pBandwidthLatencyTest
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0       1       2       3       4       5       6       7
     0 3021.15   37.33   36.93   37.59   37.57   37.61   37.59   36.63
     1   37.31 3068.43   37.72   37.63   37.05   36.87   37.30   37.53
     2   36.86   37.04 3064.10   37.59   37.34   36.97   37.92   37.41
     3   37.12   37.70   37.01 3051.94   37.55   37.62   37.83   37.17
     4   36.99   36.89   36.92   36.87 3062.22   36.61   37.09   36.69
     5   37.25   37.12   37.30   37.11   36.53 3068.05   36.90   37.20
     6   36.97   37.44   37.36   36.90   37.05   36.97 3056.98   37.58
     7   36.25   37.37   36.73   37.55   36.67   36.79   37.59 3064.29
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
   D\D     0       1       2       3       4       5       6       7
     0 3039.51  363.41  374.74  375.56  375.22  374.96  373.70  375.08
     1  375.16 3084.52  375.14  375.45  361.05  375.76  375.84  375.08
     2  361.90  392.00 3065.98  374.78  375.43  375.94  375.41  375.59
     3  361.31  375.03  392.98 3057.36  375.36  375.08  374.44  374.87
     4  360.82  375.13  375.99  392.56 3068.24  374.95  374.98  375.23
     5  375.15  375.52  376.02  375.78  373.37 3074.46  374.85  375.48
     6  374.99  374.96  375.42  374.32  374.98  375.26 3081.28  375.43
     7  374.12  376.08  375.28  375.90  376.36  376.05  375.09 3059.79
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
   D\D     0       1       2       3       4       5       6       7
     0 3143.96   52.20   52.01   52.52   52.17   52.50   52.15   52.23
     1   52.63 3160.66   52.99   52.88   52.34   51.85   52.89   51.54
     2   52.18   51.96 3157.46   52.63   52.29   51.53   52.56   52.19
     3   52.71   52.91   52.60 3149.11   52.12   52.17   52.26   52.10
     4   52.34   52.37   52.64   51.93 3155.67   43.86   44.30   44.58
     5   52.47   52.01   51.86   51.99   44.54 3160.46   44.55   44.17
     6   52.15   52.43   52.68   51.79   45.02   44.93 3163.56   44.44
     7   52.07   52.30   52.44   51.60   44.85   44.43   44.75 3157.06
```
The best on-device (diagonal) bandwidth is 3143.96 GB/s. Against the H200's 4.8 TB/s peak memory bandwidth that is 3143.96 / 4800 ≈ 65%, while the commonly cited effective-bandwidth ratio is about 75%: https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#effective-bandwidth-calculation

I need help: how can I improve the measured bandwidth, or is something wrong with my setup?
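For reference, here is a small sketch of the effective-bandwidth formula from the linked Best Practices Guide and the ratio I am computing. The 4.8 TB/s figure is the H200 spec-sheet peak (an assumption on my part, not something the test prints), and `effective_bandwidth_gbs` is just an illustrative helper, not part of the CUDA sample.

```python
def effective_bandwidth_gbs(bytes_read: int, bytes_written: int, seconds: float) -> float:
    """Effective bandwidth in GB/s per the CUDA Best Practices Guide:
    (bytes read + bytes written) / 10^9 / elapsed seconds."""
    return (bytes_read + bytes_written) / 1e9 / seconds

# Ratio of the measured diagonal bandwidth to the assumed H200 peak:
measured_gbs = 3143.96   # best diagonal entry from my p2pBandwidthLatencyTest run
peak_gbs = 4800.0        # H200 spec-sheet peak, 4.8 TB/s (assumed)
print(f"{measured_gbs / peak_gbs:.1%}")   # ~65.5%
```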