Output of nvidia-smi topo -m:

        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    CPU Affinity    NUMA Affinity    GPU NUMA ID
GPU0     X      PIX     PXB     PXB     PXB     PXB     0-27,56-83      0               N/A
GPU1    PIX      X      PXB     PXB     PXB     PXB     0-27,56-83      0               N/A
GPU2    PXB     PXB      X      PIX     NV4     PIX     0-27,56-83      0               N/A
GPU3    PXB     PXB     PIX      X      PIX     NV4     0-27,56-83      0               N/A
GPU4    PXB     PXB     NV4     PIX      X      PIX     0-27,56-83      0               N/A
GPU5    PXB     PXB     PIX     NV4     PIX      X      0-27,56-83      0               N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
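(Aside, not from the thread: one way to see how this topology surfaces in PyTorch is to query peer-to-peer access between device pairs. NVLink- and PIX-connected pairs usually report direct peer access, but the exact result depends on the driver and platform, so treat this only as a rough cross-check of the matrix above.)

# Sketch: query CUDA peer-to-peer access between GPU pairs from PyTorch.
import torch

num_gpus = torch.cuda.device_count()
for src in range(num_gpus):
    peers = [dst for dst in range(num_gpus)
             if dst != src and torch.cuda.can_device_access_peer(src, dst)]
    print(f"GPU{src} can access peers directly: {peers}")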
I just printed the test_dataset information in the code, but after two hours it is still running, much slower than the single-GPU running time. What could the reason be? I only replaced the Reddit dataset with the Cora dataset.
As I replied to you in #1417 (comment), how did you measure the communication speed? Is it just by comparing the wall time of the script between the single-GPU and multi-GPU cases?
The interconnect configuration seems off to me. Is there a reason why some of the GPUs are interconnected with NVLink while others are connected over PCIe?
Since you're not yet sure whether the slowdown comes from communication, you could try reducing the number of workers in NeighborLoader(num_workers=...) to avoid CPU oversubscription, depending on how many cores your system has. The best and easiest way is to profile the script with torch.profiler and see which part of the code becomes slower than in the single-GPU case; a sketch of both suggestions is shown below.
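A minimal single-process sketch of both suggestions, assuming the Planetoid Cora dataset and PyG's built-in GraphSAGE model rather than the exact code from distributed_sampling.py (in the distributed script you would profile inside each rank's training loop instead):

# Sketch only: build a NeighborLoader with a small num_workers value and profile
# one pass over it with torch.profiler to see whether sampling, host-to-device
# copies, or GPU compute dominates the step time.
import torch
import torch.nn.functional as F
from torch.profiler import profile, ProfilerActivity
from torch_geometric.datasets import Planetoid
from torch_geometric.loader import NeighborLoader
from torch_geometric.nn import GraphSAGE

dataset = Planetoid(root='data/Planetoid', name='Cora')
data = dataset[0]

loader = NeighborLoader(
    data,
    input_nodes=data.train_mask,
    num_neighbors=[25, 10],
    batch_size=128,
    shuffle=True,
    num_workers=2,  # keep world_size * num_workers well below the CPU core count
)

device = torch.device('cuda:0')
model = GraphSAGE(dataset.num_features, 64, num_layers=2,
                  out_channels=dataset.num_classes).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    for batch in loader:
        batch = batch.to(device)
        optimizer.zero_grad()
        # Seed nodes come first in each sampled mini-batch.
        out = model(batch.x, batch.edge_index)[:batch.batch_size]
        loss = F.cross_entropy(out, batch.y[:batch.batch_size])
        loss.backward()
        optimizer.step()

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))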
🐛 Describe the bug
@akihironitta Hi, I tried to use this example: https://github.com/pyg-team/pytorch_geometric/blob/master/examples/multi_gpu/distributed_sampling.py on the Cora dataset, but the communication speed is extremely slow. Could you analyze the reason?

Hi, the output of

nvidia-smi topo -m

is shown at the top of this page.
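For reference, the Reddit-to-Cora swap mentioned above presumably looks something like the following. This is only a sketch, not the actual diff from the thread, and the model's input/output channel sizes in the example would have to change with the dataset as well:

# Hypothetical sketch of the dataset swap. The Reddit lines in
# examples/multi_gpu/distributed_sampling.py, roughly
#   from torch_geometric.datasets import Reddit
#   dataset = Reddit(path)
# would become something like:
from torch_geometric.datasets import Planetoid

dataset = Planetoid(root='data/Planetoid', name='Cora')
data = dataset[0]

# Note that Cora has only ~2,700 nodes, so each sampled mini-batch is tiny and
# fixed per-step costs (process spawning, sampler workers, NCCL collectives)
# can easily outweigh any speedup from using multiple GPUs.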