Closed
Description
I was trying to run the DLRMv2 benchmark of MLPerf Inference on an ARM server using the instructions here.
I ran into the issue when the tool tried to install torchrec==0.3.2. torchrec==0.3.2 requires fbgemm-gpu==0.3.2, but fbgemm-gpu only introduced ARM support starting with v0.5.0: https://download.pytorch.org/whl/cpu/fbgemm-gpu/
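To make the conflict concrete, here is a minimal shell sketch that checks whether a pinned fbgemm-gpu version can have an ARM wheel. It assumes, based on the wheel index above, that ARM wheels start at v0.5.0. The function names version_ge and fbgemm_arm_ok are hypothetical and not part of cm or pip, and the version comparison relies on GNU sort -V.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helper (not part of cm or pip).
# Assumption: fbgemm-gpu publishes ARM (aarch64) wheels only from v0.5.0 on.
FBGEMM_ARM_MIN="0.5.0"

# Succeeds if version $1 >= version $2, using GNU sort's version ordering:
# the smaller of the two sorts first, so $1 >= $2 iff $2 comes out first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Succeeds if an fbgemm-gpu wheel for version $1 can exist on architecture $2.
fbgemm_arm_ok() {
  local pinned="$1" arch="$2"
  case "$arch" in
    aarch64|arm64) version_ge "$pinned" "$FBGEMM_ARM_MIN" ;;
    *) return 0 ;;  # non-ARM hosts are not affected by the ARM cutoff
  esac
}
```

On an aarch64 host, fbgemm_arm_ok 0.3.2 "$(uname -m)" fails, which is consistent with pip reporting "from versions: none" for fbgemm-gpu==0.3.2.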
I tried two alternative approaches:
- Building fbgemm-gpu v0.3.2 from source. This does not work because the build requires AVX-512, an x86 instruction set extension that ARM CPUs do not provide.
- Using a newer version of fbgemm-gpu (v0.5.0 or above). However, the cm tool is inflexible and keeps searching for v0.3.2.
Previously, I ran the benchmark on ARM without any problems (without using the cm tool) with newer versions of fbgemm-gpu. (Note that I also needed to use fbgemm-gpu-cpu.)
Command to reproduce the issue:
cm run script --tags=run-mlperf,inference,_r4.1-dev --model=dlrm-v2-99.9 --implementation=reference --framework=pytorch --category=datacenter --scenario=Server --server_target_qps=10 --execution_mode=valid --device=cpu --quiet --repro
Error message:
ERROR: Could not find a version that satisfies the requirement fbgemm-gpu==0.3.2 (from versions: none)
ERROR: No matching distribution found for fbgemm-gpu==0.3.2
The repro folder and the log file are included in the attached tarball:
cm-repro.tar.gz