FBGEMM version mismatch on ARM #304

Closed
@ayanchak1508

I was trying to run the DLRMv2 benchmark of MLPerf Inference on an ARM server using the instructions here.

I ran into the issue when the tool tried to install torchrec==0.3.2.
torchrec==0.3.2 requires fbgemm-gpu==0.3.2, but fbgemm-gpu only introduced ARM support in v0.5.0: https://download.pytorch.org/whl/cpu/fbgemm-gpu/
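
The mismatch described above can be sketched as a small pre-install check. This is only an illustration, not part of the cm tool: the helper names (`parse_version`, `has_arm_wheel`, `check_pin`) are hypothetical, and the v0.5.0 threshold is taken from the statement in this issue rather than verified against the wheel index.

```python
import platform

# Assumption from the issue text: prebuilt ARM wheels start at v0.5.0.
FIRST_ARM_RELEASE = (0, 5, 0)
# Machine names commonly reported for 64-bit ARM.
ARM_MACHINES = {"aarch64", "arm64"}


def parse_version(version: str) -> tuple:
    """Turn '0.3.2', 'v0.5.0' or '0.5.0+cpu' into a tuple of ints."""
    core = version.lstrip("v").split("+")[0]
    return tuple(int(part) for part in core.split("."))


def has_arm_wheel(version: str) -> bool:
    """True if the version is at or above the assumed first ARM release."""
    return parse_version(version) >= FIRST_ARM_RELEASE


def check_pin(version: str, machine: str = "") -> str:
    """Describe whether a pinned fbgemm-gpu version is plausible on this machine."""
    machine = (machine or platform.machine()).lower()
    if machine in ARM_MACHINES and not has_arm_wheel(version):
        return f"fbgemm-gpu=={version} has no prebuilt wheel for {machine}; need >= 0.5.0"
    return f"fbgemm-gpu=={version} is not ruled out on {machine}"


if __name__ == "__main__":
    print(check_pin("0.3.2", "aarch64"))
```

Calling `check_pin("0.3.2")` without a machine argument uses the value reported by `platform.machine()` on the current host.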

I tried two alternate approaches:

  1. Build fbgemm-gpu v0.3.2 from source. This does not work because the build needs AVX-512 support from the compiler, which is not available on ARM.
  2. Use a newer version of fbgemm-gpu (v0.5.0 or above). This does not work either, because the cm tool does not allow a different version and keeps searching for v0.3.2.
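
For the first approach, a quick way to confirm that the host itself reports no AVX-512 is to inspect `/proc/cpuinfo`. The sketch below assumes a Linux host, where x86 kernels list CPU capabilities on a `flags` line and ARM kernels on a `Features` line. It checks the CPU only, not the compiler, and the function names are hypothetical.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set:
    """Collect capability tokens from 'flags' (x86) or 'Features' (ARM) lines."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in {"flags", "features"}:
            flags.update(value.split())
    return flags


def has_avx512(cpuinfo_text: str) -> bool:
    """True if any reported capability starts with 'avx512' (e.g. avx512f)."""
    return any(flag.startswith("avx512") for flag in cpu_flags(cpuinfo_text))


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux-only location
    if path.exists():
        print("AVX-512 reported:", has_avx512(path.read_text()))
    else:
        print("No /proc/cpuinfo on this system")
```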

Previously, I ran the benchmark on ARM without any problems (without using the cm tool) using newer versions of fbgemm-gpu. (Note that I also needed fbgemm-gpu-cpu.)

Command to reproduce the issue:

cm run script --tags=run-mlperf,inference,_r4.1-dev --model=dlrm-v2-99.9 --implementation=reference --framework=pytorch --category=datacenter --scenario=Server --server_target_qps=10 --execution_mode=valid --device=cpu --quiet --repro

Error message:

ERROR: Could not find a version that satisfies the requirement fbgemm-gpu==0.3.2 (from versions: none)
ERROR: No matching distribution found for fbgemm-gpu==0.3.2

The repro folder and the logfile are present in the attached tarball.
cm-repro.tar.gz
