Depending on the version of MPI or ISV code being used, some of them try to rely on the Slurm node names, which aren't resolvable hostnames. This causes the jobs to fail.
It would be good if the actual hostnames on the nodes, and the names registered in Azure DNS, matched the node names used in Slurm.
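To see which Slurm node names in an allocation fail to resolve, a small diagnostic along these lines may help. It is a minimal sketch under assumptions: it uses Python's standard resolver (`socket.gethostbyname`), and the helper names `expand_nodelist` and `unresolvable_nodes` are hypothetical, not part of Slurm or any Azure tooling. `expand_nodelist` shells out to `scontrol show hostnames`, which expands a compressed hostlist such as the value of `SLURM_JOB_NODELIST` into one name per line.

```python
import socket
import subprocess
from typing import Iterable, List


def expand_nodelist(nodelist: str) -> List[str]:
    """Expand a compressed Slurm hostlist (e.g. the value of SLURM_JOB_NODELIST).

    Hypothetical helper: it calls `scontrol show hostnames`, so it only works
    on a machine with the Slurm client tools installed.
    """
    result = subprocess.run(
        ["scontrol", "show", "hostnames", nodelist],
        capture_output=True, text=True, check=True,
    )
    # scontrol prints one expanded hostname per line.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def unresolvable_nodes(node_names: Iterable[str]) -> List[str]:
    """Return the node names the system resolver cannot map to an IPv4 address."""
    missing = []
    for name in node_names:
        try:
            socket.gethostbyname(name)
        except OSError:  # socket.gaierror and socket.herror are OSError subclasses
            missing.append(name)
    return missing
```

Inside a job, `unresolvable_nodes(expand_nodelist(os.environ["SLURM_JOB_NODELIST"]))` lists the node names that do not resolve; an MPI launcher that relies on Slurm node names would likely fail to reach those nodes.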
We are seeing this with Abaqus. It's worth noting that we are confined to running in UK South, where only H Series VMs are available; these do not support SR-IOV, which limits us to Intel MPI. When HC Series lands later this year (with SR-IOV support), we expect to be able to use the MPI that ships with Abaqus and will check whether that allows multi-node jobs to run when Slurm node names do not match hostnames.