OpenMPI combined with RDMA. #12999

Open
xiaojiesi opened this issue Dec 25, 2024 · 1 comment

Comments

@xiaojiesi

Please submit all the information below so that we can understand the working environment that is the context for your question.

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

Package: Open MPI root@sharp-ci-02 Distribution
Open MPI: 4.1.5rc2
Open MPI repo revision: v4.1.5rc1-16-g5980bac633

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Cloned from git.

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

Please describe the system on which you are running

  • Operating system/version:
  • Computer hardware:
    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Byte Order: Little Endian
    CPU(s): 36
    On-line CPU(s) list: 0-35
    Thread(s) per core: 1
    Core(s) per socket: 18
    Socket(s): 2
    NUMA node(s): 2
    Vendor ID: GenuineIntel
    CPU family: 6
    Model: 85
    Model name: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
    Stepping: 7
    CPU MHz: 999.914
    CPU max MHz: 3900.0000
    CPU min MHz: 1000.0000
    BogoMIPS: 5200.00
    Virtualization: VT-x
    L1d cache: 32K
    L1i cache: 32K
    L2 cache: 1024K
    L3 cache: 25344K
    NUMA node0 CPU(s): 0-17
    NUMA node1 CPU(s): 18-35
  • Network type:

Details of the problem

Please describe, in detail, the problem that you are having, including the behavior you expect to see, the actual behavior that you are seeing, steps to reproduce the problem, etc. It is most helpful if you can attach a small program that a developer can use to reproduce your problem.

I have a compute cluster and want Open MPI to communicate over RDMA. I configured Open MPI to use UCX and restricted UCX to the RC (Reliable Connected) and UD (Unreliable Datagram) transports. I also set UCX_NET_DEVICES=mlx5_0. RDMA communication within a single node already works. However, when I configure the hostfile and run across nodes with Open MPI over RDMA, errors are reported.

Note: If you include verbatim output (or a code block), please use a GitHub Markdown code block like below:

```shell
shell$ mpirun -np 4 --mca pml ucx --hostfile host ./mpi_rdma_test
```
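For comparison, here is a minimal sketch of a cross-node launch that carries the UCX settings described above to every rank. The main change from the command above is Open MPI's `-x` option, which exports the named environment variables to ranks started on remote nodes. Variables set only in the launching shell are not necessarily propagated there otherwise. Several details are assumptions and do not come from the report: the HCA port number `1` (UCX usually names InfiniBand devices `mlx5_0:1`, and `ucx_info -d` shows the exact names on each node), the added `sm,self` transports for ranks on the same node, and the helper name `run_cross_node`, which is hypothetical.

```shell
#!/bin/sh
# Sketch only. Assumptions (not from the issue): HCA port 1, and "sm,self"
# added so that ranks on the same node can use shared memory / loopback.

# No spaces around "=" in a shell assignment: "UCX_NET_DEVICES = mlx5_0"
# would try to run a command named UCX_NET_DEVICES.
export UCX_NET_DEVICES=mlx5_0:1
export UCX_TLS=rc,ud,sm,self

# Hypothetical helper wrapping the launch command from the issue.
run_cross_node() {
    # -x VAR exports VAR (with its current value) to every rank,
    # including ranks launched on the remote hosts listed in "host".
    mpirun -np 4 --hostfile host \
        --mca pml ucx \
        -x UCX_NET_DEVICES -x UCX_TLS \
        ./mpi_rdma_test
}
```

Usage: source the file and call the helper, e.g. `. ./launch.sh && run_cross_node` (the file name `launch.sh` is also hypothetical). If the cross-node run still fails, the complete error output is what the maintainers need in order to help.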
@jsquyres
Member

I'm sorry, but you did not provide enough information to help you.

Please read https://docs.open-mpi.org/en/main/getting-help.html#for-problems-launching-mpi-or-openshmem-applications.
