Error stack: OFI endpoint open failed (ofi_init.c:982:create_vni_context:Cannot allocate memory) #316

Closed
sagitter opened this issue Aug 3, 2023 · 4 comments

sagitter commented Aug 3, 2023

Hi all again.

I don't know what is happening with MPICH-4.1.2: the SUNDIALS tests always fail on x86_64 architectures, as reported here.
(I also don't know whether these other errors are related to the one reported here.)

      Start 80: test_sunlinsol_spgmr_parallel_100_2_1_50_1e-3_0
80/98 Test #80: test_sunlinsol_spgmr_parallel_100_2_1_50_1e-3_0 .....***Failed    0.50 sec
03c9b3c8ba754b1c9335e6cc38cd5087:rank3.test_sunlinsol_spgmr_parallel: Failed to get eth0 (unit 0) cpu set
03c9b3c8ba754b1c9335e6cc38cd5087:rank3.test_sunlinsol_spgmr_parallel: Failed to get eth1 (unit 1) cpu set
03c9b3c8ba754b1c9335e6cc38cd5087:rank3: PSM3 can't open nic unit: -1 (err=23)
Abort(606197647): Fatal error in internal_Init: Other MPI error, error stack:
internal_Init(66)........: MPI_Init(argc=0x7ffed119a5ec, argv=0x7ffed119a5e0) failed
MPII_Init_thread(234)....: 
MPID_Init(513)...........: 
MPIDI_OFI_init_local(604): 
create_vni_context(982)..: OFI endpoint open failed (ofi_init.c:982:create_vni_context:Cannot allocate memory)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 99975 RUNNING AT 03c9b3c8ba754b1c9335e6cc38cd5087
=   EXIT CODE: 9
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

Full build log attached
sundials_OFI_endpoint_errors.gz

Member

balos1 commented Aug 3, 2023

I don't think this is related to #312 since in the test_sunlinsol_spgmr_parallel example we do call MPI_Init at the beginning.

Member

balos1 commented Aug 3, 2023

This seems like an error in the MPI installation. Can you verify that it works for something other than SUNDIALS?
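One way to check that reading against the log is to parse the MPICH error stack and see which frame is outermost. The sketch below is only an illustration: the stack text is copied from the report above, while `parse_error_stack` and `failed_during_init` are hypothetical helper names, and the line format is inferred from this single log rather than from any MPICH specification. If the outermost frame is `internal_Init`, the abort happened inside `MPI_Init` itself, which would be consistent with a problem in the MPI installation or its OFI provider rather than in SUNDIALS.

```python
import re

# Error stack copied verbatim from the report above.
ERROR_STACK = """\
Abort(606197647): Fatal error in internal_Init: Other MPI error, error stack:
internal_Init(66)........: MPI_Init(argc=0x7ffed119a5ec, argv=0x7ffed119a5e0) failed
MPII_Init_thread(234)....:
MPID_Init(513)...........:
MPIDI_OFI_init_local(604):
create_vni_context(982)..: OFI endpoint open failed (ofi_init.c:982:create_vni_context:Cannot allocate memory)
"""

# Assumed frame layout (inferred from this log only):
#   function_name(source_line)....: optional message
FRAME_RE = re.compile(r"^(?P<func>\w+)\((?P<line>\d+)\)\.*:\s*(?P<msg>.*)$")


def parse_error_stack(text):
    """Return (function, line, message) tuples for the frames after 'error stack:'."""
    # Only look at what follows the 'error stack:' header, so the
    # 'Abort(...)' line is not mistaken for a frame.
    _, _, tail = text.partition("error stack:")
    frames = []
    for raw in tail.splitlines():
        match = FRAME_RE.match(raw.strip())
        if match:
            frames.append((match["func"], int(match["line"]), match["msg"].strip()))
    return frames


def failed_during_init(frames):
    """True if the outermost frame is internal_Init, i.e. MPI_Init itself failed."""
    return bool(frames) and frames[0][0] == "internal_Init"


if __name__ == "__main__":
    frames = parse_error_stack(ERROR_STACK)
    for func, line, msg in frames:
        print(f"{func}:{line} {msg}")
    print("failed inside MPI_Init:", failed_during_init(frames))
```

For this log the innermost frame is `create_vni_context`, the same function named in the issue title, so the failure sits in the OFI endpoint setup reached from `MPI_Init`.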

balos1 added the triage label Aug 3, 2023
Author

sagitter commented Aug 5, 2023

> This seems like an error in the MPI installation. Can you verify that it works for something other than SUNDIALS?

Other libraries, such as hypre and MUMPS, are tested with MPICH-4.1.2.
superlu_dist fails.

@balos1

balos1 commented Jun 19, 2024

Since we could never reproduce this, I am closing it. Please reopen it if it's still a problem with the current SUNDIALS release.

balos1 closed this as completed Jun 19, 2024