
Revert "Bump prrte and openpmix to latest release tags" #12335

Merged (1 commit) on Feb 15, 2024


@wenduwan (Contributor)

This reverts commit f06b1d9.

The new openpmix breaks compatibility with hwloc 1.11 in debug mode. Temporarily reverting the change.

@rhc54 (Contributor) commented Feb 14, 2024

Just to be clear: the above statement is inaccurate. "Debug mode" simply exposed the bug; it didn't cause it. Without --enable-devel-check, the code was silently and incorrectly compiling against an ancient version of HWLOC.

FWIW: the problematic change is in PMIx v4.2.7 and above, so this has been around for a while, undetected until we re-enabled the devel-check code by default.

@wenduwan (Contributor, Author)

@rhc54 Thanks a lot for the quick fix. I don't think it's a bug in PMIx, but unfortunately fixing the bug caused a surprise.

I don't imagine there will be a 4.2.10 release, so I'm inclined to pin PMIx to 4.2.8 for the moment.

Also, I just realized that OpenPMIx requires a minimum of hwloc 1.11: https://docs.openpmix.org/en/latest/installing-pmix/required-support-libraries.html

@rhc54 (Contributor) commented Feb 14, 2024

It actually is a bug in PMIx, and it goes all the way back to the v4.2.7 release. You'd have to drop back to v4.2.6 to get away from it. The only reason you haven't been impacted is that you are ignoring the warnings emitted during the build; the change in v4.2.9 forced those warnings to be treated as errors (thereby stopping the build) when building from a Git clone.

There will be no 4.2.10 release, ever. You can either default to v4.2.6 or move up to v5.0.2 (soon to be released). Or you can ignore the potential errors, release with v4.2.8, and let your users live with any resulting problems. In that case, I'll warn folks from over here about using OMPI v5.x and recommend building against an external, correct PMIx instead.

@wenduwan (Contributor, Author)

@rhc54 Curious: is this issue also in PMIx 5.0? I'm planning to test the integration in a PR soon.

@rhc54 (Contributor) commented Feb 15, 2024

No, I just backported the fix to it this morning 😄

@wenduwan (Contributor, Author)

@janjust I opened a PR on the main branch to test the latest PMIx (master, not 5.0) and found issues: #12342

I think we need to spend more time figuring out the upgrade path. For the moment, I'm inclined to merge this PR and pin us to 4.2.8.

Do you have a different opinion?

@rhc54 (Contributor) commented Feb 15, 2024

The NVIDIA CI failure is a setup failure in their own CI.

The mpi4py failure is (I believe) a known problem that @hppritcha is working on. It's hard for me to do anything about it when some unknown test simply says "it didn't work".

janjust merged commit 95b27d3 into open-mpi:v5.0.x on Feb 15, 2024
@janjust (Contributor) commented Feb 15, 2024

Revert so that the CIs can pass, then we'll bump.

wenduwan deleted the revert_submodule_bump branch on February 15, 2024 21:29
@wenduwan (Contributor, Author)

@rhc54 I ran #12342 through the AWS internal CI and it is also broken. I need to look into that.
