opal_thread integration with libevent #12725
base: main
Conversation
Hello! The Git Commit Checker CI bot found a few problems with this PR: 40566dc: This is an initial stab at changing the Open MPI l...
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!
Will also fix with a sign-off.
Force-pushed from 2a6e7cf to 73ffa06.
Just a question: given that PMIx is embedded in OMPI and strictly (and heavily) uses libevent, does that create a conflict?
It's been a while since I've dug into this. Do they share a common libopen_pal? The eventual goal is direct cooperation between the event loop, OMPI, PMIx, and OpenMP. If they share a common OPAL then everything should be transparent. Once this is working I'll be attacking the event loop next. Is the reference implementation for this openpmix? Edit: a quick check in openpmix's case shows the libevents are independent AFAICT. The relevant line is here: You'd need to do a similar integration to the opal one.
PMIx has no connection whatsoever with OPAL. If the user is using the embedded PMIx, then we get connected to the embedded version of libevent. Otherwise, PMIx gets linked into OMPI as a distinct library that is likewise linked to the external libevent it was built against. The issue might be that whenever you call a PMIx function, you get threadshifted into a pthread progress thread - any callback from that function executes inside the pthread progress thread. We also make extensive use of libevent directly, which may or may not result in OMPI code running inside the event loop until someone threadshifts out of it. Not sure what, if any, issues that might present here.
There are issues if the libevent queues are shared. The reason I'm doing this is to restart some of the runtime integration work I'd been doing a few years ago. With openpmix it should be possible to mimic the opal approach (make the threading modular while presenting a pthreads-like API). Is there anywhere I should be looking in particular?
Not entirely sure what you might need to dig into, but the progress thread code is in src/runtime/pmix_progress_threads.[h,c]. Thread code itself is in src/threads. The libevent integration is somewhat spread around the code base, I'm afraid - pretty much just used wherever it is needed. Honestly haven't looked at it in a long time - probably mostly in the src/mca/ptl area since that handles the messaging system.
Also, a request for advice: I'm a bit confused that libopalutil isn't being linked into libmpi (and thus the new function isn't being defined). What am I missing?
Openpmix itself looks relatively straightforward to modify. It follows the Open MPI convention of factoring threads out. There would need to be another set of lock hooks using
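For reference, these are the two hook structures libevent 2.x exposes in <event2/thread.h> for plugging in a replacement thread package - any thread layer (OPAL's or openpmix's) provides its lock hooks by filling these in and registering them, rather than calling evthread_use_pthreads(). Field names are paraphrased from the libevent 2.1 header; check the installed header before relying on them:

```c
/* From <event2/thread.h> (libevent 2.x), lightly abridged. */

struct evthread_lock_callbacks {
    int lock_api_version;             /* EVTHREAD_LOCK_API_VERSION */
    unsigned supported_locktypes;     /* e.g. EVTHREAD_LOCKTYPE_RECURSIVE */
    void *(*alloc)(unsigned locktype);
    void (*free)(void *lock, unsigned locktype);
    int (*lock)(unsigned mode, void *lock);   /* mode may include EVTHREAD_TRY */
    int (*unlock)(unsigned mode, void *lock);
};
int evthread_set_lock_callbacks(const struct evthread_lock_callbacks *);

struct evthread_condition_callbacks {
    int condition_api_version;        /* EVTHREAD_CONDITION_API_VERSION */
    void *(*alloc_condition)(unsigned condtype);
    void (*free_condition)(void *cond);
    int (*signal_condition)(void *cond, int broadcast);
    int (*wait_condition)(void *cond, void *lock,
                          const struct timeval *timeout);
};
int evthread_set_condition_callbacks(const struct evthread_condition_callbacks *);
```

A PMIx-side integration would fill both structures with wrappers over its own thread primitives, much like the OPAL-side sketch further down does for the lock half.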
I don't see where you define
Thanks, I forgot to rename the function that sets the structures. Force-pushed again.
This is an initial stab at changing the Open MPI libevent integration to use opal threads instead of pthreads for alternative threading libraries.

Currently, although opal threads are now replaceable via an MCA, opal_event_use_threads() in event.h is just a macro passthrough to libevent's pthread event-locking API. This was causing event-list corruption with alternative thread libraries.

This commit factors out libevent threading and integrates it with the opal thread API. The changes are currently untested; the initial changes are primarily to solicit feedback on my approach before I begin more extensive testing and debugging.

Signed-off-by: Noah Evans <[email protected]>
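To make that description concrete, here is a minimal sketch (mine, not code from this PR) of an opal_event_use_threads() that registers OPAL-backed lock callbacks with libevent instead of expanding to evthread_use_pthreads(). It assumes the usual OPAL object and mutex APIs (OBJ_NEW/OBJ_RELEASE, opal_mutex_lock/unlock/trylock, opal_recursive_mutex_t); the header path and exact names may differ from what the PR actually uses, and the condition-variable half is omitted:

```c
/* Sketch only -- not the PR's code. Registers OPAL-backed lock hooks with
 * libevent so event locking follows whatever threads MCA component is
 * selected, instead of hard-wiring pthreads. */
#include <event2/thread.h>
#include "opal/mca/threads/mutex.h" /* path assumed; older trees use opal/threads/mutex.h */

static void *opal_event_lock_alloc(unsigned locktype)
{
    /* libevent requires recursive locks to be supported. */
    if (locktype & EVTHREAD_LOCKTYPE_RECURSIVE) {
        return OBJ_NEW(opal_recursive_mutex_t);
    }
    return OBJ_NEW(opal_mutex_t);
}

static void opal_event_lock_free(void *lock, unsigned locktype)
{
    opal_mutex_t *m = (opal_mutex_t *) lock;
    (void) locktype;
    OBJ_RELEASE(m);
}

static int opal_event_lock(unsigned mode, void *lock)
{
    opal_mutex_t *m = (opal_mutex_t *) lock;
    if (mode & EVTHREAD_TRY) {
        /* Non-zero return tells libevent the lock was not acquired. */
        return opal_mutex_trylock(m);
    }
    opal_mutex_lock(m);
    return 0;
}

static int opal_event_unlock(unsigned mode, void *lock)
{
    (void) mode;
    opal_mutex_unlock((opal_mutex_t *) lock);
    return 0;
}

int opal_event_use_threads(void)
{
    const struct evthread_lock_callbacks cbs = {
        EVTHREAD_LOCK_API_VERSION,
        EVTHREAD_LOCKTYPE_RECURSIVE,   /* we can allocate recursive locks */
        opal_event_lock_alloc,
        opal_event_lock_free,
        opal_event_lock,
        opal_event_unlock,
    };
    /* A matching evthread_set_condition_callbacks() registration built on
     * opal_condition_t would go here as well; omitted from this sketch. */
    return evthread_set_lock_callbacks(&cbs);
}
```

Because the callbacks resolve through the OPAL thread layer at run time, swapping the threads MCA component swaps the locks libevent uses, which is the behavior the macro passthrough to pthreads prevented.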
The mpi4py spawn test consistently fails for this PR; the failure is likely related to these changes.
@wenduwan My intuition is that there's a conflict between my new locks and PMIx. I've tried the Open MPI thread tests locally and they work. Is there any way of running the spawn test locally?
No, sorry, I meant to replicate your build environment. I see mpi4py in the GitHub workflows; I just don't know how to run it locally.
I think my answer may be https://github.com/actions/runner, but I still want to know if you guys have a "blessed" way of testing locally.
I see. I don't know about a best way, but for myself I mimic the runner with a simple Dockerfile that runs the same GitHub workflow steps. Works just fine.
I doubt there is a cross-interaction issue here. However, I suspect you might hit a problem where you lock the OMPI side while invoking a PMIx call that does a callback to OMPI, or where OMPI executes two simultaneous blocking PMIx calls with the OMPI side locked in each. We have had problems with thread deadlock in OMPI before, so it wouldn't be the first time.
Just to be clear: no, you cannot build or run OMPI without PMIx.