WeeklyTelcon_20230131
Geoffrey Paulsen edited this page Feb 7, 2023
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoffrey Paulsen (IBM)
- Jeff Squyres (Cisco)
- Brendan Cunningham (Cornelis Networks)
- David Bernholdt
- Edgar Gabriel (AMD)
- Howard Pritchard (LANL)
- Joseph Schuchart (UTK)
- Josh Fisher (Cornelis Networks)
- Josh Hursey (IBM)
- Luke Robison (Amazon)
- Thomas Naughton (ORNL)
- Todd Kordenbrock (Sandia)
- William Zhang (AWS)
Reminder: When there are issues with various companies' CI controls, please post in #general on Slack.
New - Issue #11347 Versioning is wrong in v5.0.x
- We agreed v4.0.x -> v4.1.x -> v5.0.x should be ABI compatible.
- Compile an MPI application with v4.0.x, then remove the Open MPI installation (`rm -rf`), install v5.0.0 into the same location, and the application should just work.
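The compatibility check described above could be scripted roughly as follows (the install path and test program name are illustrative, not from the discussion):

```shell
# Sketch of the ABI-compatibility check; path and app name are illustrative.
# 1. Build the application against the old release.
/opt/ompi/bin/mpicc -o ring ring.c

# 2. Replace the installation in place with v5.0.0.
rm -rf /opt/ompi
./configure --prefix=/opt/ompi && make -j install   # run from the v5.0.0 tree

# 3. Re-run the old binary WITHOUT recompiling; under ABI
#    compatibility it should just work.
/opt/ompi/bin/mpirun -np 4 ./ring
```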
- Did we figure out the Fortran ABI break?
- From memory: yes, we did break the Fortran ABI.
- Broke ABI in a very narrow case: when Fortran is compiled with 8-byte integers and C with 4-byte integers.
- Two other things may or may not break ABI.
- Did some stuff with intents and asyncs, and went from named interfaces to unnamed.
- Unsure if this affects ABI.
- ABI concerns are mostly just about C and mpif.h.
- Fortran library has different .so versioning.
- Blocker for next v5.0.0rc - get library versioning correct.
- When we talk about ABI - Fortran will be nuanced.
- Made a minor change for another rc. Trying to get rc built.
- RC from last week, got pushed to this week.
- Still waiting on https://github.com/open-mpi/ompi/issues/11354
- Maybe related to the enable-DSO build option?
- The accelerator framework initially picks CUDA and then disqualifies it, but at teardown it tries to tear down CUDA anyway.
- The reason is that CUDA now uses delayed startup, so the component still appears enabled.
- Another variable tracks whether CUDA was actually initialized.
- Should also be on `main` (though there is a comment saying otherwise).
- Howard said after the call that this isn't a blocker for rc10
- Waiting on PMIx and PRRTE submodule update.
- Ralph pestered us to please merge it; it was just merged on `main` and will make rc10.
- Need documentation for v5.0.0
- Manpages need an audit before release.
- Double check `--prefix` behavior; it is not the same as in v4.1.x.
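For reference, `--prefix` on the `mpirun` command line tells remote daemons where the Open MPI installation lives; the behavior being double-checked is invocations like this (the path and host file are illustrative):

```shell
# --prefix points remote nodes at the Open MPI install tree
# (path is illustrative); v5.0.x behavior reportedly differs from v4.1.x.
mpirun --prefix /opt/openmpi-5.0.0 -np 8 --hostfile hosts ./app
```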
- What is status of HAN?
- Joseph pushed a bunch of data but is not on the call; go read it.
- Joseph had some more experiments: the HAN collective component with the shared-memory PR compared well to tuned and another component.
- Comparing HAN with shared Mem component.
- At how many processes per resource (ppr)? Between 2 and 64 ppr.
- Better numbers, would be good to document this.
- In the OSU benchmarks there's always a barrier before the operation. If the barrier and the operation match up well, you get lower latency.
- We'd talked about supplying some docs about how HAN is great, and why we're enabling it for v5.0.0 by default.
- Like to include instructions on how to reproduce as well for users.
- document in ECP -
- Our current resolution is to enable it as is, and fix current regressions in future releases.
- What else is needed to enable it by default?
- Just need to flip a switch.
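Until that switch is flipped, HAN can be selected explicitly via MCA parameters, for example by raising its priority above tuned's (a sketch; the priority value is illustrative, and `ompi_info` should be consulted for the actual parameter names):

```shell
# Select HAN explicitly by raising its priority above tuned's
# (value is illustrative; check `ompi_info --all` for parameter details).
mpirun --mca coll_han_priority 100 -np 64 ./osu_allreduce
```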
- The module that Joseph has for shared memory for HAN at the moment would need some work to add additional collectives.
- And it relies on xpmem to be available.
- So for now just enable HAN for collectives we have, and later enable for other collectives.
- George would like to re-use what tuned does without reimplementing everything; a shared-memory component is the better choice, but it requires more work.
- If we don't enable HAN by default now, it's v5.1 (best case) before it's enabled.
- The trade-offs lean toward turning it on and fixing whatever problems might be there.
- There is a PR for tuned (increases default segment size), and changes algorithms in tuned for shared memory.
- Need to start moving forward, rather than doing more analysis.