WeeklyTelcon_20230214
Geoffrey Paulsen edited this page Feb 28, 2023
- Dialup Info: (Do not post to public mailing list or public wiki)
- Attendance not captured.
- Geoffrey Paulsen (IBM)
- Jeff Squyres (Cisco)
- Austen Lauria (IBM)
- Brendan Cunningham (Cornelis Networks)
- Brian Barrett (Amazon)
- Edgar Gabriel (AMD)
- Josh Fisher (Cornelis Networks)
- Josh Hursey (IBM)
- Luke Robison (Amazon)
- Matthew Dosanjh (Sandia)
- Thomas Naughton (ORNL)
- Todd Kordenbrock (Sandia)
- Tomislav Janjusic (nVidia)
- William Zhang (AWS)
- Howard Pritchard (LANL)
- Joseph Schuchart (UTK)
- David Bernholdt
- Mellanox CI is failing. May be related to a configure issue Edgar is seeing, where PRRTE tries to build as external, but no external PRRTE is installed on the machine.
- Sometimes this happens if there's one in the prefix for OMPI.
- Edgar will debug a bit on his side, and Tommy will
- New: 32-bit support (64-bit came out 20 years ago)
- Debian noticed that Open MPI fails to build for 32-bit; the build fails in configure.
- This breaks a bunch of other packages that can't be built.
- But are there real users? Or just inertia?
- Looks like inertia, but, for example, the Boost library could just turn off MPI support for 32-bit builds.
- They're sticking with Open MPI v4.1.x for their immediate need.
- Let's check back in a week on an estimate for 32-bit scoping.
- We do have 32-bit testing that's turned off, so if we decide to test it, it's easy to re-enable.
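For scoping purposes, reproducing a 32-bit build on a 64-bit host can be sketched roughly as follows (the flags, target triple, and install prefix are illustrative assumptions, not from the notes):

```shell
# Hypothetical sketch: attempt a 32-bit Open MPI build on an x86-64 host.
# Requires 32-bit toolchain/multilib packages to be installed.
./configure CFLAGS=-m32 CXXFLAGS=-m32 FCFLAGS=-m32 \
    --build=i686-pc-linux-gnu \
    --prefix=/opt/ompi-32
make -j"$(nproc)" && make install
```

If the Debian report is accurate, the failure should appear during the configure step, before compilation begins.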
- Issue #11347: Versioning is wrong in v5.0.x
- We agreed v4.0.x -> v4.1.x -> v5.0.x should be ABI compatible.
- Compile an MPI application with v4.0.x, then `rm -rf` the OMPI install, install v5.0.0 into the same location, and it should just work.
- Did we figure out the Fortran ABI break?
- From memory: yes, we did break the Fortran ABI.
- Broke ABI in a very narrow case: when you compile Fortran with 8-byte integers and C with 4-byte ints.
- Two other things may or may not break ABI.
- Did some stuff with intents and asyncs, and went from named interfaces to unnamed.
- Unsure if this affects ABI.
- ABI mostly just cares about C and mpif.h.
- Fortran library has different .so versioning.
- Blocker for the next v5.0.0 rc: get library versioning correct.
- When we talk about ABI, Fortran will be nuanced.
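The compatibility expectation described above can be written down as a smoke test (the prefix path and install steps are illustrative assumptions):

```shell
# Hypothetical ABI smoke test: build against v4.0.x, run against v5.0.0.
PREFIX=/opt/ompi   # assumed shared install location

# 1. Build the app against the v4.0.x install.
"$PREFIX/bin/mpicc" hello.c -o hello

# 2. Remove the installation and put v5.0.0 into the same prefix.
rm -rf "$PREFIX"
# ... install Open MPI v5.0.0 into $PREFIX here ...

# 3. The unmodified binary should still launch and run.
"$PREFIX/bin/mpirun" -n 2 ./hello
```

Per the discussion, this guarantee is intended for C and mpif.h users; the Fortran module library has different .so versioning.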
- Comm Size issue (Issue #11373).
- Edgar created a PR to fix Comm Size to be the same as in v4.1.x, to maintain backward compatibility in v5.0.0 for apps built against v4.1.x.
- Austen said he'd try to find time to run the ABI checker.
- Some GNU ABI checker tool might help.
- Need to pull in a PMIx v3.1.
- Fix a CUDA issue, due to a bad cherry-pick from earlier.
- Reworking a PR, in progress.
- Made a minor change for another rc. Trying to get rc built.
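One way to run the ABI comparison mentioned above is libabigail's abidiff; the tool choice and the library paths here are assumptions, not something decided in the meeting:

```shell
# Hypothetical sketch: compare the C library ABI between a v4.1.x
# install and a v5.0.x install using libabigail's abidiff.
abidiff \
    /opt/ompi-4.1/lib/libmpi.so \
    /opt/ompi-5.0/lib/libmpi.so
# A non-zero exit status indicates ABI differences; abidiff reports
# removed/changed symbols and type-layout changes.
```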
- ROMIO issue not
- The RC from last week got pushed to this week.
- Still waiting on https://github.com/open-mpi/ompi/issues/11354
- Maybe the enable-dso option?
- Accelerator initially picks CUDA and then disqualifies it, but at teardown it tries to tear down CUDA.
- The reason it does this is that CUDA now uses delayed startup, so it will still be enabled.
- There is another variable for whether CUDA was initialized.
- Should also be on main (despite a comment saying otherwise).
- Howard said after the call that this isn't a blocker for rc10
- Howard has had an issue using external compilers with the accelerator.
- Issue #11354
- CUDA framework issue #11354: Howard is working on it.
- SM-CUDA: if you disable building it, the problem doesn't occur.
- With --enable-so, we don't see this.
- Don't see it if the app initializes CUDA before MPI_Init (maybe).
- It takes a number of factors to see this.
- If the application is actually using CUDA, then everything works.
- The problem is that the app doesn't use CUDA, but sm-cuda then initializes it (even though the application doesn't need CUDA).
- Calls into framework, to
- At Finalize it makes calls into the accelerator and gets CUDA runtime errors.
- We think we want SM-CUDA if running on a single node.
- Was it just the IPC or also something else? Believe it was IPC stuff.
- No IPC support in the Accelerator framework; just a direct dependency on CUDA.
- For collective CUDA: it never directly uses CUDA buffers, just checks and then memcopies into host memory.
- All of this does use accelerator framework.
- These three components added a direct CUDA dependency because they call CUDA directly, instead of calling through the framework:
- btl-sm-cuda
- rcache-something-sm
- rcache-gpu-sm
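Consistent with the observation that disabling sm-cuda avoids the problem, one possible run-time workaround is to exclude the component via an MCA parameter; this is a sketch, and the exact component spelling (smcuda) is an assumption:

```shell
# Hypothetical workaround sketch: exclude the smcuda BTL at run time,
# so it never initializes CUDA for apps that don't use it.
mpirun --mca btl ^smcuda -n 4 ./app

# Or via the environment, for the whole session:
export OMPI_MCA_btl=^smcuda
```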
- ROMIO isn't included in packaging properly.
- Issue #11364: Austen is taking a look; might have missed something.
- Waiting on PMIx and PRRTE submodule update.
- Ralph pestered us to please merge it. Just merged on main; it will make rc10.
- Need documentation for v5.0.0.
- Manpages need an audit before release.
- Double-check `--prefix` behavior; it's not the same behavior as in v4.1.x.
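For reference, the flag in question is mpirun's `--prefix`, which points remote nodes at a specific Open MPI installation; the path and hostnames below are illustrative assumptions:

```shell
# Hypothetical sketch: direct remote ranks to a specific Open MPI
# install instead of relying on PATH/LD_LIBRARY_PATH on each node.
mpirun --prefix /opt/openmpi-5.0.0 -H node1,node2 -n 4 ./app
```

Whatever the v5.0.x behavior ends up being, any divergence from v4.1.x here should be documented.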
- What is the status of HAN?
- Priority bump of HAN PR #11362 to main; need one for v5.0.x.
- Joseph pushed a bunch of data, but not on the call; go read it.
- Joseph had some more experiments: the HAN collective component with the shared memory PR was pretty good compared to tuned and another
- Comparing HAN with the shared memory component.
- How many ppr? Between 2 ppr and 64 ppr.
- Better numbers; it would be good to document this.
- In OSU there's always a barrier before the operation. If Barrier and operation match up well, you get lower latency.
- We'd talked about supplying some docs about how HAN is great, and why we're enabling it for v5.0.0 by default.
- We'd like to include instructions on how to reproduce as well, for users.
- document in ECP -
- Our current resolution is to enable it as is, and fix current regressions in future releases.
- What else is needed to enable it by default?
- Just need to flip a switch.
- The module that Joseph has for shared memory for HAN at the moment would need some work to add additional collectives.
- And it relies on xpmem to be available.
- So for now just enable HAN for collectives we have, and later enable for other collectives.
- George would like to re-use what tuned does without reimplementing everything, but a shared memory component is a better choice, though with more work.
- If we don't enable HAN now by default, it's v5.1 (best case) before it's enabled.
- The trade-offs lean toward turning it on and fixing whatever problems might be there.
- There is a PR for tuned (increases default segment size), and changes algorithms in tuned for shared memory.
- Need to start moving forward, rather than doing more analysis.
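Until the default flips, HAN can be selected explicitly by raising its MCA priority; this is a sketch, and the specific priority value is an arbitrary assumption:

```shell
# Hypothetical sketch: force-select the HAN collective component by
# giving it a higher priority than tuned for this run.
mpirun --mca coll_han_priority 100 -n 8 ./app

# Or persistently for a session:
export OMPI_MCA_coll_han_priority=100
```

Running benchmarks with and without this setting is one way to produce the reproducible comparison numbers discussed above.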