RuntimeDiscussion_20180718
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoff Paulsen
- Josh Hursey
- Ralph Castain
- Geoffroy Vallee
- Todd Kordenbrock
- Shinji Sumimoto
- Takahiro Kawashima
- Maurali (LLNL)
- Options discussed:
- Option 1: Keep going on our current path, taking updates to ORTE, etc.
  - Two problems:
    - The OPAL abstraction layer: every time you want to expose a new PMIx function, you have to do it 3 times.
      - Once in PMIx, a second time in the OPAL abstraction layer, and a third time in ORTE itself.
      - A problem because of the extra redundant work, and also a problem in terms of bugs.
      - Potential solution: could re-do the OPAL abstraction layer - use PMIx as the internal layer in OMPI itself.
        - Would have to figure out how to write a SLURM PMI1 or PMI2 interface.
        - Could accept calls to the PMIx API and convert them to the PMI1 or PMI2 protocol for SLURM or ALPS (see the sketch after this list).
        - Eventually this will go away as SLURM and ALPS implement the PMIx APIs, and we won't need the PMI1 or PMI2 layers.
        - Could say with Open MPI v5.0 that we'll only supply a PMIx API, and those who need PMI1/PMI2 can stay at OMPI v4.0.
          - Need to see how hard of a line we might take.
        - SLURM already has a PMIx implementation, but OLDER SLURMs will be the issue.
        - At the moment, Cray doesn't yet have a PMIx version of ALPS.
        - Tools - PMI1 and PMI2 don't have tools interfaces.
    - MPIR - an issue if Open MPI chooses not to REMOVE it in v5.0.
      - Orthogonal to the OPAL abstraction layer issue.
      - Touches the ORTE and OMPI layers - partially broken right now.
      - Historically we don't worry, and someone will fix the bugs.
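As a rough illustration of that conversion idea, here is a minimal sketch assuming the real PMI2 client calls (PMI2_KVS_Put, PMI2_KVS_Fence); the shim_* wrapper names are invented for illustration and are not an existing component:

```c
/* Minimal sketch of a PMIx-to-PMI2 conversion component. The PMI2_*
 * calls are the real PMI2 client API; shim_* names are hypothetical. */
#include <pmi2.h>
#include <pmix.h>

/* Expose a PMIx-style Put on top of the PMI2 wire protocol. */
static pmix_status_t shim_put(pmix_scope_t scope, const char *key,
                              pmix_value_t *val)
{
    (void)scope;  /* PMI2 has no notion of scopes */

    /* PMI2 only moves strings, so non-string values would need an
     * encoding step (e.g., base64) before this call. */
    if (PMIX_STRING != val->type) {
        return PMIX_ERR_NOT_SUPPORTED;
    }
    return (PMI2_SUCCESS == PMI2_KVS_Put(key, val->data.string))
               ? PMIX_SUCCESS : PMIX_ERROR;
}

/* Expose a PMIx-style Fence: PMI2 couples the data exchange to a
 * barrier across the job. */
static pmix_status_t shim_fence(void)
{
    return (PMI2_SUCCESS == PMI2_KVS_Fence())
               ? PMIX_SUCCESS : PMIX_ERROR;
}
```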
- Option 2: Shuffle our code a bit (a new ompi_rte framework merged with the orte_pmix framework, moved down and renamed).
  - Opal used to be a single-process abstraction, but that's not as true anymore.
  - The API of "foo" looks pretty much like the PMIx API.
  - Still have PMIx v2.0, PMI2, or other components (all retooled for the new framework to use PMIx).
  - Callers just call opal_foo.spawn(), etc., and get whatever component is underneath (see the first sketch after this list).
  - What about mpirun? Well, PRTE comes in - it's the server side of the PMIx stuff.
    - Could use their prun and wrap it in a new mpirun wrapper (see the second sketch after this list).
  - PRTE doesn't just replace ORTE. PRTE and the OMPI layer don't really interact with each other; they both call the same OPAL layer (which contains PMIx and other OPAL stuff).
  - prun has a lam-boot-looking approach.
  - Build system changes around opal, etc. Code shuffling, retooling of components.
  - We want to leverage the work the PMIx community is doing correctly.
  - ORNL OSHMEM - having a similar discussion, so this approach should work for OSHMEM as well.
  - ORTED - goes through the opal abstraction as well.
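A minimal sketch of what that framework's module table could look like, assuming the placeholder name "foo" from the discussion (none of these OPAL-side names exist today); the spawn signature mirrors PMIx_Spawn() from the PMIx v2 headers:

```c
/* Hypothetical sketch of the new framework's module API - "foo" is just
 * the placeholder name used in the discussion. */
#include <pmix.h>

typedef struct {
    pmix_status_t (*init)(void);
    /* Signature mirrors PMIx_Spawn() */
    pmix_status_t (*spawn)(const pmix_info_t job_info[], size_t ninfo,
                           const pmix_app_t apps[], size_t napps,
                           char nspace[]);
    pmix_status_t (*fence)(const pmix_proc_t procs[], size_t nprocs);
    pmix_status_t (*fini)(void);
} opal_foo_module_t;

/* Each component (PMIx v2.0, PMI2, ...) fills in this table; callers
 * just invoke opal_foo.spawn() and get whatever is underneath. */
extern opal_foo_module_t opal_foo;
```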
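And a minimal sketch of an mpirun that wraps prun, assuming prun is on the PATH; a real wrapper would also have to translate mpirun's options rather than passing them through:

```c
/* Hypothetical mpirun that simply hands everything off to PRTE's prun. */
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    (void)argc;
    argv[0] = "prun";   /* keep the user's arguments, swap the command */
    execvp("prun", argv);
    perror("mpirun: exec of prun failed");
    return 1;           /* only reached if exec fails */
}
```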
- Option 3: PRTE - this third approach looks like lam-boot; simply move the runtime from being inside OMPI to being inside of PRTE.
  - This only makes sense if there is a more active community working on PRTE.
    - Any hope of this becoming true? - Not really; we'd be surprised.
    - Thought that when resource managers adopted it...
  - The OSHMEM community needs to have a solution. Right now they extract ORTE from Open MPI.
    - OSHMEM is interested in having its own prted for its launching.
  - Thought some resources were becoming available, but it's a bit confusing now.
- A slightly different question - separating the runtime project from Open MPI, either PRTE or ORTE.
  - One benefit of using a separate runtime project is that it's easier to integrate.
  - Like the idea of pulling the runtime away from Open MPI as a separate project.
  - Then the runtime itself can follow its own path and its own release cycle.
  - And Open MPI can pick a version of the runtime based on quality requirements.
  - Having this separate project be PRTE has some advantages.
  - Fujitsu - process manager - they have implemented PMIx in their resource manager and are currently debugging it.
- Does Open MPI want a launcher at all?
  - It used to be like this with lamboot: users would boot something, and then...
  - On this path, we would say that Open MPI doesn't have a resource manager (though we might package PRTE).
  - The other path is that we ARE going to have a runtime, but who's going to own it?
  - Right now, because the runtime is integrated in Open MPI, everyone has to work within this context.
  - If we split the two completely...
  - ORTE had to adjust for direct launch for SLURM and other direct launchers.
- Three big questions:
  - Question 1: Should OMPI and OPAL move to using PMIx directly (without the OPAL abstraction layer)?
    - Internal code reordering; if done correctly, it'd be transparent.
    - Actually rather simple: take the OPAL modex send/recv macros, literally copy those from PRTE, and put them into a header in OMPI or OPAL (see the sketch after this list).
    - Already done in PRTE.
    - At some point PMI1 and PMI2 conversion components would be needed - some users might see this pain.
    - Any reason NOT to do this??? - The PMI1 and PMI2 components don't have owners.
    - Can define this work.
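A minimal sketch of what such a header could look like; the macro names here are illustrative, and the real macros carried over from PRTE differ in detail. They wrap PMIx_Put()/PMIx_Get() directly:

```c
/* Hypothetical modex send/recv macros wrapping the PMIx client API. */
#include <pmix.h>

#define OMPI_MODEX_SEND_STRING(rc, scope, key, str)         \
    do {                                                    \
        pmix_value_t _val;                                  \
        _val.type = PMIX_STRING;                            \
        _val.data.string = (char *)(str);                   \
        (rc) = PMIx_Put((scope), (key), &_val);             \
    } while (0)

/* Caller releases *(val) with PMIX_VALUE_RELEASE when done. */
#define OMPI_MODEX_RECV_VALUE(rc, key, proc, val)           \
    (rc) = PMIx_Get((proc), (key), NULL, 0, (val))
```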
  - Question 2: Do we have Open MPI contain ORTE as it does today, or pull it out into a separate product (separate release cycles, etc.)?
    - How do we make progress on this question???
    - What do we gain by doing this?
      - Life is easier for those who don't need the runtime.
      - Life is easier for those who don't need MPI.
      - Customers can update the runtime independently from Open MPI releases (this has been helpful for other launchers).
      - The runtime could have its own quality requirements for release.
      - Would like to have separate runtime tests.
    - This is the main decision.
    - How do we get the stakeholders to the meeting???
      - Let's have another meeting like this in a month?
    - How can we get a credible answer to "What's the path forward?"
      - Nobody has any resources to put on it. No matter what we decide, no one can do it.
      - Need a clear decision from the community.
    - Do we need statements of intent?
      - Take ORTE out, and we need a 3rd-party launcher in some environments.
      - Leave ORTE in, and people have to step up and...
      - Do we have everyone call PMIx directly? A burden on non-PMIx environments.
  - Question 3: If we DO separate it out, what (if any) do we make the default?
    - Delay until we can answer Question 2.
We've got 3 big questions; how do we make progress?
Chicken-and-egg problem: people don't see the priority yet, because they don't feel the pain yet.
- One solution is to "expose the pain" in small increments.
- ECP - the Exascale Computing Project for the Labs.
- If we do this, we still need people to do runtime work over in PRTE.
  - In some ways it might be harder to get resources from management for yet another project.
  - It would be nice to have a componentized interface without moving the runtime to a 3rd-party project.
  - Need to think about it.
- Concerns with the work of adding ORTE PMIx integration.
- Want to know the state of the SLURM PMIx plugin with PMIx v3.x.
  - It should build and work with v3. They only implemented about 5 interfaces, and those haven't changed.
- A few attendees are connected to the OMPI-X project, and are talking about how much to contribute to this effort.
  - How to factor in the requirements of OSHMEM (who use our runtimes), who are already doing things to adapt.
  - Would be nice to support both groups with a straightforward component that handles both of these.
- Thinking about how much effort this will be, and how to manage these tasks in a timely manner.
- Testing - will need to discuss how to best test all of this.
Today (Geoffroy Vallee)
- Let's take a stance and let the community react?
  - Move the runtime outside of the Open MPI tree, into its own project.
    - The runtime would have its own release schedule and meetings.
    - Could drop an initial release right away.
  - Switch our code to use PMIx directly and not the OPAL abstraction layer (see the sketch after this list).
  - If people still want a way to start jobs, they either download a 3rd-party package, or as a community we provide a packaged version of the software that gives everything at once.
    - Could be packaged as 2 RPMs (one with the RTE, and one without).
  - Push this out there as the direction we're thinking of going, and let the community respond with concerns.
  - Could even call the runtime ORTE when we move it out, if we use language carefully.
  - Need to discuss with packagers after the community has come to consensus.
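As an illustration of what calling PMIx directly from the MPI layer could look like - a minimal sketch assuming a PMIx v2-style client API, with error handling trimmed and rte_init_sketch an invented name:

```c
/* Sketch of runtime init done via direct PMIx calls, no OPAL layer. */
#include <string.h>
#include <pmix.h>

int rte_init_sketch(void)
{
    pmix_proc_t myproc, wildcard;
    pmix_value_t *val;

    /* Connect to the local PMIx server (prted, slurmstepd, ...). */
    if (PMIX_SUCCESS != PMIx_Init(&myproc, NULL, 0)) {
        return -1;
    }

    /* Job-level data is stored against the wildcard rank. */
    PMIX_PROC_CONSTRUCT(&wildcard);
    strncpy(wildcard.nspace, myproc.nspace, PMIX_MAX_NSLEN);
    wildcard.rank = PMIX_RANK_WILDCARD;

    if (PMIX_SUCCESS == PMIx_Get(&wildcard, PMIX_JOB_SIZE, NULL, 0, &val)) {
        /* val->data.uint32 holds the number of processes in the job. */
        PMIX_VALUE_RELEASE(val);
    }
    return 0;
}
```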
Geoffroy Vallee will send out this writeup to devel-core by the same time next week. Follow-up meeting 2 weeks from now, same time.