WeeklyTelcon_20160927
Geoffrey Paulsen edited this page Jul 25, 2023 (2 revisions)
- Dialup Info: (Do not post to public mailing list or public wiki)
- Attendees:
- Geoff Paulsen
- Jeff Squyres
- Josh Hursey
- Joshua Ladd
- Brian Benton
- Sylvain Jeaugey
- Artem Polyakov
- Brad Benton
- Todd Kordenbrock
- Nathan
- Milestones
- 1.10.4
- Nothing new. No drivers yet.
- Milestone fields have been used in two different ways.
- For PRs against master, what should we use the milestone field for? Which release branch to target?
- Can only set 0 or 1 milestones.
- No other real fallout from combining the repos.
- 2.0.2 in preparation
- All just bugfixes; no timetable yet.
- Just progressing.
- 2.1.0
- Sept. 30th deadline for new features. Jeff sent an email last Wednesday or Thursday.
- mpool/rcache rewrite (PR 2101): Nathan reverted all of the spot fixes and then applied all of them.
- Nathan has 2 more PRs for trivial features he'd like to add.
- Add flag enumerator to MCA base; one of the cherry-picks will be much harder if it goes in first.
- The rewrite has been on master for many months, but it involves many, many cherry-picks, so PLEASE review.
- Affects every BTL, been couple of different
- Goals: a clear interface between mpool and rcache, and support for memkind.
- Orthogonal to memhooks, because it only affects internal allocations.
- This area used to be very confusing; the rewrite separates it out. Everything is explicit: allocations, and then separate registrations.
- Was hoping to get the 2.1.0 PRs in before we merge the Git repos.
- C++ wrappers for OSHMEM: failed Jenkins but passes by hand. Resolve before merging.
- One-sided
- -disable
- PMIx 2.0
- Ralph is out this week and maybe more.
- Need to move ahead, and TRY to make progress.
- Give feedback to Ralph that we should try to keep PRs open for a couple of days so developers on the other side of the world have time to comment.
- Hard deadline of Friday on PMIx PR _____.
- Then will pull into OMPI master, test, and then PR it to v2.1.0.
- On master, PMIx 2.0 can be used as an external component; the internal component on master has already been upgraded to 3.0.
- IBM and Mellanox, along with Nathan and Howard (LANL) will meet to discuss getting this work done quickly for OMPI 2.1.0.
- Looked at a prototype of the merged GitHub repo called ompi-all-the-branches.
- Review mechanism is web-only.
- Blocking on OSHMEM - needs rebasing.
- Yoda maintenance.
- Ongoing performance discussion.
- Most PRs marked as RM approved
- Discussion on a few other items
- Blocker 2.0.2 issues
- Issue 2075
- Non-issue, since SIGSEGV is not forwarded.
- Issue 2049
- Ticket updated.
- Issue 2030
- MTT seems to be the only place to reproduce it.
- Might be a debug-build-related issue in the usage of opal_list_remove_item.
- Issue 2028
- yoda needs to be updated for BTL 3.0; 2.1 will not be released until yoda is fixed. Proposal: remove yoda from 2.1 and move to UCX.
- Raises the question: does it make sense to keep OSHMEM in Open MPI if yoda is removed?
- Issue 1831
- Issue 2075
- Blocker 2.1.0 issues
- 2.0.2 in preparation
- OSHMEM - Yoda Maintenance
- Want to progress both MPI and OSHMEM in the same process; don't want multiple network stacks.
- The original argument was to run OSHMEM over the BTLs, to use all network stacks (TCP, SM, OpenIB).
- That was 4 years ago, but things have changed. We don't really have that anymore; we have PMLs and SPMLs.
- Last week Mellanox proposed moving to UCX.
- OSHMEM sits on top of MPI layer, since it uses much of it.
- Over the last couple of years, it's been decoupled from MPI; now it's sitting on the side.
- But now that it's sitting off on the side, no one is interested in maintaining the connection to OPAL and ORTE. If that's all it's using, there are other projects that share OPAL and ORTE.
- The only reason for it to be in the repository is that it's connected at the MPI layer.
- BUT, when you start OSHMEM, the first thing called is OMPI_MPI_Init.
- Maybe it would help to know exactly what in the MPI layer OSHMEM is using.
- OPAL<-ORTE<-OMPI<-OSHMEM dependency chain.
- Maybe it would help to show where that is.
- OSHRUN (really ORTERUN) calls OMPI_MPI_Init and builds an MCA plugin infrastructure on top of that.
- Can't just slash pieces away.
- It takes advantage of all of PMIx, direct modex, the proc structure, and everything that supports them.
- According to this PR on master, OSHMEM has the same proc structure as OMPI, but actually has some MORE at the end of it.
- What about the transports? MPI over MXM boils down to libmxm, and so does OSHMEM.
- Became an issue with BTL 3.0 API change.
- A number of issues, especially over the last year: MPI focus vs. OSHMEM focus, a number of breaks between MPI and OSHMEM, and release schedule conflicts.
- Does it make sense to separate the repositories, or to design a way to make it easy to pull between the two projects?
- Right now there is a regression in the code base.
- Mellanox can't replace Yoda with UCX in October.
- Mellanox will fix Yoda for this time (for 2.1.0)
- Could package UCX alongside the other transports and let the market decide.
- Want to continue this discussion about the importance of including OSHMEM in the Open MPI project.
- We need to have an important discussion about future of MPI / OSHMEM.
- Discussion on Gilles' onexit PR 2121
- Seems okay; not sure what its purpose is.
- Need to ask Gilles why he wants this.
- Discussion on a configure variable for renaming libmpi to something else (intended for IBM or other vendors).
- Could do something like --with-ident-string (a configure-time build / vendor type option).
- SPI - http://www.spi-inc.org/
- Getting people to approve of these.
- We'll be on the Oct. 12th agenda. Once they formally invite us, we have 60 days to agree or decline.
- Works solely on a volunteer basis, so very inexpensive.
- End of September for soliciting feedback on using SPI.
- Open MPI will hold a formal vote after we receive the formal invite (in mid-to-late-December?)
- New Contribution agreement / Consent agreement / Bylaws.
- Will need a formal vote by members.
- End of October for discussion of new contributor agreement / bylaws.
- After that we'll set a date for voting.
- New Contributor agreement
- EuroMPI 2016 in Edinburgh: Sept. 25-28
- MPI Forum: Sept. 21-23
- People might be traveling next week
- Review master MTT testing (https://mtt.open-mpi.org/)
- Date of another face-to-face: January or February? Think about it, and discuss next week.
- LANL, Houston, IBM
- Cisco, ORNL, UTK, NVIDIA
- Mellanox, Sandia, Intel