WeeklyTelcon_20200505
Geoffrey Paulsen edited this page May 9, 2020
- Dialup Info: (Do not post to public mailing list or public wiki)
- Austen Lauria (IBM)
- Barrett, Brian (AWS)
- Brendan Cunningham (Intel)
- Geoffrey Paulsen (IBM)
- George Bosilca (UTK)
- Harumi Kuno (HPE)
- Howard Pritchard (LANL)
- Jeff Squyres (Cisco)
- Joseph Schuchart
- Josh Hursey (IBM)
- Joshua Ladd (nVidia/Mellanox)
- Matthew Dosanjh (Sandia)
- Michael Heinz (Intel)
- Naughton III, Thomas (ORNL)
- Ralph Castain (Intel)
- Todd Kordenbrock (Sandia)
- William Zhang (AWS)
- Akshay Venkatesh (NVIDIA)
- Artem Polyakov (nVidia/Mellanox)
- Brandon Yates (Intel)
- Charles Shereda (LLNL)
- David Bernhold (ORNL)
- Edgar Gabriel (UH)
- Erik Zeiske
- Geoffroy Vallee (ARM)
- Mark Allen (IBM)
- Matias Cabral (Intel)
- Nathan Hjelm (Google)
- Noah Evans (Sandia)
- Scott Breyer (Sandia?)
- Shintaro Iwasaki
- Xin Zhao (nVidia/Mellanox)
- mohan (AWS)
- MTT
- If you change your MTT setup to start up PRRTE at the beginning of the session and then just use prun:
- You can see run times cut in half or more.
- This is good, but we also need to test the mpirun wrapper.
- Cisco is converting half of its MPI installs to use prrte/prun.
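The session pattern described in the MTT notes above (start PRRTE once, launch each test with prun, shut down at the end) could be sketched as follows. This is a minimal sketch, not MTT's actual configuration: `session_commands` and the test names are hypothetical, and it assumes the `prte --daemonize`, `prun -n`, and `pterm` command forms, which should be checked against the installed PRRTE. It only builds the command lines and does not run them.

```python
from typing import List, Sequence


def session_commands(tests: Sequence[str], nprocs: int = 2) -> List[List[str]]:
    """Return the commands for one test session, in order (hypothetical helper)."""
    # Start the persistent PRRTE daemons once, at the beginning of the session.
    commands = [["prte", "--daemonize"]]
    # Launch every test through prun against the already-running daemons.
    for test in tests:
        commands.append(["prun", "-n", str(nprocs), test])
    # Shut the daemons down once all tests have been launched.
    commands.append(["pterm"])
    return commands
```

The time saving mentioned in the notes would come from paying the runtime startup cost once per session rather than once per test; the mpirun wrapper path would still need its own coverage.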
- OMPI master submodule pointers are set up to track PMIx and PRRTE master.
Blockers All Open Blockers
Review v4.0.x Milestones v4.0.4
- v4.0.4 Break
- v4.0.4rc1 - This week
- 7616 - ABI break introduced in OMPI v4.0.3 for some f08 symbols.
- Schedule:
- Feature Freeze: May 14 (slipped from April 30)
- Please post a PR ASAP as a placeholder.
- Release: End of June
- PMIx v4.0.0 - on track
- Schedule: Still a number of issues, but probably not blocking
- PRRTE v2.0 -
- Went through the open issues to discuss what remains.
- MCA usage is very different in PRRTE than in ORTE.
- ORTE was a "one-shot" launcher, but PRRTE is persistent.
- When launching PRRTE you can set "defaults" for the daemon.
- Individual pruns override these defaults via command-line parameters, not MCA parameters.
- There are now two MCA users in the job, OPAL and PRRTE. If you set something in the wrong one, it gets ignored, which is confusing.
- There will be a lot of MCA param files that won't do what people expect them to do.
- Might want to open some issues on the OMPI side to write some docs.
- Report-bindings doesn't make sense as a "default setting" in PRRTE, so it is always set on a per-job basis.
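The layering discussed above (daemon-wide defaults set when PRRTE starts, overridden per job from the prun command line) can be illustrated with a small model. This is only a sketch of the idea, not PRRTE's implementation: the function, the option names, and the choice to drop per-job-only options from the defaults are all assumptions made for illustration.

```python
from typing import Dict

# Assumption for illustration: options that only make sense per job and are
# therefore not honored as daemon defaults.
PER_JOB_ONLY = {"report-bindings"}


def resolve_job_settings(daemon_defaults: Dict[str, str],
                         job_cli_options: Dict[str, str]) -> Dict[str, str]:
    """Combine daemon defaults with one job's command-line options."""
    # Start from the defaults given when the persistent daemons were launched,
    # skipping anything that is meaningful only on a per-job basis.
    settings = {key: value for key, value in daemon_defaults.items()
                if key not in PER_JOB_ONLY}
    # Options given on the job's command line always win over the defaults.
    settings.update(job_cli_options)
    return settings
```

For example, a daemon default of `{"map-by": "core"}` combined with a job option of `{"map-by": "package"}` resolves to `package` for that job only; the next job without the option falls back to `core`.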
- RC1 blockers: things to get done before RC1 (maybe 2-3 weeks?)
- Need to get user-facing stuff done to reduce user confusion.
- Binding reporting should be done (confusing) 523 - Needs thinking/careful work.
- A bunch of gnarly issues in here:
- Call tomorrow?
- Socket -> Package name change - Should we do now or later?
- Already a lot of change, but hwloc has already moved onto the new name.
- Also what to do with NUMA? - Doesn't even make sense anymore on some architectures.
- Depends on what versions of hwloc we support. Will be tricky (or more expensive) to support both hwloc 1 and 2.
- Is there a list of distros and hwloc versions? Brian will put together list.
- Discussing features on the Google Sheets document (https://docs.google.com/spreadsheets/d/1OXxoxT9P_YLtepHg6vsW3-vp4pdzGQgyknNbkzenYvw/edit#gid=0), which were taken from the face-to-face wiki.
- Please post PRs ASAP even if not quite ready yet.
- Please send collective tuning data to AWS to help select new defaults.
- Today with libevent, we default to preferring the external libevent if its version is newer than the one we redistribute (2.0.22).
- There are still a bunch of distros that ship a libevent no newer than 2.0.22, but it works.
- For v5.0 we're continuing down the path of NOT encouraging users to use the internal libs.
- So we should probably just use the external one if it's found, as long as it's newer than 2.0.21 (trusted version).
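The difference between today's libevent selection and the rule proposed above comes down to which version the external copy is compared against. The following is a minimal sketch of that comparison only, not the actual configure logic; the function names and the version-string parsing are assumptions for illustration.

```python
from typing import Optional, Tuple

Version = Tuple[int, int, int]

BUNDLED: Version = (2, 0, 22)  # version redistributed internally, per the notes
TRUSTED: Version = (2, 0, 21)  # "trusted version" threshold, per the notes


def parse_version(text: str) -> Version:
    """Parse strings such as '2.1.8' or '2.0.22-stable' into a 3-tuple."""
    core = text.split("-", 1)[0]          # drop a suffix such as '-stable'
    major, minor, patch = (int(part) for part in core.split(".")[:3])
    return (major, minor, patch)


def current_policy(external: Optional[str]) -> str:
    """Today: prefer the external libevent only if it is newer than the bundled one."""
    if external is not None and parse_version(external) > BUNDLED:
        return "external"
    return "internal"


def proposed_policy(external: Optional[str]) -> str:
    """Proposal: use the external libevent whenever it is newer than the trusted version."""
    if external is not None and parse_version(external) > TRUSTED:
        return "external"
    return "internal"
```

Under this sketch, a distro shipping 2.0.22 would keep the internal copy today but switch to the external copy under the proposed rule.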
- Issues not tracked on spreadsheet.
- libopal isn't slurped into Open-MPI correctly (related to 7560)
- Jeff and Brian will meet Friday
- Hierarchical collectives
- If someone wants to do this, PMIx has much of this information already.
- Not too hard to do, and they're much faster. Will be in next version of competitor MPI
- Probably not for v5.0
- SLURM PMIx plugin has been locked on PMIx v2 for some time.
- There are some NEW PMIx calls that SHOULD be added to bring it up.
- Ralph has started a PR, but needs help.
- PR???
- So for now, there's some optional info that won't be passed correctly.
- No OMPI_INFO for now.
- Ralph gets pinged occasionally.
- Not sure priority of this.
- MTT on master is looking pretty good.
- Deferred.
- Scale testing: PRs have to opt into it.
Review Master Master Pull Requests
- CI testing only checks that the code builds and that it ran, but doesn't test HOW it ran.
- Environment setup can be a bit different.
- For example, no permissions in /tmp. A test might pass on one machine and fail on another without /tmp permissions.
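One way a test harness could surface the /tmp difference described above is to probe the temporary directory before running anything. This is a minimal sketch under the assumption that creating a scratch file is a good enough check; `can_write_tmp` is a hypothetical helper, not part of any existing CI setup.

```python
import tempfile


def can_write_tmp(directory: str = "/tmp") -> bool:
    """Return True if a scratch file can actually be created in `directory`."""
    try:
        # Creating (and automatically deleting) a real file exercises the same
        # permissions a test would need, rather than only inspecting mode bits.
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers a missing directory, missing permissions, a read-only
        # filesystem, and similar environment problems.
        return False
```

Logging the result of such a probe at the start of a run would make it easier to tell an environment failure from a real regression.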