WeeklyTelcon_20170410
Geoffrey Paulsen edited this page Jan 9, 2018 · 1 revision
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoff Paulsen
- Jeff Squyres
- Artem Polyakov
- Brian Barrett
- David Bernholdt
- Geoffroy Vallee
- Howard
- Josh Hursey
- Joshua Ladd
- Ralph
- Thomas Naughton
- Todd Kordenbrock
Review All Open Blockers
Review Milestones v2.1.0
- https://github.com/open-mpi/ompi/issues/3267 - a v2.1.1 based blocker
- Jeff seems to remember some persistent one sided failure.
- Looks like the issue is still open, but the PRs have been pulled in?
- Cisco can turn on MTT for master.
- https://github.com/open-mpi/ompi/issues/3268
- Artem was still seeing this, but hasn't seen it since Nathan's merge.
- Segfault when trying to launch under a debugger specific to v2.1.1
- Ralph created a PR with a fix that should go into a v2.x release.
Review Milestones v3.0
- LoadLeveler support was removed, but the code remains. IBM approves removal on master.
- v3.0 Support items:
- 64-bit
- Mac OS X 10.12
- FreeBSD
- Cisco MTT is doing -m32 builds.
Review Master Pull Requests
Review Master MTT testing
- Git PRs - Why do we merge, and not rebase and merge?
- Shows empty (or sometimes non-empty) merge commits.
- Idea that we merge exactly what the CI tested.
- Can be very hard to line up PRs.
- Good to periodically audit what we're doing, and discuss.
- The merge commit is not signed off (and gets flagged a lot in CI).
- https://github.com/open-mpi/ompi/pull/3288
- Ralph noticed that a bunch of OMPI_* env vars were being propagated that shouldn't be.
- ALL OMPI_* was being propagated, but we really should be propagating OMPI_MCA_*.
- We do set some OMPI_UNIVERSE_SIZE type env vars.
- Surprised. It was forwarding env vars that it shouldn't have been.
- Document that users should stop doing this.
- We'll continue to discuss next week.
- There are times when you need to capture a setting prior to calling OPAL_Init, such as one that influences STDOUT.
- These can't be MCA params, because the MCA system won't be open yet.
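
The prefix distinction discussed for pull/3288 can be illustrated with a small sketch. This is not Open MPI's launcher code; the function name `select_forwarded_env` and the sample variables are hypothetical, and the only rule taken from the notes is that `OMPI_MCA_*` should be forwarded while other `OMPI_*` variables should not.

```python
# Illustrative sketch only: forward variables named OMPI_MCA_*, drop the rest.
# FORWARD_PREFIX and select_forwarded_env are hypothetical names for this example.
FORWARD_PREFIX = "OMPI_MCA_"


def select_forwarded_env(env):
    """Return a new dict holding only the variables whose names start with FORWARD_PREFIX."""
    return {
        name: value
        for name, value in env.items()
        if name.startswith(FORWARD_PREFIX)
    }


if __name__ == "__main__":
    # Made-up sample environment for demonstration.
    sample_env = {
        "OMPI_MCA_btl": "self,tcp",   # MCA parameter: forwarded
        "OMPI_UNIVERSE_SIZE": "4",    # OMPI_* but not OMPI_MCA_*: not forwarded
        "PATH": "/usr/bin",           # unrelated variable: not forwarded
    }
    for name, value in sorted(select_forwarded_env(sample_env).items()):
        print(f"{name}={value}")
```

A plain prefix test like this would also drop any setting that has to be read before OPAL_Init unless it is handled separately, which is the open question noted above.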
- Ralph has an issue when using -btl sm.
- Could add an abort when an endpoint can't be found, but this is in BML R2; the error message is coming from there.
- A portion of the code is in end_procs - an abort will give a stack trace, and we can figure it out from there.
- This communication removes the advantage of not doing a full modex: an on-demand modex ends up being done anyway, because they're trying to see who they can talk to.
- This shouldn't be happening. Ralph will look into R2 and try to figure out who's communicating and why.
- Ralph will give a presentation next time. Looks really good, minus a Kernel issue with KNL.
- FYI - You will see lots of Jenkins jobs; that's Brian adding stuff. On jenkins.open-mpi.org you will see lots of builder things. Amazon is fiddling with the Jenkins settings.
- Cisco, ORNL, UTK, NVIDIA
- Mellanox, Sandia, Intel
- LANL, Houston, IBM, Fujitsu