WeeklyTelcon_20200616
Jeff Squyres edited this page Jun 25, 2020
·
2 revisions
- Dialup Info: (Do not post to public mailing list or public wiki)
- Aurelien Bouteiller (UTK)
- Austen Lauria (IBM)
- Barrett, Brian (AWS)
- Brendan Cunningham (Intel)
- Christoph Niethammer (HL
- Edgar Gabriel (UH)
- Geoffrey Paulsen (IBM)
- Harumi Kuno (HPE)
- Howard Pritchard (LANL)
- Jeff Squyres (Cisco)
- Joseph Schuchart
- Mark Allen (IBM)
- Matthew Dosanjh (Sandia)
- Michael Heinz (Intel)
- Nathan Hjelm (Google)
- Naughton III, Thomas (ORNL)
- Ralph Castain (Intel)
- Todd Kordenbrock (Sandia)
- William Zhang
- Akshay Venkatesh (NVIDIA)
- Artem Polyakov (nVidia/Mellanox)
- Brandon Yates (Intel)
- Charles Shereda (LLNL)
- David Bernhold (ORNL)
- Erik Zeiske
- Geoffroy Vallee (ARM)
- George Bosilca (UTK)
- Josh Hursey (IBM)
- Joshua Ladd (nVidia/Mellanox)
- Matias Cabral (Intel)
- Noah Evans (Sandia)
- Scott Breyer (Sandia?)
- Shintaro Iwasaki
- William Zhang (AWS)
- Xin Zhao (nVidia/Mellanox)
- mohan (AWS)
Blockers All Open Blockers
Review v4.1.x Milestones v4.1.0
- Schedule: Want to release mid-July
- RC1 planned for Monday, 8 July, 2020
- Release Engineers: Brian (AWS) and Jeff Squyres (Cisco)
- We've come to consensus for a v4.1.0 release
- Not breaking ABI or backwards compatibility.
- Blocker moving forward is to start from the v4.0.4 tag (Tomorrow)
- NOT touching runtime!!!
- Not going to be pulling in a new PMIx version.
- Next Steps: MTT testing needs to come online
- Cisco's was already online as of last night.
- IBM will get it online tonight.
- AWS will get it online tonight as well.
- Mellanox - will come online tomorrow night
Review v4.0.x Milestones v4.0.4
- v4.0.4 Released last week.
- Where do we stand with the memory hooks for Open MPI Memory patcher?
- Save/restore of r2 on ppc64le only.
- Not sure which components use the memory patcher.
- OpenIB uses Open MPI memory patcher. Not in master, only in v4.0.x
- 7799 is still open against master.
- v4.0.5 - No schedule yet.
Review v5.0.0 Milestones v5.0.0
-
Need to put OSC pt2pt
- OSC RDMA requires a single BTL that can contact every single process.
- This didn't use to be the case. (Comment in the code)
-
We can't use the OSC pt2pt.
- It is not thread safe and doesn't conform to the MPI-4 standard, so it is not safe.
- This is just a testing fallacy. Could add tests to show this, but we'd still be in the same boat.
- Either product A or B is broken and we need to fix it.
-
RDMA one-sided should fall back to "my atomics" because TCP will never have RDMA atomics.
- The idea was to put the atomics into the BTL base, which could do all of the one-sided atomics under the covers.
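The fallback idea above can be illustrated with a minimal sketch: when a transport has no native RDMA atomics, fetch-and-add and compare-and-swap are emulated in software by serializing each operation under a lock. This is only an illustration in Python, not the Open MPI BTL code. `EmulatedAtomics`, `fetch_add`, `compare_swap`, and `run_demo` are hypothetical names, and a plain list stands in for the exposed memory window.

```python
import threading


class EmulatedAtomics:
    """Hypothetical software fallback for a transport without RDMA atomics."""

    def __init__(self):
        # One lock serializes every emulated atomic operation.
        self._lock = threading.Lock()

    def fetch_add(self, window, index, operand):
        """Add operand to window[index] and return the previous value."""
        with self._lock:
            old = window[index]
            window[index] = old + operand
            return old

    def compare_swap(self, window, index, compare, value):
        """Store value only if window[index] == compare; return the previous value."""
        with self._lock:
            old = window[index]
            if old == compare:
                window[index] = value
            return old


def run_demo(num_threads=4, iterations=1000):
    """Increment one shared slot from several threads; return the final count."""
    atomics = EmulatedAtomics()
    window = [0]  # stand-in for a one-sided memory window

    def worker():
        for _ in range(iterations):
            atomics.fetch_add(window, 0, 1)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return window[0]
```

Because every operation takes the same lock, concurrent callers never interleave inside a read-modify-write, which is the property a hardware atomic would otherwise provide. A real implementation would also need to route the request to the process that owns the memory, which this sketch does not attempt.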
-
Jeff will close the PR, and
-
Jeff will Nathan will fetching, get, compare and swap.
-
Does UCX support iWarp?
- Does libFabric support iWarp via the verbs provider?
- https://github.com/openucx/ucx/issues/2507 suggests it doesn't.
- Brian thinks that libFabric
- OFI can support iWarp, just need to specify the provider in the include list.
- The person asking is a partner, not a customer.
-
PMIx
- Working on PMIx v4.0.0 which is what Open MPI v5.0 will use.
- PPN scaling issue - simple algorithmic issue in this function
- PMIx talked about it. Artem might know someone who might be interested in working on it.
- Algorithm behind one of the interfaces doesn't scale well.
- Not a regression. Above ~4K nodes, it becomes quadratic.
-
PRRTE
-
Two new PRs for MPI 4.0 error handling from Aurelien Bouteiller.
- UCX is failing with a SEGV in certain test cases.
- Austen will open an issue.
- PRRTE is hitting an assert in some cases.
- Austen will open an issue.
- Remaining Cisco failures look like connectivity issues.
- Jeff hasn't had a chance to look deeper yet.
- Looks like USNIC is either not being picked or disqualifying itself internic.
- Clang - added float16
- Need to add a special compiler flag for software emulation of float16.
- We will not magically add that flag.
- Many companies are not allowing travel for face-to-face meetings until 2021 due to COVID-19.
- Instead, let's do a series of virtual face-to-face meetings?
- Yes, this summer, to discuss v5.0.
- Maybe we can do it by topic?
- Maybe not 4 or 8 hour things.
- Different topics on different days.
- Do a Doodle poll of least-worst days in late July/August.
- Start a list of topics.
- There may not be a Super Computing conference at all this year.
- Many other projects are doing a virtual "state of the union" type meeting to try to cover what they'd usually do in a Birds of a Feather meeting.
- If this works pretty well, we could do it a couple of times a year.
- Not constrained to Super Computing
- Scale testing: PRs have to opt into it.