WeeklyTelcon_20160906
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoff Paulsen
- Jeff Squyres
- Artem Polyakov
- Brad Benton
- Geoffroy Vallee
- George
- Howard
- Josh Hursey
- Nathan Hjelm
- Ralph
- Sylvain Jeaugey
- Todd Kordenbrock
- Milestones
- 1.10.4
- Only potential blocker is issue with wrapper compiler.
- mpifort is not adding the rpath for the library path.
- when you do C builds, add rpath to all dependent libs during build.
- static builds on 1.10
- 1.10.4 Released!
- Ralph will "bulk move" still open 1.10.4 PRs to 1.10.5.
-
- 2.0.1 is OUT
- Moving outstanding stuff to 2.0.2 or 2.1.0.
- Jeff and Howard pulled in some PRs for 2.0.2
- coll_sync - macro had a typo in it. Works, but was wrong. Fixed.
- Figured out bug with powerpc atomics - there is a fix.
- Option A - re-enable PGI atomics and apply a patch.
- Option B - or rewrite the atomics.
- Summary - there are a small number of asm files that are handlined.
- If there are no inline atomics, and no asm file - fails horribly in configure.
- If there are no inline atomics, but the asm file is stale - fails at build time (powerpc).
- Hjelm - is proposing to remove the asm files (as all compilers we support also support inline atomics).
- We had a check that said "if PGI, then just use asm file"
- We should require PGI version > 10.8 (for inline atomics).
- Nvidia (Sylvain) agreed this was okay.
- Paul filed bug with PGI inline assembly fix.
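
To make the "inline atomics" option concrete, here is a minimal sketch of an atomic operation that lives entirely in a header-style inline function, so there is no separately maintained asm file to go stale. It is not Open MPI's actual code: the names `sketch_cas_32` and `sketch_add_32` are hypothetical, and the GCC-style `__sync_bool_compare_and_swap` builtin stands in for the hand-written inline assembly discussed above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an Open MPI API): 32-bit compare-and-swap.
 * Returns nonzero if *addr equalled oldval and was replaced by newval.
 * Assumption: the compiler provides the GCC-style __sync builtins. */
static inline int sketch_cas_32(volatile int32_t *addr,
                                int32_t oldval, int32_t newval)
{
    return __sync_bool_compare_and_swap(addr, oldval, newval);
}

/* Hypothetical helper: atomic add built on the CAS above.
 * Retries until no other thread changed *addr in between.
 * Returns the value stored by this call. */
static inline int32_t sketch_add_32(volatile int32_t *addr, int32_t delta)
{
    int32_t old;
    do {
        old = *addr;
    } while (!sketch_cas_32(addr, old, old + delta));
    return old + delta;
}
```

Because everything is resolved by the compiler at build time, a compiler that cannot handle the inline form fails visibly when the header is compiled rather than linking against an out-of-date asm file.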
- Schedule - End of October.
- Issue 2030 - MPI_Comm_spawn is still broken: timeout in the OPAL_PMIX_Exchange macro. Fixed in master?
- Very hard to reproduce.
- Race condition that's tickled by MTT, but not manually. Have seen this for years.
-
Issue 2049 -
- Patcher issue. Can't write to page (in shared code, read only page).
- disabling patcher framework fixes this.
- No OpenBSD drivers, since OpenBSD maps program shared pages read-only; Linux does not.
- Resolved to NOT support this on OpenBSD at this time.
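
The failure mode in Issue 2049 can be illustrated with a small sketch: before bytes in a read-only page can be overwritten, the page has to be made writable, and the OS is free to refuse. This is only an illustration under stated assumptions, not the patcher framework's code. The helpers `sketch_readonly_page` and `sketch_patch_bytes` are hypothetical, and an anonymous data page stands in for a shared code page (which would normally be restored to read+execute rather than read-only).

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map one anonymous page with read-only protection.
 * Returns NULL on failure. */
static void *sketch_readonly_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, (size_t)page, PROT_READ,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: copy len bytes from src over target, where target
 * lies in read-only memory. Returns 0 on success, -1 if the OS refuses
 * to change the page protection. */
static int sketch_patch_bytes(void *target, const void *src, size_t len)
{
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t)target & ~(page - 1); /* page-align down */
    size_t span = ((uintptr_t)target + len) - start;

    if (mprotect((void *)start, span, PROT_READ | PROT_WRITE) != 0)
        return -1;                  /* the case reported in the issue */
    memcpy(target, src, len);
    return mprotect((void *)start, span, PROT_READ); /* restore */
}
```

A caller that gets -1 back has no safe way to patch, which matches the observation that disabling the patcher framework avoids the problem.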
-
Issue 2028 - SPML Yoda + MTL doesn't support
- Work not done for Open SHMEM.
- Still allocate a fragment
- OpenSHMEM - works with Open1, and whatever MxM flavors. ???
- Open question, who's going to fix this.
- Blocking issue for 2.1.0.
- Artem - Mellanox is now testing yoda in their Jenkins.
- Suggest we remove the broken test from Mellanox Jenkins.
- Artem will fix now.
- rework way callbacks are done, and for put and get, don't allocate a fragment.
- Hjelm - can help by telling how BTL3 works.
-
Assuming we'll ship soon, go refactor your PRs from 2.1.0
- Will start merging 2.0.2 PRs in, and then close ompi-release, and then merge the two repos in ompi repo.
-
Timeline for 2.1.0 is very short, because we wanted a small number of low- to medium-risk changes that we can get done by the end of October. Probably looking at a freeze at the end of September. Shooting for mid-October.
-
Don't yet have a plan for 2.0.2
- Going to make a new fork? What do we call that new fork? is it 2.2 or 3.0? Depends on backwards compatibility story.
-
coll_sync - slated for 2.0.2, classified as bugfix, but don't dump in at last second before 2.1.0
-
Mellanox needs PMIx 2.0 in 2.1.0
- PMIx will release a 2.0 that just has shared memory data as an addition,
- but doesn't have everything else they were targeting for 2.0.0.
- This should come out Early September.
- This is the piece that Mellanox and IBM are interested in.
- Put items requested on the wiki (e.g., PMIx direct modex, OpenSHMEM, stability improvements)
- What do people want to see for 2.1.0?
- Finalize the list in Dallas meeting
- Hopefully target a Sept./Oct. release, not a Supercomputing goal.
Review Master MTT testing (https://mtt.open-mpi.org/)
- Master has a sea of red, due to OSHMEM issues.
- mpifort failing to link on 1.10 with static as well.
- MPI_Comm_spawn failures at Mellanox and maybe IBM. Failing on master a week ago, and now failing on v2.x
- Was working a week or two ago.
- Howard or Ralph will look at when they get some cycles.
- Sylvain might look at some PMIx commits also on v2.x and see if he can isolate.
- Ralph made a lot of progress there. Still need to get submission thing working.
- One
- Josh started moving MTT server to Amazon cloud server.
- No progress last week. Just need to test, and work with Jeff on DNS, and schedule a day to do the move.
-
Next steps for migration?
-
Jenkins and MTT is all that's left.
-
Got download numbers to Edger, some interesting data he'll share (devel list?)
-
Non-profit stuff.
- Cisco is okay with.
- Quarterly opportunity to apply is coming up. We fill out a proposal, and they will accept or reject.
- We're on their agenda (end of September).
- Should get non-profit prices for GitHub dues (possibly reduced or $0). Unfortunately the bill is coming up soon, so Jeff will ask if we can just pay for a month or two instead of a full year.
-
Contribution agreement. Now that we're on github, we're getting more and more anonymous contributions.
- Some folks (who haven't yet signed contributor's agreement) have some IPv6 fixes, and kind of new feature.
- First patch is more bugfix, and restores IPv6 functionality.
- Second patch is more of a non-local feature. Bigger, more technical discussion needed.
- These are critical for Mellanox (in master). Need to be able to run on IPv6 only systems.
- If it's a one-off, just remind them to check with company and make sure it's okay, then do git signoff to make sure you understand it.
- Just put this into the contributors document. Modify this document to explain the process.
-
Other Open Source communities have a big list of things that contributors agree to when they git signoff on a commit. We could do something like that.
- The Agreement also protects the company that contributes.
- Changing the rules on contributors members.
- 2nd issue: if it's a "big" change for which we'd normally require a contributor agreement, members need to have their legal teams review the changes to the contributor agreement.
- Once Jeff writes up alternate language for contributor's agreement, then all members should get them reviewed by their legal departments.
-
C89 -
- By removing the C99 check, he's defaulting back to GNU89, which isn't even a superset of C89.
- Gilles' approach is a bit better, but still not a good idea.
- when you have a bad GCC, can fix glibc version BLAH, these are the functions that failed to link.
- Patches are incomplete, because glibc on system was built with GCC without C89 compiler. It's not C89, it's GNU 89.
- inlining is different.
- If you can't use GNU 89, can add an attribute to functions to make things compile.
- Consensus to drop this, if submitter wants to answer questions asked on list, we'll consider it.
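
The remark that "inlining is different" between GNU89 and standard C can be checked at compile time. The sketch below is an illustration only: it relies on the GCC predefined macros `__GNUC_GNU_INLINE__` and `__GNUC_STDC_INLINE__`, the function names are hypothetical, and it assumes a compiler mode in which the `inline` keyword is accepted (GNU89 or C99 and later, not strict C89).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: report which semantics the compiler applies to a
 * plain `inline` function. GCC defines exactly one of these macros;
 * other compilers may define neither. */
static const char *sketch_inline_semantics(void)
{
#if defined(__GNUC_GNU_INLINE__)
    return "gnu89";    /* plain `inline` also emits an external definition */
#elif defined(__GNUC_STDC_INLINE__)
    return "c99";      /* plain `inline` emits no external definition */
#else
    return "unknown";
#endif
}

/* `static inline` means the same thing under both rule sets, so it
 * sidesteps the link failures that plain `inline` can cause. */
static inline int sketch_twice(int x)
{
    return 2 * x;
}
```

Where a plain `inline` definition has to keep its GNU89 meaning under a newer standard, GCC also offers `__attribute__((gnu_inline))`, which is presumably the kind of attribute mentioned above.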
-
Date for the next face-to-face meeting: January or February? Think about it, and discuss next week.
-
Non-Profit
- Ralph sent email out to list, please comment either pro/con.
- LANL, Houston, IBM
- Cisco, ORNL, UTK, NVIDIA
- Mellanox, Sandia, Intel