
WeeklyTelcon_20160906

Geoff Paulsen edited this page Sep 6, 2016 · 14 revisions

Open MPI Weekly Telcon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen
  • Jeff Squyres
  • Artem Polyakov
  • Brad Benton
  • Geoffroy Vallee
  • George
  • Howard
  • Josh Hursey
  • Nathan Hjelm
  • ralph
  • Sylvain Jeaugey
  • Todd Kordenbrock

Agenda

Review 1.10

  • Milestones
  • 1.10.4
    • Only potential blocker is issue with wrapper compiler.
      • mpifort is not libpath-ing rpath lib
      • when you do C builds, add rpath to all dependent libs during build.
      • static builds on 1.10
    • 1.10.4 Released!
    • Ralph will "bulk move" still open 1.10.4 PRs to 1.10.5.

Review 2.0.x

  • Wiki

  • Milestones

    • 2.0.1 is OUT
    • moving outstanding stuff to 2.0.2 or 2.1.0
    • Jeff and Howard pulled in some PRs for 2.0.2
    • coll_sync - macro had a typo in it. Works, but was wrong. Fixed.
    • Figured out bug with powerpc atomics - there is a fix.
      • optionA - re-enabled PGI atomic and apply a patch.
      • optionB - or re-write atomics.
      • Summary - there are a small number of asm files that are handlined.
        • If there are non-inline atomics, and no asm file - fails horribly in configure.
        • If there are non-inline atomics, but asm is stale - fails at build time (powerpc).
      • Hjelm - is proposing to remove asm files (as all compilers we support support inline atomics).
        • We had a check that said "if PGI, then just use asm file"
        • We should require PGI version > 10.8 (for inline atomics).
        • Nvidia (Sylvain) agreed this was okay.
        • Paul filed bug with PGI inline assembly fix.
    • Schedule - End of October.
    • Issue 2030 - Comm Spawn is still broken: timeout in OPAL_PMIX_Exchange macro. Fixed in master?
      • Very hard to reproduce.
      • Race condition that's tickled by MTT, but not manually. Have seen this for years.
    • Issue 2049 -
      • Patcher issue. Can't write to page (in shared code, read only page).
      • disabling patcher framework fixes this.
      • No OpenBSD drivers, since OpenBSD puts program shared pages in read-only; Linux does not.
      • Resolved to NOT support this on OpenBSD at this time.
    • Issue 2028 - SPML Yoda + MTL doesn't support
      • Work not done for OpenSHMEM.
      • Still allocate a fragment
      • OpenSHMEM - works with Open1, and whatever MxM flavors. ???
      • Open question, who's going to fix this.
      • Blocking issue for 2.1.0.
      • Artem - Mellanox is now testing yoda in their jenkins.
      • Suggest we remove the broken test from Mellanox jenkins.
        • Artem will fix now.
      • rework way callbacks are done, and for put and get, don't allocate a fragment.
        • Hjelm - can help by telling how BTL3 works.
  • Assuming we'll ship soon, go refactor your PRs from 2.1.0

    • Will start merging 2.0.2 PRs in, and then close ompi-release, and then merge the two repos in ompi repo.
  • Timeline for 2.1.0 is very short, because we wanted a small number of fairly low-to-medium-risk items that we can get done by end of October. Probably looking at a freeze at the end of September. Shooting for mid-October.

  • Don't yet have a plan for 2.0.2

    • Going to make a new fork? What do we call that new fork? is it 2.2 or 3.0? Depends on backwards compatibility story.
  • coll_sync - slated for 2.0.2, classified as bugfix, but don't dump in at last second before 2.1.0

  • Mellanox needs PMIx 2.0 in 2.1.0

    • PMIx will release a 2.0 that just has shared memory data as an addition,
      • but doesn't have everything else they were targeting for 2.0.0.
      • This should come out Early September.
      • This is the piece that Mellanox and IBM are interested in.
    • Put items requested on the wiki (e.g., PMIx direct modex, OpenSHMEM, stability improvements)
    • What do people want to see for 2.1.0?
    • Finalize the list in Dallas meeting
    • Hopefully target a Sept./Oct. release, not a Supercomputing goal.
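For the inline-atomics discussion under the 2.0.x milestones above, the sketch below shows roughly what "inline atomics" means in contrast to a separately assembled asm file: the atomic operation is emitted by the compiler at the call site, so there is no asm file that can be missing or stale. This is only an illustration. It swaps in the GCC/Clang `__atomic` builtins rather than the per-architecture inline assembly the notes refer to, and the function names are hypothetical, not Open MPI's actual API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration; not Open MPI's atomics API.
 * Assumes a compiler that provides the GCC-style __atomic builtins. */

/* Atomically add delta to *addr and return the new value. */
static inline int32_t sketch_atomic_add_32(volatile int32_t *addr, int32_t delta)
{
    return __atomic_add_fetch(addr, delta, __ATOMIC_SEQ_CST);
}

/* Atomically store newval into *addr if *addr still equals oldval.
 * Returns true when the swap happened. */
static inline bool sketch_atomic_cmpset_32(volatile int32_t *addr,
                                           int32_t oldval, int32_t newval)
{
    return __atomic_compare_exchange_n(addr, &oldval, newval,
                                       false, /* strong compare-exchange */
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```

Because both helpers are `static inline`, nothing has to be assembled or linked separately, which is the property that makes dropping the asm files attractive.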

New Agenda Items:

Review Master MTT testing (https://mtt.open-mpi.org/)

  • Master has a sea of red, due to OSHMEM issues.
  • mpifort failing to link on 1.10 with static as well.
  • MPI_Comm_spawn failures at Mellanox and maybe IBM. Failing on master a week ago, and now failing on v2.x
    • Was working a week or two ago.
    • Howard or Ralph will look at when they get some cycles.
    • Sylvain might look at some PMIx commits also on v2.x and see if he can isolate.

MTT Dev status:

  • Ralph made a lot of progress there. Still need to get submission thing working.
    • One
  • Josh started moving MTT server to Amazon cloud server.
    • No progress last week. Just need to test, and work with Jeff on DNS, and schedule a day to do the move.

Website migration

  • Next steps for migration?

  • Jenkins and MTT are all that's left.

  • Got download numbers to Edger, some interesting data he'll share (devel list?)

  • Non-profit stuff.

    • Cisco is okay with.
    • Quarterly opportunity to apply is coming up. We fill out a proposal, and they will accept or reject.
      • We're on their agenda (end of September).
    • Should get non-profit prices for GitHub dues (possibly reduced or $0). Unfortunately the bill is coming up soon, so Jeff will ask if we can just pay for a month or two, instead of a full year.
  • Contribution agreement. Now that we're on GitHub, we're getting more and more anonymous contributions.

    • Some folks (who haven't yet signed contributor's agreement) have some IPv6 fixes, and kind of new feature.
      • First patch is more bugfix, and restores IPv6 functionality.
      • Second patch is more of a non-local feature. Bigger, more technical discussion needed.
    • These are critical for Mellanox (in master). Need to be able to run on IPv6 only systems.
    • If it's a one-off, just remind them to check with their company and make sure it's okay, then do git signoff to make sure they understand it.
    • Just put this into the contributors document. Modify this document to explain the process.
  • Other Open Source communities have a big list of things that contributors agree to when they git signoff on a commit. We could do something like that.

    • The Agreement also protects the company that contributes.
    • Changing the rules on contributors members.
    • 2nd issue is that if it's a "big" change that we'd normally require a contributor agreement, members need to have their legal teams review the change in contributor agreements.
    • Once Jeff writes up alternate language for contributor's agreement, then all members should get them reviewed by their legal departments.
  • C89 -

    • By removing the C99 check, he's defaulting back to GNU89, which isn't even a superset of C89.
    • Giles' approach is a bit better, but not a good idea.
    • when you have a bad GCC, can fix glibc version BLAH, these are the functions that failed to link.
    • Patches are incomplete, because glibc on system was built with GCC without C89 compiler. It's not C89, it's GNU89.
      • inlining is different.
      • If you can't use GNU89, can add an attribute to functions to make things compile.
    • Consensus to drop this, if submitter wants to answer questions asked on list, we'll consider it.
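On the C89 item above, the inlining difference between GNU89 and ISO C99 can be sketched as follows. Under GNU89 rules an `extern inline` definition is used only for inlining and emits no out-of-line symbol, while under C99 rules the same spelling does emit one. The attribute mentioned in the notes is presumably GCC's `gnu_inline`, which requests the GNU89 behaviour regardless of the selected standard; that reading is an assumption, as are all helper names below. The sketch assumes GCC or Clang.

```c
/* Hypothetical helper names; assumes GCC or Clang. */

/* Report which inline semantics are in effect for this translation unit,
 * using the macros GCC predefines for that purpose. */
static const char *sketch_inline_mode(void)
{
#if defined(__GNUC_GNU_INLINE__)
    return "gnu89";
#elif defined(__GNUC_STDC_INLINE__)
    return "c99";
#else
    return "unknown";
#endif
}

/* "Header" part: with gnu_inline, this body is only a candidate for
 * inlining and never produces an external symbol, whatever -std= says. */
extern __inline__ __attribute__((__gnu_inline__)) int sketch_twice(int x)
{
    return 2 * x;
}

/* "Library" part: the single out-of-line definition used when a call
 * is not inlined (for example at -O0). */
int sketch_twice(int x)
{
    return 2 * x;
}
```

Without the attribute, headers written for GNU89 `extern inline` can behave differently when compiled under C99 rules, which is one way link failures like those described above can arise.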

Open MPI Developer's Meeting

  • Date of another face to face. January or February? Think about, and discuss next week.

  • Non-Profit

    • Ralph sent email out to list, please comment either pro/con.

Status Update Rotation

  1. LANL, Houston, IBM
  2. Cisco, ORNL, UTK, NVIDIA
  3. Mellanox, Sandia, Intel
