
WeeklyTelcon_20160719

Geoff Paulsen edited this page Jul 19, 2016 · 9 revisions

Open MPI Weekly Telcon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen
  • Jeff Squyres
  • Artem Polyakov
  • Brian
  • Edgar Gabriel
  • Howard
  • Josh Hursey
  • Nathan Hjelm
  • Ralph
  • Ryan Grant
  • Todd Kordenbrock

Agenda

Review 1.10

  • Milestones
  • A couple of things sitting against 1.10.4

Review 2.0.x

  • Wiki

  • 2.0.1 PRs that are reviewed and approved

    • v2.0.1 PRs are open. Need to get PRs reviewed!
  • Blocker Issues *

  • Milestones

  • We released last Tuesday. Now taking in PRs.

  • A lot of 2.0.1 PRs have not been reviewed yet, so please review them.

  • Howard and Jeff are merging in the low-risk ones.

  • NVIDIA failures - related to the OFED install (false failures).

  • Cisco failures - still some failures here. They have to do with sparse groups, and relate to one of the PRs we haven't pulled in yet.

  • IBM failures - seem to have to do with spawn and intercomm connect.

    • Might call Connect / Accept - when we create key, we use PMIx to communicate between leaders.
    • PMIx needs to support Exchange.
    • Aborts but Hangs. - PMIx error code is coming up.
  • Cray failures - all associated with spawn, which Cray PMI doesn't support.

  • Applaunch with Master doesn't work

  • MPI_Info keys are weird. OMPI_NUM_APPS? what is that?

    • SLURM direct launch won't work with Yalla, due to a PMIx Fence issue (the fence was changed to nonblocking).
      • It's okay, since it will be fixed in 2.0.1 soon after.

Review Master MTT testing (https://mtt.open-mpi.org/)

MTT Dev status:

New Items:

  • Gentle reminder that lots of 2.0.1 PRs haven't been reviewed yet.

  • Merging GitHub master and ompi_release is taking a backseat to migration.

  • Migration ongoing, nothing's moved yet, just testing:

    • Mailman lists - sanity check of the list of lists that we are migrating and not migrating.
      • If the community is good with the list of lists, then give everyone a heads up that it's moving.
      • New aliases will be @lists.open-mpi.org.
    • Transfer MTT to Ralph's machine to address the PostgreSQL issue before transitioning.
      • MTT code is somewhat PostgreSQL-specific, but HostGator supports MySQL, not PostgreSQL.
      • So we need to modify the code from PostgreSQL to MySQL.
      • So Intel will temporarily host the MTT server until we can migrate to MySQL.
      • Meeting with MTT to guesstimate the time: a few months of realistic effort.
      • Mostly an API issue, though some PostgreSQL-specific tables will need to change. The database structure won't have to change.
    • Moving the main website: mostly a solved issue. Want to do the mailing list stuff first.
    • PDFs for 3rd-party agreements: Ralph talked to HostGator; they have a file-sharing option, but it increases the price dramatically.
      • If only one or two people have access, and have permission on HostGator, perhaps this is acceptable.

New discussion.

  • Mellanox Jenkins - Some Jenkins testing was failing in MPI_Init; not sure if it's a new MELLANOX Seed.
    • Will look into. Server was rebooted, they are doing some maintenance. Perhaps this is causing issues.
    • Jeff tagged Artem in a PR in the last few hours.
  • Is it possible to put a :bot-mellanox-retest: on Mellanox Jenkins?
    • Artem will try.
  • Howard pointed out yesterday: Jeff did a bot-retest of old 2.0.1 PRs because he thought they'd be run serially, but the Mellanox config says it will run 10 in parallel.
  • Artem - discuss benchmarks.

Status Updates:

  1. Mellanox
    • Artem sent out message rate email.
  2. Sandia
  3. Intel

Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM
  3. Cisco, ORNL, UTK, NVIDIA

Back to 2016 WeeklyTelcon-2016
