
WeeklyTelcon_20160329

Geoff Paulsen edited this page Apr 5, 2016 · 13 revisions

Open MPI Weekly Telcon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Oops. I forgot to capture this week. :(

Agenda

Review 1.10

Review 2.0.x

  • Wiki: https://github.com/open-mpi/ompi/wiki/Releasev20
  • Blocker Issues: https://github.com/open-mpi/ompi/issues?utf8=%E2%9C%93&q=is%3Aopen+milestone%3Av2.0.0+label%3Ablocker
    • Issue 1406
      • PR 1491 Resolved and merged to master.
      • Issue 1505 - for v2.x merge (after sits on master a bit)
      • TCP BTL THREAD_MULTIPLE deadlock
        • Resolution: Bugfixing goodness.
      • New non-default feature: TCP async progress is only active if requested via MCA param.
    • Issue 1495 - Nathan gives status.
      • Originally got some code from IBM to patch binaries. x86, x86_64, ia64, ppc64.
        • Got some UCX error when running with UCX.
        • If we always use the Linux patcher on Linux, it should play nicely with UCX. It replaces the function pointer in the symbol table, which should always chain all of the hooks. This is what UCX does.
        • Before the patcher code, it's safe to call opal_finalize.
        • Asked Yoci (who did the UCX hooks) about the provenance of the UCX hook code. George says that code came from IBM.
        • Need to wait for Yoci to come back.
      • Even though UCX is a PML, there are still paths where BTL might be in use (one sided for example).
      • Howard likes this solution, which plays nicely with other approaches.
        • derived datatypes, too complicated to write OSC for libfabric, so easier just to write a BTL for OMPI.
    • -host PR 1353 move to 2.1?
      • Ralph did some work here. But the spreadsheet says that, as a compromise, it errors out if users don't specify -np.
        • In the case where the user says -host foo:2 (they've explicitly mentioned the number of slots).
        • -host foo,foo is the SAME as foo:2 (2 slots on foo).
      • Easy change for Ralph to make today. Against Master and try to get into v2.0.0
    • Failures in MTT - probably because the USOCK component came in with some commit missing, which is causing failures on Master.
  • Milestones: https://github.com/open-mpi/ompi-release/milestones/v2.0.0 *
  • OMPI Release Open Pull Requests: https://github.com/open-mpi/ompi-release/pulls
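
The -host equivalence noted above (-host foo,foo is the same as -host foo:2) can be illustrated with a small sketch. This is not mpirun's implementation; the function name parse_host_list is hypothetical, and the sketch assumes only the "name" and "name:slots" forms mentioned in these notes.

```python
def parse_host_list(spec):
    """Map each host name in a -host style string to its slot count.

    Hypothetical helper for illustration only: a bare name counts as
    one slot, and "name:N" counts as N slots. Repeated names accumulate.
    """
    slots = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        name, sep, count = entry.partition(":")
        # "foo" contributes 1 slot; "foo:2" contributes 2 slots
        slots[name] = slots.get(name, 0) + (int(count) if sep else 1)
    return slots


if __name__ == "__main__":
    print(parse_host_list("foo,foo"))
    print(parse_host_list("foo:2"))
```

Under these assumptions both calls produce the same mapping, with 2 slots on foo.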

Review Master?

  • Master tests are failing.
  • Josh question on PR1482.
    • legitimate concerns on mechanism.
    • Users confused why self

MTT status:

Status Updates:

  • special Mellanox - Mike and Josh have been promoted. Mike is no longer the Mellanox Jenkins contact.
    • Jeff will send out the new Mellanox ID.
    • We will see more of Tommy (Austin): work on collectives, Open MPI, MPICH, etc.
  • LANL - Howard is testing out Ralph's USOCK stuff.
    • Nathan working on patcher stuff, bugfixes, cleanup, and some 1sided design that's too early.
    • Nathan should have patcher code up for PR today.
    • Nathan will be testing with Jenkins
  • Houston -
  • IBM - Working to get Cluster available for community jenkins testing.
    • Working on symbol patcher.
    • Working on licensing framework.
    • Working on building with external pmix, hwloc, and libevent.
    • Integrating our PAMI transport PML.
    • Scoping hostlist syntax work for April - PMIx is also interested in this.
      • Ralph is curious about the mapping representation.

Status Update Rotation

  1. Cisco, ORNL, UTK, NVIDIA
  2. Mellanox, Sandia, Intel
  3. LANL, Houston, IBM

Back to 2016 WeeklyTelcon-2016
