Meeting 2016 08

August 2016 Open MPI Developer's Meeting

Logistics:

Start: 9am, Tue Aug 16, 2016
Finish: 1pm, Thu Aug 18, 2016
Location: IBM facility, Dallas, TX
Attendance fee: $50/person, see registration link below

Location:

Exactly the same setup as in February
- IBM Dallas Innovation Center web site
- Google Maps
- Street address: 1177 South Beltline Road, Coppell, Texas 75019 USA
- Enter on the East entrance (closest to Beltline Road)
  - Hollerith Room - On left after you walk in. (Now All 3 days, in the same room!)
  - Receptionist should have nametags for everyone.
  - Foreign Nationals welcome.
  - No need to escort visitors in this area.

Attendees

Please both register at EventBrite ($50/person) and add your name to the wiki list below if you are coming to the meeting:

Jeff Squyres, Cisco
Howard Pritchard, LANL
Geoffrey Paulsen, IBM
Ralph Castain, Intel
George Bosilca, UTK (17 and 18)
Josh Hursey, IBM
Edgar Gabriel, UHouston
Takahiro Kawashima, Fujitsu
Shinji Sumimoto, Fujitsu
Brian Barrett, Amazon Web Services
Nathan Hjelm, LANL
Sameh Sharkawi, IBM (17 and 18)
Mark Allen, IBM
Josh Ladd, Mellanox (17)
...please fill in your name here if you're going to attend...

Topics Still To Discuss

Annual git committer audit
- Google spreadsheet
Plans for v2.1.0 release
- Need community to contribute what they want in v2.1.0
- Want to release by end of 2016 at the latest
Present information about IBM Spectrum MPI, processes, etc.
- May have PRs ready to discuss requested changes, but schedule is tight in July / August for us.
MTT updates / future direction
How to help alleviate "drowning in CI data" syndrome?
- One example: https://github.com/open-mpi/ompi/pull/1801
- One suggestion: should we actively market for testers in the community to help wrangle this stuff?
- If Jenkins detects an error, can we get Jenkins to retry the tests without the PR changes, and then compare the results to see if the PR itself is introducing a new error?
- How do we stabilize Jenkins to alleviate all these false positives?
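
The retry idea above amounts to a set comparison: a failure that shows up with the PR applied but not on the unmodified base branch is attributed to the PR, while a failure present in both runs is treated as pre-existing. The following is a minimal Python sketch, assuming the results of each run are available as a mapping from test name to pass/fail. The function name and the test names are hypothetical and are not part of Jenkins or MTT.

```python
def classify_failures(pr_results, base_results):
    """Split the PR run's failures into new ones and pre-existing ones.

    Both arguments map a test name to True (passed) or False (failed).
    A test that fails with the PR and is absent from the base run is
    counted as introduced by the PR.
    """
    pr_failed = {name for name, passed in pr_results.items() if not passed}
    base_failed = {name for name, passed in base_results.items() if not passed}
    return {
        "introduced_by_pr": sorted(pr_failed - base_failed),
        "pre_existing": sorted(pr_failed & base_failed),
    }


# Hypothetical results from two runs of the same test suite.
with_pr = {"ring": True, "hello": False, "rma_put": False}
without_pr = {"ring": True, "hello": True, "rma_put": False}

report = classify_failures(with_pr, without_pr)
print(report)
```

Only the "introduced_by_pr" list would need to block the PR; the "pre_existing" list points at instability in the base branch or the CI setup itself.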
PMIx roadmap discussions
Thread-safety design
- Need some good multi-threaded performance tests (per Nathan and Artem discussion)
  - Do we need to write them ourselves?
- Review/define the path forward
Fujitsu status
- Memory consumption evaluation
- MTT status
- PMIx status
Revive btl/openib memalign hooks?
Request completion callback and thread safety
Discuss appropriate default settings for openib BTL
- Email thread on performance conflicts between RMA/openib and SM/Vader
Ralph offers to give presentation on "Flash Provisioning of Clusters", if folks are interested
Cleanup of exposed internal symbols (see https://github.com/open-mpi/ompi/pull/1955)
Performance Regression tracking
- What do we want to track, and how are we going to do that?
- https://github.com/open-mpi/ompi/issues/1831#issuecomment-229520276
- https://github.com/open-mpi/mtt/issues/445
Symbol versioning
- Per request from Debian
What to do about MPI_Info PR from IBM / MPI Forum gyrations about MPI_Info?
Should we be using Slack.com as a community?

Topics Already Discussed

NOTE: Some notes are included below, but a much more detailed writeup can be found in the meeting minutes.

Status of v2.0.1 release
- Lots of PRs still...
- From the meeting:
  - Closing in on v2.0.1. Most PRs are in. Release next Tuesday (Aug 23, 2016) if possible
After v2.1.0 release, should we merge from master to the v2.x branch?
- Only if there are no backwards compatibility issues (!)
- This would allow us to close the divergence/gap from master to v2.x, but keep life in the v2.x series (which is attractive to some organizations)
- Alternatively, we might want to fork and create a new 3.x branch.
- From the meeting:
  - Long discussion. There seem to be two questions:
    1. What to call the release after v2.1.x: v2.2.x or v3.x (i.e., whether there are backwards compatibility issues or not)
    2. Whether to merge master into the v2.x branch or fork into a new branch (regardless of whether the next release is v2.2.x or v3.x)
  - The consensus seems to be that we think (but we don't know for sure, because no one has systematically analyzed it) there are both:
    1. A huge amount of code drift from master to v2.x such that a merge may generate tons of conflicts
    2. A bunch of backwards-incompatible changes (e.g., MCA vars and CLI params)
  - Meaning: we think the next release should be v3.x and it should be a fork from master
Migration to new cloud services update for website, database, etc.
- DONE:
  - DNS:
    - All 6 domains transferred to Jeff's GoDaddy account
  - Web site:
    - Migrate www.open-mpi.org to HostGator
    - Install initial LetsEncrypt SSL certificates on www.open-mpi.org
    - Submit CSR to U Michigan for 3-year SSL certificates on www.open-mpi.org (thank you, U. Michigan!)
    - rsync web mirroring method shut down
  - Mailing lists:
    - Migrate mailman lists to NMC
    - Freeze old mailing list archives, add to ompi-www git
    - Add old mailing list archives to mail-archive.com
    - Set up new mails to archive to mail-archive.com
  - Email
    - Set up 2 legacy email addresses: rhc@ and jjhursey@
  - Infrastructure
    - Nightly snapshot tarballs being created on RHC's machine and SCPed to www.open-mpi.org
  - GitHub push notification emails (i.e., "gitdub")
    - Converted Ruby gitdub to PHP
    - Works for all repos... except ompi-www (due to memory constraints)
      - Might well just disable git commit emails for ompi-www
  - Contribution agreements
    - Stored in Google Drive under [email protected] (and shared to a few others)
- Still to-do:
  - Web site:
    - Probably going to shut down the mirroring program.
    - Possibly host the tarballs at Amazon S3 and put CloudFront in front of them
  - Spin up an Amazon EC2 instance (thank you Amazon!) for:
    - Hosting Open MPI community Jenkins master
    - Hosting Open MPI community MTT database and web server
  - Revamp / consolidate: ompi master:contrib/ -- there are currently 3 subdirs that should be disambiguated, with the overlap between them removed. Perhaps name subdirs by the DNS name where they reside / operate?
    - infrastructure
    - build server
    - nightly
  - Spend time documenting where everything is / how it is set up
  - Fix OMPI timeline page: https://www.open-mpi.org/software/ompi/versions/timeline.php
    - Gilles submitted a PR: https://github.com/open-mpi/ompi-www/pull/14
    - DONE!
  - Possible umbrella non-profit organization
    - Details to be mailed to devel-core/admin: see http://spi-inc.org/
  - Update Open MPI contrib agreements
    - Created a new contributions@lists. email address, will update agreements
MCA support as a separate package?
- Now that we have multiple projects (PMIx) and others using MCA plugins, does it make sense to create a separate repo/package for MCA itself? Integrating MCA into these projects was modestly painful (e.g., identifying what other infrastructure - such as argv.h/c - needs to be included) - perhaps a more packaged solution will make it simpler.
- Need to "tag" the component libraries with their project name as library confusion is becoming more prevalent as OMPI begins to utilize MCA-based packages such as PMIx
- From the meeting:
  - The need for this has gone down quite a bit: PMIx copied and renamed, Warewulf is going to go Python.
  - But it seems worthwhile to take the next few steps in spreading the project name throughout the MCA system:
    - Put the project name in the component filename: mca_PROJECT_FRAMEWORK_COMPONENT.la
    - Add some duplicate-checking code in the MCA var base: if someone sets a value for FRAMEWORK_COMPONENT_VAR, and there's more than one of those (i.e., the same framework/component/var in two different projects, and the project name was not specified), then we need to error and let a human figure it out.
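
The two steps above can be illustrated with a small sketch. This is Python for brevity rather than the C used in the MCA code base, and it is not the MCA var base implementation: the registry layout, the function names, and the "example"/"basic"/"verbose" names are all hypothetical. Only the filename pattern mca_PROJECT_FRAMEWORK_COMPONENT.la and the "error when ambiguous" rule come from the notes.

```python
class AmbiguousVarError(Exception):
    """Raised when a variable name matches more than one project."""


def component_filename(project, framework, component):
    # Filename pattern proposed in the notes: mca_PROJECT_FRAMEWORK_COMPONENT.la
    return f"mca_{project}_{framework}_{component}.la"


def resolve_project(registered, framework, component, var, project=None):
    """Return the project that owns FRAMEWORK_COMPONENT_VAR.

    `registered` is a set of (project, framework, component, var) tuples
    (a hypothetical stand-in for the real variable registry).
    If no project is given and several projects register the same
    framework/component/var, refuse to guess and raise an error.
    """
    owners = sorted(
        p for (p, f, c, v) in registered
        if (f, c, v) == (framework, component, var)
        and (project is None or p == project)
    )
    if not owners:
        raise KeyError(f"unknown variable {framework}_{component}_{var}")
    if len(owners) > 1:
        raise AmbiguousVarError(
            f"{framework}_{component}_{var} exists in projects {owners}; "
            "specify the project name"
        )
    return owners[0]


# Hypothetical registry: two projects ship the same framework/component/var.
registered = {
    ("ompi", "example", "basic", "verbose"),
    ("pmix", "example", "basic", "verbose"),
}

print(component_filename("ompi", "example", "basic"))
print(resolve_project(registered, "example", "basic", "verbose", project="pmix"))
try:
    resolve_project(registered, "example", "basic", "verbose")
except AmbiguousVarError as err:
    print("error:", err)
```

Raising an error instead of picking one project silently matches the intent in the notes: a human decides which project's variable was meant.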
Plans for folding ompi-release GitHub repo back into ompi GitHub repo
- https://github.com/open-mpi/ompi/issues/1512
(Possibly) Remove atomics from OBJ_RETAIN/OBJ_RELEASE in the THREAD_SINGLE case.
- @nysal said he would look at this.
- See https://github.com/open-mpi/ompi/issues/1902.
- NTH: This is already done.
Continue --net mpirun CLI option discussion from Feb 2016 meeting
- Originally an IBM proposal.
- Tied to issues of "I just want to use network X" user intent, without needing to educate users on the complexities of PML, MTL, BTL, COLL, etc.
- We didn't come to any firm conclusions in February.
- From the meeting:
  - There was a long discussion about this in the meeting; see the meeting minutes for more detail.
MPI_Reduce_local - move into coll framework.
- From the meeting:
  - It isn't in the coll framework already simply because it isn't a collective.
  - But IBM would like to have multiple backends to MPI_REDUCE_LOCAL
  - The OMPI Way to do this is with a framework / component
  - Seems like overkill to have a new framework just for this one MPI function
  - So it seems ok to add it to the coll framework
