Meeting 2016 08
- Start: 9am, Tue Aug 16, 2016
- Finish: 1pm, Thu Aug 18, 2016
- Location: IBM facility, Dallas, TX
- Attendance fee: $50/person, see registration link below
- Exactly the same setup as in February:
- IBM Dallas Innovation Center web site
- Google Maps
- Street address: 1177 South Beltline Road, Coppell, Texas 75019 USA
- Enter on the East entrance (closest to Beltline Road)
- Hollerith Room - on the left after you walk in. (Now all 3 days in the same room!)
- Receptionist should have nametags for everyone.
- Foreign Nationals welcome.
- No need to escort visitors in this area.
Please both register at EventBrite ($50/person) and add your name to the wiki list below if you are coming to the meeting:
- Jeff Squyres, Cisco
- Howard Pritchard, LANL
- Geoffrey Paulsen, IBM
- Ralph Castain, Intel
- George Bosilca, UTK (17 and 18)
- Josh Hursey, IBM
- Edgar Gabriel, UHouston
- Takahiro Kawashima, Fujitsu
- Shinji Sumimoto, Fujitsu
- Brian Barrett, Amazon Web Services
- Nathan Hjelm, LANL
- Sameh Sharkawi, IBM (17 and 18)
- Mark Allen, IBM
- Josh Ladd, Mellanox (17)
- ...please fill in your name here if you're going to attend...
- Annual git committer audit
- Plans for v2.1.0 release
- Need community to contribute what they want in v2.1.0
- Want to release by end of 2016 at the latest
- Present information about IBM Spectrum MPI, processes, etc.
- May have PRs ready to discuss requested changes, but the schedule is tight in July / August for us.
- MTT updates / future direction
- How to help alleviate "drowning in CI data" syndrome?
- One example: https://github.com/open-mpi/ompi/pull/1801
- One suggestion: should we actively market for testers in the community to help wrangle this stuff?
- If Jenkins detects an error, can we get Jenkins to retry the tests without the PR changes, and then compare the results to see if the PR itself is introducing a new error?
- How do we stabilize Jenkins to alleviate all these false positives?
- PMIx roadmap discussions
- Thread-safety design
- Need some good multi-threaded performance tests (per Nathan and Artem discussion); a sketch of what such a test might look like follows this list
- Do we need to write them ourselves?
- Review/define the path forward
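For concreteness, here is a minimal sketch of the kind of multi-threaded message-rate test being discussed, assuming MPI_THREAD_MULTIPLE and pthreads. The thread/iteration counts and output are purely illustrative; this is not an agreed-upon benchmark.

```c
/* Illustrative multi-threaded message-rate test; NOT an agreed-upon benchmark.
 * Run with exactly 2 ranks, e.g.: mpirun -np 2 ./mt_msgrate
 * Build: mpicc -pthread mt_msgrate.c -o mt_msgrate */
#include <mpi.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define NTHREADS 4        /* illustrative values */
#define ITERS    10000
#define MSG_SIZE 8

static int my_rank;

/* Each thread ping-pongs with the same thread index on the other rank,
 * using its thread index as the tag, so MPI_THREAD_MULTIPLE is exercised. */
static void *pingpong(void *arg)
{
    int tag = (int)(intptr_t)arg;
    int peer = (0 == my_rank) ? 1 : 0;
    char buf[MSG_SIZE] = {0};

    for (int i = 0; i < ITERS; ++i) {
        if (0 == my_rank) {
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, peer, tag, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, peer, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else {
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, peer, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, peer, tag, MPI_COMM_WORLD);
        }
    }
    return NULL;
}

int main(int argc, char **argv)
{
    int provided, size;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (provided < MPI_THREAD_MULTIPLE || 2 != size) {
        if (0 == my_rank) fprintf(stderr, "need MPI_THREAD_MULTIPLE and exactly 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    pthread_t threads[NTHREADS];
    double start = MPI_Wtime();
    for (int t = 0; t < NTHREADS; ++t)
        pthread_create(&threads[t], NULL, pingpong, (void *)(intptr_t)t);
    for (int t = 0; t < NTHREADS; ++t)
        pthread_join(threads[t], NULL);
    double elapsed = MPI_Wtime() - start;

    if (0 == my_rank)
        printf("%d threads x %d iterations: %.0f messages/sec\n",
               NTHREADS, ITERS, (2.0 * NTHREADS * ITERS) / elapsed);

    MPI_Finalize();
    return 0;
}
```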
- Fujitsu status
- Memory consumption evaluation
- MTT status
- PMIx status
- Revive btl/openib memalign hooks?
- Request completion callback and thread safety
- Discuss appropriate default settings for openib BTL
- Email thread on performance conflicts between RMA/openib and SM/Vader
- Ralph offers to give a presentation on "Flash Provisioning of Clusters", if folks are interested
- Cleanup of exposed internal symbols (see https://github.com/open-mpi/ompi/pull/1955)
- Performance Regression tracking
- What do we want to track, and how are we going to do it?
- https://github.com/open-mpi/ompi/issues/1831#issuecomment-229520276
- https://github.com/open-mpi/mtt/issues/445
- Symbol versioning
- Per request from Debian
- What to do about MPI_Info PR from IBM / MPI Forum gyrations about MPI_Info?
- Should we be using Slack.com as a community?
NOTE: Some notes are included below, but a much more detailed writeup can be found in the meeting minutes.
- Status of v2.0.1 release
- Lots of PRs still...
- From the meeting:
- Closing in on v2.0.1. Most PRs are in. Release next Tuesday (Aug 23, 2016) if possible
- After v2.1.0 release, should we merge from master to the v2.x branch?
- Only if there are no backwards compatibility issues (!)
- This would allow us to close the divergence/gap from master to v2.x, but keep life in the v2.x series (which is attractive to some organizations)
- Alternatively, we might want to fork and create a new 3.x branch.
- From the meeting:
- Long discussion. There seem to be two questions:
- What to call the release after v2.1.x: v2.2.x or v3.x (i.e., whether there are backwards compatibility issues or not)
- Whether to merge `master` into the `v2.x` branch or fork into a new branch (regardless of whether the next release is v2.2.x or v3.x)
- The consensus seems to be that we think (but we don't know for sure because no one has systematically analyzed it) there is both:
- A huge amount of code drift from master to v2.x, such that a merge may generate tons of conflicts
- A bunch of backwards-incompatible changes (e.g., MCA vars and CLI params)
- Meaning: we think the next release should be v3.x and it should be a fork from master
- Migration to new cloud services update for website, database, etc.
- DONE:
  - DNS:
    - All 6 domains transferred to Jeff's GoDaddy account
  - Web site:
    - Migrate www.open-mpi.org to HostGator
    - Install initial LetsEncrypt SSL certificates on www.open-mpi.org
    - Submit CSR to U Michigan for 3-year SSL certificates on www.open-mpi.org (thank you, U. Michigan!)
    - rsync web mirroring method shut down
  - Mailing lists:
    - Migrate mailman lists to NMC
    - Freeze old mailing list archives, add to ompi-www git
    - Add old mailing list archives to mail-archive.com
    - Set up new mails to archive to mail-archive.com
  - Email:
    - Set up 2 legacy email addresses: rhc@ and jjhursey@
  - Infrastructure:
    - Nightly snapshot tarballs being created on RHC's machine and SCPed to www.open-mpi.org
    - Github push notification emails (i.e., "gitdub")
      - Converted Ruby gitdub to PHP
      - Works for all repos... except ompi-www (due to memory constraints)
      - Might well just disable git commit emails for ompi-www
  - Contribution agreements:
    - Stored in Google Drive under [email protected] (and shared to a few others)
- Still to-do:
  - Web site:
    - Probably going to shut down the mirroring program.
    - Possibly host the tarballs at Amazon S3 and put CloudFront in front of them
    - Spin up an Amazon EC2 instance (thank you Amazon!) for:
      - Hosting Open MPI community Jenkins master
      - Hosting Open MPI community MTT database and web server
    - Revamp / consolidate ompi master:contrib/ -- there are currently 3 subdirs that should really be disambiguated and have their overlap removed. Perhaps name subdirs by the DNS name where they reside / operate?
      - infrastructure
      - build server
      - nightly
    - Spend time documenting where everything is / how it is set up
    - Fix OMPI timeline page: https://www.open-mpi.org/software/ompi/versions/timeline.php
      - Gilles submitted a PR: https://github.com/open-mpi/ompi-www/pull/14
      - DONE!
  - Possible umbrella non-profit organization
    - Details to be mailed to devel-core/admin: see http://spi-inc.org/
  - Update Open MPI contrib agreements
    - Created a new contributions@lists. email address, will update agreements
- MCA support as a separate package?
- Now that we have multiple projects (PMIx) and others using MCA plugins, does it make sense to create a separate repo/package for MCA itself? Integrating MCA into these projects was modestly painful (e.g., identifying what other infrastructure - such as argv.h/c - needs to be included); perhaps a more packaged solution would make it simpler.
- Need to "tag" the component libraries with their project name, as library confusion is becoming more prevalent now that OMPI is beginning to utilize MCA-based packages such as PMIx
- From the meeting:
- The need for this has gone down quite a bit: PMIx copied and renamed, and Warewulf is going to go Python.
- But it seems worthwhile to take the next few steps in spreading the project name throughout the MCA system:
- Put the project name in the component filename: `mca_PROJECT_FRAMEWORK_COMPONENT.la`
- Add some duplicate-checking code in the MCA var base: if someone sets a value for `FRAMEWORK_COMPONENT_VAR` and there is more than one of those (i.e., the same framework/component/var in two different projects, and the project name was not specified), then we need to error out and let a human figure it out (a rough sketch of that check follows this list).
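A rough sketch of what that duplicate check could look like; the types, function, and data below are hypothetical illustrations, not existing OPAL/MCA APIs.

```c
/* Illustrative only -- these types and functions are hypothetical, not real OPAL/MCA APIs. */
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *project;    /* e.g. "ompi" or "pmix" */
    const char *framework;
    const char *component;
    const char *variable;
} var_entry_t;

/* Look up FRAMEWORK_COMPONENT_VAR without a project qualifier.
 * Return 0 on a unique match, -1 if not found or ambiguous across projects. */
static int lookup_unqualified(const var_entry_t *vars, int nvars,
                              const char *framework, const char *component,
                              const char *variable, const var_entry_t **match)
{
    int hits = 0;
    for (int i = 0; i < nvars; ++i) {
        if (0 == strcmp(vars[i].framework, framework) &&
            0 == strcmp(vars[i].component, component) &&
            0 == strcmp(vars[i].variable, variable)) {
            *match = &vars[i];
            ++hits;
        }
    }
    if (hits > 1) {
        fprintf(stderr, "error: %s_%s_%s is provided by %d different projects; "
                        "please qualify it with a project name\n",
                framework, component, variable, hits);
        return -1;
    }
    return (1 == hits) ? 0 : -1;
}

int main(void)
{
    /* Toy registry data for illustration only. */
    const var_entry_t vars[] = {
        { "ompi", "plm", "rsh", "agent" },
        { "pmix", "plm", "rsh", "agent" },   /* same unqualified name, different project */
        { "ompi", "btl", "tcp", "if_include" },
    };
    const var_entry_t *m = NULL;

    /* Ambiguous: the unqualified name matches entries in two projects, so this errors out. */
    int rc = lookup_unqualified(vars, 3, "plm", "rsh", "agent", &m);
    printf("lookup returned %d\n", rc);
    return 0;
}
```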
- Plans for folding the `ompi-release` Github repo back into the `ompi` Github repo
- (Possibly) Remove atomics from `OBJ_RETAIN` / `OBJ_RELEASE` in the `THREAD_SINGLE` case (an illustrative sketch of the idea follows this list)
- @nysal said he would look at this.
- See https://github.com/open-mpi/ompi/issues/1902.
- NTH: This is already done.
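For context, the idea is to skip the atomic read-modify-write on the reference count when the process is single-threaded. A toy sketch using C11 atomics and an illustrative `using_threads` flag; this is not the actual OPAL object code or macros.

```c
/* Toy illustration only; OPAL's real OBJ_RETAIN/OBJ_RELEASE implementation differs. */
#include <stdatomic.h>
#include <stdbool.h>

typedef struct object {
    atomic_int refcount;
    void (*destruct)(struct object *);
} object_t;

/* Illustrative flag: true once the application asks for MPI_THREAD_MULTIPLE. */
static bool using_threads = false;

static inline void object_retain(object_t *obj)
{
    if (using_threads) {
        atomic_fetch_add_explicit(&obj->refcount, 1, memory_order_relaxed);
    } else {
        /* Single-threaded: a plain load/store pair avoids the locked instruction. */
        atomic_store_explicit(&obj->refcount,
            atomic_load_explicit(&obj->refcount, memory_order_relaxed) + 1,
            memory_order_relaxed);
    }
}

static inline void object_release(object_t *obj)
{
    int old;
    if (using_threads) {
        old = atomic_fetch_sub_explicit(&obj->refcount, 1, memory_order_acq_rel);
    } else {
        old = atomic_load_explicit(&obj->refcount, memory_order_relaxed);
        atomic_store_explicit(&obj->refcount, old - 1, memory_order_relaxed);
    }
    if (1 == old) {
        obj->destruct(obj);   /* last reference dropped */
    }
}
```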
- Continue the `--net` `mpirun` CLI option discussion from the Feb 2016 meeting
- Originally an IBM proposal.
- Tied to issues of "I just want to use network X" user intent, without needing to educate users on the complexities of PML, MTL, BTL, COLL, etc.
- We didn't come to any firm conclusions in February.
- From the meeting:
- There was a long discussion about this in the meeting; see the meeting minutes for more detail.
- MPI_Reduce_local - move into coll framework.
- From the meeting:
- It isn't in the coll framework already simply because it isn't a collective; it operates only on local buffers (see the usage example after this list).
- But IBM would like to have multiple backends to MPI_REDUCE_LOCAL
- The OMPI Way to do this is with a framework / component
- Seems like overkill to have a new framework just for this one MPI function
- So it seems ok to add it to the coll framework
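For reference, MPI_Reduce_local applies an MPI_Op to two purely local buffers and performs no communication at all, which is why it never lived in coll. A minimal usage example:

```c
/* MPI_Reduce_local combines two local buffers with an MPI_Op; no messages are sent. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int in[4]    = {1, 2, 3, 4};
    int inout[4] = {10, 20, 30, 40};

    /* inout[i] = in[i] op inout[i] */
    MPI_Reduce_local(in, inout, 4, MPI_INT, MPI_SUM);

    printf("%d %d %d %d\n", inout[0], inout[1], inout[2], inout[3]);  /* 11 22 33 44 */

    MPI_Finalize();
    return 0;
}
```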
- From the meeting: