WeeklyTelcon_20160517
Geoff Paulsen edited this page May 17, 2016
4 revisions
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoff Paulsen
- Jeff Squyres
- Brad Benton
- Howard
- Josh Hursey
- Joshua Ladd
- Nathan Hjelm
- Ralph
- Sylvain Jeaugey
- Todd Kordenbrock
- Milestones: https://github.com/open-mpi/ompi-release/milestones/v1.10.3
- 1161 - Open IB error path - Gilles asked Mike to review; now in its 2nd iteration.
- Joshua Ladd tagged on 2.x version.
- 1150 - There are 2 places in Init and 1 in Finalize where we do an RTE barrier.
- If launched with mpirun, it works just fine.
- But direct launch will hang under Cray or Slurm PMIx because those have blocking RTE barriers, and those don't progress.
- Patched in master by using an MPI barrier so that other things progress.
- This fix will also need to be a blocker for 2.0.x. Ralph will create the PR.
- Once these get in, do another RC and move this out.
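
To illustrate the hang described for 1150, here is a minimal Python sketch of a barrier that keeps driving progress while it waits. It is not Open MPI code: `barrier_with_progress`, `progress`, and the `pending` counter are hypothetical names, and the assumption is simply that the barrier can only be released by work that the progress call performs.

```python
import threading


def barrier_with_progress(done: threading.Event, progress, poll_s: float = 0.001) -> int:
    """Wait for `done`, calling `progress()` between checks; return the call count."""
    calls = 0
    while not done.is_set():
        progress()         # keep other pending work moving
        calls += 1
        done.wait(poll_s)  # short timed wait instead of blocking indefinitely
    return calls


def demo() -> int:
    done = threading.Event()
    pending = [3]  # hypothetical: units of work that must finish before release

    def progress():
        # Each call retires one unit; the barrier is released once all are done.
        if pending[0] > 0:
            pending[0] -= 1
        if pending[0] == 0:
            done.set()

    return barrier_with_progress(done, progress)
```

In this toy setup a plain `done.wait()` with no progress calls would never return, because nothing else sets the event. That is the same shape as a blocking barrier that does not progress.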
- Wiki: https://github.com/open-mpi/ompi/wiki/Releasev20
- Blocker Issues: https://github.com/open-mpi/ompi/issues?utf8=%E2%9C%93&q=is%3Aopen+milestone%3Av2.0.0+label%3Ablocker
- Milestones: https://github.com/open-mpi/ompi-release/milestones/v2.0.0
- PMIx barrier
- Nathan will review 1164.
- PR 1673 - The multi-threaded issue that George ran into is a doozy.
- Free path in C++: one thread was in the dereg hooks inside delete.
- Another thread was trying to allocate space, triggering internal garbage collection.
- Classic deadlock.
- Nathan reworked the rcache / mpool code to not hold the lock while doing deletes.
- All locks are always on in RDMA because there is no way around it.
- Last rcache bug: if you had > 100 registrations associated with a memory registration being munmapped, you ran into an infinite loop.
- Nathan and George testing.
- IBM will do some multi-threaded testing as well.
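
The rework described above (not holding the lock while doing deletes) follows a common pattern, sketched here in Python under stated assumptions. `RegistrationCache` and its `release` callback are hypothetical stand-ins, not the actual rcache / mpool code: victims are unlinked under the lock, and the callback runs only after the lock is dropped.

```python
import threading


class RegistrationCache:
    """Toy cache; a hypothetical stand-in, not Open MPI's rcache/mpool code."""

    def __init__(self, release):
        self._lock = threading.Lock()  # deliberately non-reentrant
        self._entries = {}
        self._release = release        # callback that may re-enter the cache

    def insert(self, key, value):
        with self._lock:
            self._entries[key] = value

    def evict(self, keys):
        # Phase 1: unlink victims while holding the lock (no callbacks here).
        with self._lock:
            victims = [self._entries.pop(k) for k in keys if k in self._entries]
        # Phase 2: run the release callback with the lock dropped, so a
        # callback that re-enters the cache cannot deadlock on self._lock.
        for v in victims:
            self._release(v)
        return len(victims)


def demo():
    released = []
    cache = None

    def release(value):
        released.append(value)
        cache.insert(("tombstone", value), None)  # re-enters the cache

    cache = RegistrationCache(release)
    cache.insert("a", 1)
    cache.insert("b", 2)
    return cache.evict(["a", "b", "missing"]), released
```

If `evict` called `release` inside the `with self._lock:` block, the re-entrant `insert` in the demo would block forever on the non-reentrant lock.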
- PowerPC issues as well. Nathan had to revise table a bit.
- ppc64le, if you do a dlsym, pointer is into table of contents: 1 is real address.
- The problem is that the TOC is getting patched.
- When patching, we need to patch the real function, not the other.
- ppc64BE - may still
- 1162 - Multiple threads create the same endpoint simultaneously.
- Nathan thought he handled that case.
- For 2.0.0rc2 we forgot to send the announcement to users-alias. Will do for rc3.
- Put an announcement about the migration guide on the announcement list.
Review Master MTT testing (https://mtt.open-mpi.org/)
- IBM trying to ramp up MTT testing. Hopefully will have Power8 XL compiler testing soon.
- Some issues passing certain flags to XL compilers. Josh Hursey is working on it.
- Cisco / Intercomm create failures.
- Getbyte offset test requires v2.0.0 or greater and spins until timeout on 1.10.
- 2nd month of RED. Can't seem to break out of it.
- IBM wants to get Jenkins on Power8LE enabled this week. Looks like they got the correct permission, using the polling method.
- If people push quickly and there are multiple pushes within one polling interval, it will just pick up the last one.
- Jenkins servers have been hanging / restarting lately.
- Howard saw that there was a cron job doing automated updates of Jenkins. Last Wednesday Jenkins was updated with a security fix, but that broke a lot of the GitHub integration.
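
The polling behaviour noted above for the Power8LE Jenkins can be sketched as a small Python simulation. This is an illustration only, not Jenkins code: `builds_from_polling` is a hypothetical helper, and it assumes each poll builds only the newest push seen since the previous poll.

```python
def builds_from_polling(push_times, interval, horizon):
    """Return the push built by each poll at interval, 2*interval, ... up to horizon.

    Assumption (hypothetical model): a poll builds only the newest push that
    arrived since the previous poll; earlier pushes in that window are skipped.
    """
    built = []
    last_poll = 0
    t = interval
    while t <= horizon:
        window = [p for p in push_times if last_poll < p <= t]
        if window:
            built.append(max(window))  # only the last push in the window
        last_poll = t
        t += interval
    return built
```

For example, with pushes at times 1, 2, 4 and 11 and a polling interval of 5, `builds_from_polling([1, 2, 4, 11], 5, 15)` builds only the pushes at 4 and 11; the pushes at 1 and 2 are never tested on their own.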
Pull Request 1650 still causing red X on Mellanox Jenkins.
- Red X on master, because of an issue that hasn't been resolved.
- Need Nathan or Josh Hursey or someone to follow up. Who knows the AMCA code the best? We could move AMCA out.
- MCA variable system
- envlist being available in an aggregate.
- Jenkins is still the best of the worst for running in non-cloud
- Hudson is an enterprise pay-for solution, but we want free.
- Josh posted documentation on the wiki, but not the scripts yet.
- https://github.com/open-mpi/ompi/wiki/PRJenkinsSetupFirewall
- Will be posting a few scripts to pass env vars and manage GitHub Gists to ompi-tests.
- 2 shell scripts, and a perl script.
- MTT: some new development to clean out the MTT GitHub; discussion on the MTT devel list.
- Clear out some issues and set a new milestone, etc.
- There is an alternative for Travis, but that hasn't been an issue.
- AppVeyor
- What is the Combinatorial Executor for MTT?
- Ralph explains: say you have two different ompi builds (different configure lines)
- and a big list of tests.
- The existing sequential executor would build both sequentially.
- But when building tests, it wouldn't automatically build them for both; you have to tell it.
- The Combinatorial Executor would do that: build the list of tests for EACH configured OMPI build.
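
The difference Ralph describes can be sketched in a few lines of Python. This is a toy model, not MTT's actual executor API: `sequential_plan`, `combinatorial_plan`, and the build and test names are all hypothetical.

```python
from itertools import product


def sequential_plan(builds, tests, test_target):
    """Build every OMPI configuration, but build tests only for `test_target`."""
    plan = [("build", b) for b in builds]
    plan += [("test", test_target, t) for t in tests]
    return plan


def combinatorial_plan(builds, tests):
    """Build every OMPI configuration, then every test against EACH build."""
    plan = [("build", b) for b in builds]
    plan += [("test", b, t) for b, t in product(builds, tests)]
    return plan
```

With two builds and two test suites, `combinatorial_plan(["debug", "opt"], ["ibm", "intel"])` yields 2 build steps plus 4 test steps, whereas the sequential plan yields test steps only for the single build named in `test_target`.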
- Chelsio getting some resources to possibly do MTT nightly testing.
- Mellanox, Sandia, Intel
- LANL, Houston, IBM
- Cisco, ORNL, UTK, NVIDIA