WeeklyTelcon_20160412
Geoff Paulsen edited this page Apr 12, 2016
- Dialup Info: (Do not post to public mailing list or public wiki)
- Geoff Paulsen
- Jeff Squyres
- Edgar Gabriel
- Howard
- Josh Hursey
- Joshua Ladd
- Nathan Hjelm
- Ralph Castain
- Ryan Grant
- Sylvain Jeaugey
- Todd Kordenbrock
-
Milestones: https://github.com/open-mpi/ompi-release/milestones/v1.10.3
- Ralph will look at the ByNode thing today.
- Allena reported a SLURM issue.
- Issue 1530: MTT was hanging because a test's signal handler was segfaulting, and the cores were taking a long time to dump.
- Next 1.10 release: need to fix these issues first, but it is looking like early May.
-
GitHub now DOES allow per-branch permissions, so will look at
-
Blocker Issues: https://github.com/open-mpi/ompi/issues?utf8=%E2%9C%93&q=is%3Aopen+milestone%3Av2.0.0+label%3Ablocker
- 1 remaining blocker: the memory symbol patcher - Nathan / IBM / Mellanox.
- Got the original code to stack with UCX. munmap on Linux has an optional argument. This doesn't work well with any style of hooking. Loader ends up patching a random function address. Didn't understand. Assembly looked okay, but munmap was randomly
- When UCX is involved, seeing both Open MPI and UCX memhooks, which is great.
- SPARC still having issues, so will need a solution for 2.0.1.
- Nathan will work to remove ptmalloc on master, and have build time
- on 2.0.0 Nathan will add a --enable-ptmalloc explicit configure option, but doesn't build by default.
- If users configure --enable-ptmalloc, then it would disable the internal memhook frameworks entirely.
- when this happens, will have to add some early code to tickle ptmalloc
- need to document that if --enable-ptmalloc then munmap() calls may give wrong answers.
- Nathan might do the SPARC assembly himself, since he is seeing weird dlsym issues on SPARC that might be related.
- Now created the first time an rcache is created.
- Might need to tweak openib BTL, because it created rcache too early.
- When you create a thread, it has to expand the heap sometimes.
- When they expand the heap, they protect the entire heap with PROT_NONE.
- Calls to hook: munmap, mremap (if the new length is smaller than the old length), shmdt (only on Linux, not OS X), brk.
- Need sbrk also (for a negative increment).
- Nathan will look at README for memory hook stuff.
- OpenFabrics should get its act together and put something in the kernel to alleviate all of this.
- Question: do we want the new, prettier ompi_info output? It didn't change the parsable output.
- Low risk, got contributor agreement (works for SuSE). Can pull 1515, 1516, 1518 into 2.0.
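
As an illustration of the memory hook discussion above, here is a minimal sketch of which address range each listed call hands back to the OS, so that a registration cache could drop overlapping entries. It is not Open MPI's patcher code: every function name is hypothetical, and it models only the cases named in the notes (munmap, a shrinking mremap, brk, and sbrk with a negative increment). shmdt is left out because the segment length is not among its arguments and would have to be looked up separately.

```python
from typing import Optional, Tuple

# (start address, length in bytes); purely illustrative type, not an Open MPI structure
Range = Tuple[int, int]


def released_by_munmap(addr: int, length: int) -> Optional[Range]:
    """munmap releases exactly the range it is given."""
    return (addr, length) if length > 0 else None


def released_by_mremap(old_addr: int, old_len: int, new_len: int) -> Optional[Range]:
    """Only the shrinking case from the notes is modeled: the tail is released."""
    if new_len < old_len:
        return (old_addr + new_len, old_len - new_len)
    return None


def released_by_brk(current_break: int, new_break: int) -> Optional[Range]:
    """Lowering the program break releases [new_break, current_break)."""
    if new_break < current_break:
        return (new_break, current_break - new_break)
    return None


def released_by_sbrk(current_break: int, increment: int) -> Optional[Range]:
    """A negative increment lowers the break by -increment bytes."""
    if increment < 0:
        return (current_break + increment, -increment)
    return None


def overlaps(a: Range, b: Range) -> bool:
    """True if two half-open ranges share at least one byte."""
    a_start, a_len = a
    b_start, b_len = b
    return a_start < b_start + b_len and b_start < a_start + a_len
```

A hook built along these lines would compute the released range first and then invalidate every cached registration for which `overlaps` returns true, before letting the real call proceed.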
-
Milestones: https://github.com/open-mpi/ompi-release/milestones/v2.0.0
- Absoft failure. Need to fix configure stuff with atomics. On Master.
- Nathan - this failure should go into 2.0.1. Absoft MTT shows that
- IBM has a client facing cluster
- Working on getting Jenkins setup on IBM side to ensure Pull Requests get tested on Power also.
- Hoping to have online this week?
- Better upload interface, used by both Ralph and Josh.
- Plugins to support SLURM, Copy tree, Shell commands. Compiler version detection.
- Looking at establishing MTT release .tarballs.
- Timeframe: Sooner rather than later
- Cisco - a bunch of release engineering work for both libfabric and OMPI.
- assisting on a number of bugs.
- NVIDIA - Sylvain - An issue on MTT - looking into.
- Cisco, ORNL, UTK, NVIDIA
- Mellanox, Sandia, Intel
- LANL, Houston, IBM