WeeklyTelcon_20160419
Geoff Paulsen edited this page Apr 19, 2016 · 5 revisions
-
Dialup Info: (Do not post to public mailing list or public wiki)
-
Date: April 19, 2016
Attendees:
- Todd Kordenbrock
- Geoff Paulsen
- Jeff Squyres
- Howard
- Josh Hursey
- Joshua Ladd
- Nathan Hjelm
- Ralph
- Sylvain Jeaugey
- Milestones: https://github.com/open-mpi/ompi-release/milestones/v1.10.3
- Predefined datatype test still failing on 1.10.
- Attribute tests keep failing (Cisco)
- cxxWinAddr test still failing.
- Omni-Path issue. Ralph supplied a matias.diff patch file, but Howard could not get it to work.
- The issue seems to be with PSM2 creating endpoints on a single node.
- Error signal handler is in the PSM libraries. Nothing we can do at the OMPI layer.
- Next 1.10 release: need to fix these issues first, but it is looking like early May.
- Jeff and Ralph propose stopping the 1.10 series after this release, if we can get 2.0 out.
-
Blocker Issues: https://github.com/open-mpi/ompi/issues?utf8=%E2%9C%93&q=is%3Aopen+milestone%3Av2.0.0+label%3Ablocker
- Still at 1 remaining blocker: the memory symbol patcher - Nathan / IBM / Mellanox.
- Everything is looking good, except that FreeBSD offers a slightly different ELF format in its elf.h.
- Nathan to disable it for FreeBSD. Folks there are just using it for dev / TCP; no RDMA.
- Absoft - hitting a compiler error with GCC 4.1.2.
- On some systems, something is overriding the default. Most customers are on x86 or Power, so not a big deal.
- Fujitsu will care someday
- Everyone happy? Yes, because they all stack with UCX.
- Howard: did you add a test to MTT to stress this code path? Mark gave Nathan a test case that he added to IBM.
- Extra optional argument to mremap() on Linux, used when MREMAP_FIXED is set.
- Nathan made the last argument explicit.
- SPARC still having issues, so will need a solution for 2.0.1.
- Have some time, since Fujitsu isn't moving to 2.x until later this year.
- Will disable leave-pinned on SPARC.
- Nathan will work to remove ptmalloc on master
- Checked into Master.
- On 2.0.0, Nathan will add an explicit --enable-ptmalloc configure option; ptmalloc will not be built by default.
- If users configure with --enable-ptmalloc, it would disable the internal memhook frameworks entirely.
- When this happens, we will have to add some early code to tickle ptmalloc.
- Need to document that with --enable-ptmalloc, munmap() calls may give wrong answers.
- Nathan will look at README for memory hook stuff.
- The late opening of mpool has been held off, because ptmalloc is still optional on 2.0.0.
- The rest of the master stuff will be pulled over in 2.1.0.
- If Nathan can do NEWS, that'd be great; otherwise Jeff and Howard will get it in.
- OpenFabrics should get its act together and put something in the kernel to alleviate all of this.
- Question: do we want the new, prettier ompi_info output? It didn't change the parsable output.
- Low risk, got contributor agreement (works for SuSE). Can pull 1515, 1516, 1518 into 2.0.
- Timeframe? If Nathan gets stuff in today, then we will make an RC in the next couple of days.
-
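A minimal sketch of the mremap() point noted above, assuming Linux with glibc. mremap() is declared with a variadic tail, and the fifth argument (the new address) is only read when MREMAP_FIXED is set. The wrapper `remap_explicit` and the demo `remap_fixed_demo` are hypothetical names for illustration, not the Open MPI patcher code; the wrapper simply takes the last argument explicitly and always forwards it.

```c
#define _GNU_SOURCE          /* exposes mremap() and the MREMAP_* flags in glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical wrapper: the new address is an explicit parameter instead of
 * an optional variadic one. It is forwarded unconditionally; it is ignored
 * unless MREMAP_FIXED is present in flags. */
static void *remap_explicit(void *old_addr, size_t old_size, size_t new_size,
                            int flags, void *new_addr)
{
    /* A memory hook would record or invalidate the old range here. */
    return mremap(old_addr, old_size, new_size, flags, new_addr);
}

/* Move a one-page anonymous mapping to a chosen address and check that the
 * contents moved with it. Returns 0 on success, -1 on failure. */
int remap_fixed_demo(void)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);

    /* Source mapping filled with a recognizable pattern. */
    unsigned char *src = mmap(NULL, page, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (src == MAP_FAILED) {
        return -1;
    }
    memset(src, 0x5a, page);

    /* Reserve a destination range so there is a valid, page-aligned target. */
    void *dst = mmap(NULL, page, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (dst == MAP_FAILED) {
        munmap(src, page);
        return -1;
    }

    /* MREMAP_FIXED must be combined with MREMAP_MAYMOVE; the old mapping at
     * dst is replaced by the moved pages. */
    unsigned char *res = remap_explicit(src, page, page,
                                        MREMAP_MAYMOVE | MREMAP_FIXED, dst);
    if (res == MAP_FAILED) {
        munmap(src, page);
        munmap(dst, page);
        return -1;
    }
    assert((void *) res == dst);   /* a successful fixed remap lands at dst */

    int ok = (res[0] == 0x5a) && (res[page - 1] == 0x5a);
    munmap(res, page);
    return ok ? 0 : -1;
}
```

Calling `remap_fixed_demo()` from a small `main` should return 0 on a Linux system; other platforms may not provide mremap() or MREMAP_FIXED at all.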
Milestones: https://github.com/open-mpi/ompi-release/milestones/v2.0.0
- Started looking kinda clean other than other stuff.
- Allgather issue on intra-collectives.
- Merge GitHub projects/repos into one with master and release branches, with restricted permissions on release branches.
- After 2.0.0 release, will look at logistics of merging two repos.
- delayed - Permissions for tagging, and pushing branches? etc?
- delayed - Always use PULL request? or keep master so anyone can push?
- Noah at Intel - added some features to the Python client. Fixed Slurm.
- IBM has a client-facing cluster
- IBM Cluster submitting to Open cluster
- Switch these over to be publicly viewable this week.
- Power 8 set of machines.
- Will also have LSF. Has LSF, but had some issues.
- Hopefully will have Jenkins pull request testing working.
- An issue with where credentials have to live. Josh is negotiating.
- Having access to that version of Jenkins has been useful.
- If 'we' can't see console output. Mellanox community CAN see console output.
- Community would not be happy if can't see output.
- Had an idea to push output to a GitHub gist
- Working on getting Jenkins set up on the IBM side to ensure pull requests get tested on Power also.
- Hoping to have online this week?
- Looking at establishing MTT release .tarballs.
- Intel guys are looking at how to "release" MTT. MTT does not lend itself well to a "release".
- Because they want to include it in HPC Cluster testing project.
- How will PMIx 1.1.4 be moved into 2.0.1?
- Mellanox - nothing other than the usual scrambling. Hopefully got the OSHMEM issues resolved.
- Do you know whether those things will be v1.3 compliant once they go into master?
- Multi-subnet routing has been delayed to 2.1.
- Josh instituted some additional processes and procedures for his team.
- HPC / Mellanox remove Mike Dubman and use Josh Ladd
- Sandia - at the moment there is a dependency in the one-sided component.
- Looking at creating an OMPI Portals component for all of that to exist in.
- Have a collective component with a handful of Portals collectives. Figuring out which best ones to add.
- When to add new collective components? 2.0.1 or 2.1?
- 2.0.1 would be focused on bug fixes.
- 2.1.0 will include both features / bugfixes.
- Intel - PMIx stuff.
- Started SCON - Scalable Overlay Network Project.
- eventually make it a configurable option to the PMIx library (for PMIx to use for communication)
- will have ____ lots of components, and collective options.
- Would port whatever makes sense to ORTE.
- Working with HPC stack guys, for communicating different elements, etc
- Cisco, ORNL, UTK, NVIDIA
- Mellanox, Sandia, Intel
- LANL, Houston, IBM