From f0719650ed214ce542fead2793dfb06cdb8b87ea Mon Sep 17 00:00:00 2001
From: Axel Huebl
Date: Fri, 24 Sep 2021 07:23:39 -0700
Subject: [PATCH] Doc: OMPI_MCA_io Control (#1114)

Document OpenMPI MPI-I/O backend control.
This has long been documented in #446.
---
 docs/source/backends/hdf5.rst | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/docs/source/backends/hdf5.rst b/docs/source/backends/hdf5.rst
index c932f511d2..7b524464c5 100644
--- a/docs/source/backends/hdf5.rst
+++ b/docs/source/backends/hdf5.rst
@@ -29,6 +29,7 @@ Environment variable                  Default   Description
 ``OPENPMD_HDF5_CHUNKS``               ``auto``  Defaults for ``H5Pset_chunk``: ``"auto"`` (heuristic) or ``"none"`` (no chunking).
 ``H5_COLL_API_SANITY_CHECK``          unset     Debug: Set to ``1`` to perform an ``MPI_Barrier`` inside each meta-data operation.
 ``HDF5_USE_FILE_LOCKING``             ``TRUE``  Work-around: Set to ``FALSE`` in case you are on an HPC or network file system that hang in open for reads.
+``OMPI_MCA_io``                       unset     Work-around: Set to ``^ompio`` to disable OpenMPI's own MPI-I/O backend (OMPIO) in older OpenMPI releases.
 ===================================== ========= ===========================================================================================================
 
 ``OPENPMD_HDF5_INDEPENDENT``: by default, we implement MPI-parallel data ``storeChunk`` (write) and ``loadChunk`` (read) calls as `none-collective MPI operations `_.
@@ -56,6 +57,13 @@ As a result, read-only operations like ``h5ls some_file.h5`` or openPMD ``Series
 If you are sure that the file was written completely and is closed by the writer, e.g., because a simulation finished that created HDF5 outputs, then you can set this environment variable to ``FALSE`` to work-around the problem.
 You should also report this problem to your system support, so they can fix the file system mount options or disable locking by default in the provided HDF5 installation.
 
+``OMPI_MCA_io``: this is an OpenMPI control variable.
+OpenMPI ships its own MPI-I/O backend, *OMPIO*, starting with `OpenMPI 2.x `__.
+This backend is known to cause problems in older releases that might still be in use on some systems.
+Specifically, `we found and reported a silent data corruption issue `__ that was fixed only in `OpenMPI versions 3.0.4, 3.1.4, 4.0.1 `__ and newer.
+OMPIO also has problems with writes larger than 2 GB, which were only fixed in `OpenMPI versions 3.0.5, 3.1.5, 4.0.3 `__ and newer.
+Setting ``export OMPI_MCA_io=^ompio`` before ``mpiexec``/``mpirun``/``srun``/``jsrun`` disables OMPIO, so OpenMPI falls back to its older *ROMIO* MPI-I/O backend.
+
 Selected References
 -------------------
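As a hedged sketch (not part of openPMD-api or OpenMPI), the fix versions quoted in this patch can be encoded in a small Python helper that suggests whether ``OMPI_MCA_io=^ompio`` is advisable for a given OpenMPI version. The function names are hypothetical, and the handling of release series the text does not list is an assumption: 2.x is treated as unfixed, series after 4.0 as already fixed, and versions before 2.0 as not using OMPIO.

```python
import re

# Fix versions quoted in the documentation, keyed by (major, minor) series.
# OMPIO silent data corruption: fixed in 3.0.4, 3.1.4, 4.0.1.
CORRUPTION_FIXED = {(3, 0): (3, 0, 4), (3, 1): (3, 1, 4), (4, 0): (4, 0, 1)}
# OMPIO writes larger than 2 GB: fixed in 3.0.5, 3.1.5, 4.0.3.
LARGE_WRITE_FIXED = {(3, 0): (3, 0, 5), (3, 1): (3, 1, 5), (4, 0): (4, 0, 3)}


def parse_version(text):
    """Return (major, minor, patch) from the first X.Y[.Z] found in text."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no OpenMPI version found in {text!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_affected(version, fixed_in):
    """True if `version` predates the fix for its release series."""
    if version < (2, 0, 0):
        # Assumption: OMPIO only matters from OpenMPI 2.x on.
        return False
    series = version[:2]
    if series in fixed_in:
        return version < fixed_in[series]
    # Assumption: 2.x never got the fix; series after 4.0 already include it.
    return version < (3, 0, 0)


def should_disable_ompio(version_text):
    """Hypothetical helper: suggest OMPI_MCA_io=^ompio for affected versions."""
    version = parse_version(version_text)
    return (is_affected(version, CORRUPTION_FIXED)
            or is_affected(version, LARGE_WRITE_FIXED))


if __name__ == "__main__":
    for v in ("2.1.6", "3.1.4", "4.0.2", "4.0.3", "4.1.0"):
        hint = "export OMPI_MCA_io=^ompio" if should_disable_ompio(v) else "no work-around needed"
        print(f"OpenMPI {v}: {hint}")
```

For example, `should_disable_ompio("4.0.2")` is true because the fix for writes larger than 2 GB only landed in 4.0.3, even though the data corruption fix is already in 4.0.1. The parser searches for the first `X.Y[.Z]` number, so a version string with surrounding text also works.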