Merge pull request #3919 from franzpoeschel/disable-hdf5-chunking
Disable hdf5 chunking by default
psychocoderHPC authored Nov 26, 2021
2 parents ec27288 + 0ca679f commit 2e63df5
Showing 3 changed files with 43 additions and 0 deletions.
26 changes: 26 additions & 0 deletions docs/source/usage/plugins/openPMD.rst
@@ -143,6 +143,31 @@ Backend-specific notes
HDF5
====


Chunking
""""""""

By default, the openPMD-api uses a heuristic to automatically set an appropriate `dataset chunk size <https://support.hdfgroup.org/HDF5/doc/Advanced/Chunking/>`_.
In combination with some MPI-IO backends (e.g. ROMIO), this has been found to cause crashes.
To avoid this, PIConGPU overrides the default choice and deactivates HDF5 chunking in the openPMD plugin.

If you want to use chunking, you can ask for it via the following option passed in ``--openPMD.json``:

.. code-block:: json

   {
     "hdf5": {
       "dataset": {
         "chunks": "auto"
       }
     }
   }

In that case, make sure not to use an MPI-IO backend that conflicts with HDF5 chunking, e.g. by removing lines such as ``export OMPI_MCA_io=^ompio`` from your batch scripts.
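As a sketch, such a configuration could be passed from a batch script like this. The inline JSON mirrors the snippet above; the ``picongpu`` invocation itself is a placeholder (actual executable path and flags depend on your setup):

```shell
# Sketch: re-enable HDF5 chunking via the openPMD plugin's JSON option.
openPMD_config='{"hdf5": {"dataset": {"chunks": "auto"}}}'

# Optional sanity check: validate the JSON before submitting the job.
echo "$openPMD_config" | python3 -m json.tool > /dev/null && echo "config OK"

# Placeholder invocation; adapt to your launcher and simulation setup:
# ./bin/picongpu ... --openPMD.json "$openPMD_config"
```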

Performance tuning on Summit
""""""""""""""""""""""""""""

To avoid a performance bug in parallel HDF5 on the ORNL Summit compute system, choose a specific ROMIO version and pass performance hints:

.. code-block:: bash
@@ -167,6 +192,7 @@ Performance
On the Summit compute system, specifying ``export IBM_largeblock_io=true`` disables data shipping, which leads to reduced overhead for large block write operations.
This setting is applied in the Summit templates found in ``etc/picongpu/summit-ornl``.
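As a minimal sketch, the setting above would appear in a batch-script fragment like the following; the job-launch line is a placeholder, not copied from the actual Summit templates:

```shell
# Sketch of a Summit batch-script fragment: disable IBM Spectrum MPI
# data shipping to reduce overhead for large block write operations.
export IBM_largeblock_io=true
echo "IBM_largeblock_io=$IBM_largeblock_io"

# Placeholder job launch; the real templates live in etc/picongpu/summit-ornl:
# jsrun ... ./bin/picongpu ...
```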

Memory Complexity
^^^^^^^^^^^^^^^^^
15 changes: 15 additions & 0 deletions include/picongpu/plugins/openPMD/Json.cpp
@@ -152,6 +152,9 @@ namespace picongpu
}
result[backend.backendName]["dataset"] = datasetConfig;
}
// Note that at this point, config[<backend>][dataset] is no longer
// a list; the list has been resolved by the previous loop.
addDefaults(result);
return result.dump();
}

@@ -332,6 +335,18 @@ The key 'select' must point to either a single string or an array of strings.)EN
throw std::runtime_error(errorMsg);
}
}

void addDefaults(nlohmann::json& config)
{
// disable HDF5 chunking as it can conflict with MPI-IO backends
{
auto& hdf5Dataset = config["hdf5"]["dataset"];
if(!hdf5Dataset.contains("chunks"))
{
hdf5Dataset["chunks"] = "none";
}
}
}
} // namespace

#endif // ENABLE_OPENPMD
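The default-merging behavior of `addDefaults` in the diff above (set `hdf5.dataset.chunks` to `"none"` only when the user has not configured it) can be illustrated outside of C++. The following is a standalone sketch using Python's `json` module, not PIConGPU code:

```shell
# Mimic addDefaults(): insert "chunks": "none" unless already present.
python3 - <<'EOF'
import json

def add_defaults(config):
    # Disable HDF5 chunking unless the user configured it explicitly.
    dataset = config.setdefault("hdf5", {}).setdefault("dataset", {})
    dataset.setdefault("chunks", "none")
    return config

# No user setting: the default "none" is inserted.
print(json.dumps(add_defaults({})))
# → {"hdf5": {"dataset": {"chunks": "none"}}}

# An explicit user setting is left untouched.
print(json.dumps(add_defaults({"hdf5": {"dataset": {"chunks": "auto"}}})))
# → {"hdf5": {"dataset": {"chunks": "auto"}}}
EOF
```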
2 changes: 2 additions & 0 deletions include/picongpu/plugins/openPMD/Json_private.hpp
@@ -269,4 +269,6 @@ namespace
std::vector<picongpu::json::Pattern>& patterns,
nlohmann::json& defaultConfig,
nlohmann::json const& object);

void addDefaults(nlohmann::json&);
} // namespace
