[WIP] Use JSON/TOML template for defining openPMD metadata in a config file #1277

Open
wants to merge 8 commits into
base: dev

franzpoeschel
Contributor

@franzpoeschel franzpoeschel commented May 17, 2022

Not relevant for next release

  • Until now: Simulations specify their metadata in code via API calls.
  • With this PR: In some workflows (e.g. experiments) there is no omniscient simulation; instead, the experimenters enter the metadata via configuration files, and the API is a poor fit for that workflow.

Idea: We already have a JSON backend, use an openPMD-conforming JSON dataset to define only metadata. With this, the configuration file will be just another openPMD dataset.
Then, add some functionality to initialize an empty Series from such a metadata file.

TODO:

https://github.com/franzpoeschel/openPMD-api/compare/topic-json-short-modes..topic-json-template

@franzpoeschel franzpoeschel added backend: JSON api: new additions to the API labels May 17, 2022
@franzpoeschel franzpoeschel force-pushed the topic-json-template branch from f10fc90 to c63c06a Compare May 18, 2022 12:26
@franzpoeschel
Contributor Author

franzpoeschel commented May 18, 2022

An openPMD dataset in TOML:

[platform_byte_widths]
USHORT = 2
ULONG = 8
BOOL = 1
CLONG_DOUBLE = 32
LONGLONG = 8
CFLOAT = 8
CHAR = 1
DOUBLE = 8
CDOUBLE = 16
SHORT = 2
UCHAR = 1
FLOAT = 4
INT = 4
ULONGLONG = 8
UINT = 4
LONG = 8
LONG_DOUBLE = 16

[data]

[data.0]

[data.0.meshes]

[data.0.meshes.E]

[data.0.meshes.E.x]
datatype = "FLOAT"
data = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

[data.0.meshes.E.x.attributes]

[data.0.meshes.E.x.attributes.unitSI]
value = 1.0
datatype = "DOUBLE"

[data.0.meshes.E.x.attributes.position]
value = [0.0]
datatype = "VEC_DOUBLE"

[data.0.meshes.E.attributes]

[data.0.meshes.E.attributes.timeOffset]
value = 0.0
datatype = "FLOAT"

[data.0.meshes.E.attributes.gridUnitSI]
value = 1.0
datatype = "DOUBLE"

[data.0.meshes.E.attributes.gridSpacing]
value = [1.0]
datatype = "VEC_DOUBLE"

[data.0.meshes.E.attributes.gridGlobalOffset]
value = [0.0]
datatype = "VEC_DOUBLE"

[data.0.meshes.E.attributes.unitDimension]
value = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
datatype = "ARR_DBL_7"

[data.0.meshes.E.attributes.geometry]
value = "cartesian"
datatype = "STRING"

[data.0.meshes.E.attributes.dataOrder]
value = "C"
datatype = "STRING"

[data.0.meshes.E.attributes.axisLabels]
value = ["x"]
datatype = "VEC_STRING"

[data.0.attributes]

[data.0.attributes.timeUnitSI]
value = 1.0
datatype = "DOUBLE"

[data.0.attributes.time]
value = 0.0
datatype = "DOUBLE"

[data.0.attributes.dt]
value = 1.0
datatype = "DOUBLE"

[attributes]

[attributes.softwareVersion]
value = "0.15.0-dev"
datatype = "STRING"

[attributes.software]
value = "openPMD-api"
datatype = "STRING"

[attributes.openPMDextension]
value = 0
datatype = "UINT"

[attributes.meshesPath]
value = "meshes/"
datatype = "STRING"

[attributes.iterationFormat]
value = "many_iterations_%T"
datatype = "STRING"

[attributes.iterationEncoding]
value = "fileBased"
datatype = "STRING"

[attributes.openPMD]
value = "1.1.0"
datatype = "STRING"

[attributes.date]
value = "2022-05-18 12:20:23 +0000"
datatype = "STRING"

[attributes.basePath]
value = "/data/%T/"
datatype = "STRING"
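
For illustration, the verbose layout above (one table per attribute, each with a `value` and a `datatype` key) can be collapsed into plain key/value pairs. This is only a minimal Python sketch, assuming the dataset has already been parsed into nested dicts; `collapse_attributes` is a hypothetical helper and not part of openPMD-api:

```python
def collapse_attributes(attributes):
    """Drop explicit datatypes: {"name": {"value": v, "datatype": t}} -> {"name": v}."""
    collapsed = {}
    for name, entry in attributes.items():
        # Verbose entries are tables that carry exactly a value and a datatype.
        if isinstance(entry, dict) and set(entry) == {"value", "datatype"}:
            collapsed[name] = entry["value"]
        else:
            # Anything else is passed through unchanged.
            collapsed[name] = entry
    return collapsed


# Excerpt of the [data.0.attributes] tables from the dataset above.
verbose = {
    "timeUnitSI": {"value": 1.0, "datatype": "DOUBLE"},
    "time": {"value": 0.0, "datatype": "DOUBLE"},
    "dt": {"value": 1.0, "datatype": "DOUBLE"},
}
short = collapse_attributes(verbose)
print(short)
```

Note that the datatype information is lost in this direction, so a reader of the collapsed form has to infer it again from the values.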

@franzpoeschel franzpoeschel force-pushed the topic-json-template branch 3 times, most recently from 0d475a5 to 1a23a03 Compare May 19, 2022 11:54
@franzpoeschel
Contributor Author

franzpoeschel commented May 19, 2022

This is now a simplified TOML openPMD template, created by {"json":{"mode": "template"}}:

[data]

[data.meshes]

[data.meshes.temperature]
extent = [5, 5]
datatype = "FLOAT"

[data.meshes.temperature.attributes]
timeOffset = 0.0
# Explicit datatype can still be used if needed
unitSI = {"value" = 1.0, "datatype" = "FLOAT"}
position = [0.0]
gridUnitSI = 1.0
gridSpacing = [1.0]
gridGlobalOffset = [0.0]
unitDimension = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
geometry = "cartesian"
dataOrder = "C"
axisLabels = ["x"]

[data.attributes]
timeUnitSI = 1.0
snapshot = 0
time = 0.0
dt = 1.0

[attributes]
softwareVersion = "0.15.0-dev"
software = "openPMD-api"
openPMDextension = 0
meshesPath = "meshes/"
iterationFormat = "/data"
iterationEncoding = "variableBased"
openPMD = "1.1.0"
date = "2022-05-19 11:55:07 +0000"
basePath = "/data"

Differences from regular JSON/TOML openPMD datasets:

  1. The platform byte width table is missing.
  2. Attributes don't explicitly store their datatypes. Instead, the datatypes are restored dynamically (and somewhat heuristically) from the values that are present.
  3. No actual datasets can be written; only the extent is stored.
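
To make point 2 more concrete, here is a rough Python sketch of what such a heuristic could look like. It is not the implementation in this PR: `guess_datatype` is a hypothetical name, and the mapping (integers to `LONG`, floats to `DOUBLE`, lists to `VEC_*` based on their first element) is an assumption chosen for illustration. An explicit `{value, datatype}` table, as used for `unitSI` above, takes precedence over guessing:

```python
def guess_datatype(value):
    """Return (datatype_name, plain_value) for one template attribute."""
    # An explicit {"value": ..., "datatype": ...} table overrides any guessing.
    if isinstance(value, dict) and set(value) == {"value", "datatype"}:
        return value["datatype"], value["value"]
    # bool is tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "BOOL", value
    if isinstance(value, int):
        return "LONG", value  # assumption: integers default to LONG
    if isinstance(value, float):
        return "DOUBLE", value
    if isinstance(value, str):
        return "STRING", value
    if isinstance(value, list) and value:
        # Assumption: lists are homogeneous, so the first element decides.
        element_type, _ = guess_datatype(value[0])
        return "VEC_" + element_type, value
    raise ValueError(f"cannot guess a datatype for {value!r}")


print(guess_datatype(1.0))
print(guess_datatype(["x"]))
print(guess_datatype({"value": 1.0, "datatype": "FLOAT"}))
```

In this sketch an integer literal such as `1` and a float literal such as `1.0` yield different datatypes, which is one reason to keep the explicit form available.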

Template mode is also available in JSON:

{
  "attributes": {
    "basePath": "/data",
    "date": "2022-05-19 12:00:09 +0000",
    "iterationEncoding": "variableBased",
    "iterationFormat": "/data",
    "meshesPath": "meshes/",
    "openPMD": "1.1.0",
    "openPMDextension": 0,
    "software": "openPMD-api",
    "softwareVersion": "0.15.0-dev"
  },
  "data": {
    "attributes": {
      "dt": 1,
      "snapshot": 0,
      "time": 0,
      "timeUnitSI": 1
    },
    "meshes": {
      "temperature": {
        "attributes": {
          "axisLabels": [
            "x"
          ],
          "dataOrder": "C",
          "geometry": "cartesian",
          "gridGlobalOffset": [
            0
          ],
          "gridSpacing": [
            1
          ],
          "gridUnitSI": 1,
          "position": [
            0
          ],
          "timeOffset": 0,
          "unitDimension": [
            0,
            0,
            0,
            0,
            0,
            0,
            0
          ],
          "unitSI": 1
        },
        "datatype": "FLOAT",
        "extent": [
          5,
          5
        ]
      }
    }
  }
}
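
Since a template stores only `datatype` and `extent` for each dataset, the dataset definitions can be collected with a simple tree walk. A minimal Python sketch using only the standard library, run on a trimmed excerpt of the JSON above; `find_datasets` is a hypothetical helper, not an openPMD-api function:

```python
import json

# Trimmed excerpt of the JSON template shown above.
template = json.loads("""
{
  "data": {
    "meshes": {
      "temperature": {
        "attributes": {"unitSI": 1},
        "datatype": "FLOAT",
        "extent": [5, 5]
      }
    }
  }
}
""")


def find_datasets(node, path=""):
    """Yield (path, datatype, extent) for every table that defines a dataset."""
    if not isinstance(node, dict):
        return
    if "datatype" in node and "extent" in node:
        yield path or "/", node["datatype"], node["extent"]
    for key, child in node.items():
        # Attribute tables hold metadata only, so they are not descended into.
        if key != "attributes":
            yield from find_datasets(child, f"{path}/{key}")


datasets = list(find_datasets(template))
print(datasets)
```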

@franzpoeschel
Contributor Author

Longer example:

[data]

[data.particles]

[data.particles.e]

[data.particles.e.positionOffset]

[data.particles.e.positionOffset.z]

[data.particles.e.positionOffset.z.attributes]
value = 3.14
unitSI = 1.0
shape = [5, 5]

[data.particles.e.positionOffset.y]

[data.particles.e.positionOffset.y.attributes]
value = 3.14
unitSI = 1.0
shape = [5, 5]

[data.particles.e.positionOffset.x]

[data.particles.e.positionOffset.x.attributes]
value = 3.14
unitSI = 1.0
shape = [5, 5]

[data.particles.e.positionOffset.attributes]
unitDimension = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
timeOffset = 0.0

[data.particles.e.position]

[data.particles.e.position.z]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.position.z.attributes]
unitSI = 1.0

[data.particles.e.position.y]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.position.y.attributes]
unitSI = 1.0

[data.particles.e.position.x]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.position.x.attributes]
unitSI = 1.0

[data.particles.e.position.attributes]
unitDimension = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
timeOffset = 0.0

[data.particles.e.particlePatches]

[data.particles.e.particlePatches.numParticlesOffset]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.particlePatches.numParticlesOffset.attributes]
unitSI = 1.0

[data.particles.e.particlePatches.numParticles]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.particlePatches.numParticles.attributes]
unitSI = 1.0

[data.particles.e.particlePatches.offset]

[data.particles.e.particlePatches.offset.z]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.particlePatches.offset.z.attributes]
unitSI = 1.0

[data.particles.e.particlePatches.offset.y]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.particlePatches.offset.y.attributes]
unitSI = 1.0

[data.particles.e.particlePatches.offset.x]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.particlePatches.offset.x.attributes]
unitSI = 1.0

[data.particles.e.particlePatches.offset.attributes]
unitDimension = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

[data.particles.e.particlePatches.extent]

[data.particles.e.particlePatches.extent.z]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.particlePatches.extent.z.attributes]
unitSI = 1.0

[data.particles.e.particlePatches.extent.y]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.particlePatches.extent.y.attributes]
unitSI = 1.0

[data.particles.e.particlePatches.extent.x]
extent = [5, 5]
datatype = "FLOAT"

[data.particles.e.particlePatches.extent.x.attributes]
unitSI = 1.0

[data.particles.e.particlePatches.extent.attributes]
unitDimension = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

[data.meshes]

[data.meshes.temperature]
extent = [5, 5]
datatype = "FLOAT"

[data.meshes.temperature.attributes]
timeOffset = 0.0
unitSI = 1.0
position = [0.0]
gridUnitSI = 1.0
gridSpacing = [1.0]
gridGlobalOffset = [0.0]
unitDimension = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
geometry = "cartesian"
dataOrder = "C"
axisLabels = ["x"]

[data.meshes.E]

[data.meshes.E.z]
extent = [5, 5]
datatype = "FLOAT"

[data.meshes.E.z.attributes]
unitSI = 1.0
position = [0.0]

[data.meshes.E.y]
extent = [5, 5]
datatype = "FLOAT"

[data.meshes.E.y.attributes]
unitSI = 1.0
position = [0.0]

[data.meshes.E.x]
extent = [5, 5]
datatype = "FLOAT"

[data.meshes.E.x.attributes]
unitSI = 1.0
position = [0.0]

[data.meshes.E.attributes]
timeOffset = 0.0
gridUnitSI = 1.0
gridSpacing = [1.0]
gridGlobalOffset = [0.0]
unitDimension = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
geometry = "cartesian"
dataOrder = "C"
axisLabels = ["x"]

[data.attributes]
timeUnitSI = 1.0
snapshot = 0
time = 0.0
dt = 1.0

[attributes]
softwareVersion = "0.15.0-dev"
particlesPath = "particles/"
software = "openPMD-api"
openPMDextension = 0
meshesPath = "meshes/"
iterationFormat = "/data"
iterationEncoding = "variableBased"
openPMD = "1.1.0"
date = "2022-05-19 15:26:37 +0000"
basePath = "/data"
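
In the example above, the `positionOffset` components carry `value` and `shape` attributes instead of an `extent`, which appears to be how constant components are represented. A small Python sketch that tells the two kinds apart and counts their elements; the excerpt is hand-copied from the template, and `describe_component` is a hypothetical helper:

```python
from math import prod

# Excerpt of the particle species "e" from the template above.
species = {
    "positionOffset": {
        "x": {"attributes": {"value": 3.14, "unitSI": 1.0, "shape": [5, 5]}},
    },
    "position": {
        "x": {"extent": [5, 5], "datatype": "FLOAT", "attributes": {"unitSI": 1.0}},
    },
}


def describe_component(component):
    """Classify one record component and report its number of elements."""
    attributes = component.get("attributes", {})
    if "extent" in component:
        # Array-backed: the template stores only the extent, no data.
        return "array", prod(component["extent"])
    if "value" in attributes and "shape" in attributes:
        # Constant: a single value stands in for the whole shape.
        return "constant", prod(attributes["shape"])
    raise ValueError("component defines neither an extent nor a constant value")


print(describe_component(species["positionOffset"]["x"]))
print(describe_component(species["position"]["x"]))
```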

@franzpoeschel franzpoeschel force-pushed the topic-json-template branch 5 times, most recently from 475be7b to ee8bdf1 Compare May 23, 2022 11:14
@franzpoeschel franzpoeschel changed the title Use JSON/TOML template for defining openPMD metadata in a config file [WIP] Use JSON/TOML template for defining openPMD metadata in a config file May 23, 2022
@franzpoeschel franzpoeschel force-pushed the topic-json-template branch 3 times, most recently from d32fff3 to 376bc2a Compare July 5, 2022 09:23
@franzpoeschel franzpoeschel force-pushed the topic-json-template branch 2 times, most recently from 3ee509d to a662865 Compare July 21, 2022 16:35
@franzpoeschel
Contributor Author

Notes for myself on the recent reordering of commits:

5 3ee509de (HEAD -> topic-json-template, origin/topic-json-template) Properly deal with undefined datasets
2 06da2d58 Make JSON and TOML look like two different backends
5 960ab21a Initialize Dataset definitions from template
5 b88bae67 Initialize Series attributes from template
3 6302a33c Fix NVHPC Toml11 open mode
2 d825008b Fix precision-losing type conversion
4 da960a23 Enable .toml tests in generic tests
4 0398b86f Extend example
3 7332996e Windows compatibility
x 85527799 Add and use Attribute::getOptional<T>()
1 64cde966 Template mode: Fill with zero upon read
1 fa483843 Write/read shorthand attributes without explicit datatype
3 bd8da013 CI fixes
1 d802d2ac Don't write platform datatype size table in template mode
2 cba71f7f Use .toml as filename extension
2 b019a7d1 TOML as alternative backend for JSON backend
1 4b25de8c Select template mode via JSON param
1 8ef4753f Add template mode to JSON backend

@franzpoeschel franzpoeschel force-pushed the topic-json-template branch 2 times, most recently from 1db63f6 to 55b72f8 Compare July 29, 2022 09:00
@franzpoeschel franzpoeschel force-pushed the topic-json-template branch 2 times, most recently from b07a2a8 to a41c2c6 Compare August 17, 2022 09:21
// throw error::WrongAPIUsage(
// "[RecordComponent] Must set specific datatype (Use "
// "resetDataset call).");
// }
Contributor Author

Note that this check has been inactive until now, because RecordComponentData::RecordComponentData initialized that field with Datatype::CHAR. Using an optional would make such things more obvious and avoid these pitfalls.
To be done in a different PR, though.
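
The pitfall can be shown with a small Python analogy (the real code is C++; the class and function names below are made up for illustration). A default that is also a legitimate value hides the "unset" state, whereas an optional makes it checkable:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SentinelComponent:
    # A default that is also a legitimate value: "unset" cannot be detected.
    datatype: str = "CHAR"


@dataclass
class OptionalComponent:
    # None unambiguously means that no datatype has been set yet.
    datatype: Optional[str] = None


def require_datatype(component):
    """Raise unless a datatype is present on the component."""
    if component.datatype is None:
        raise ValueError("Must set specific datatype (use a resetDataset call).")
    return component.datatype


# The sentinel variant passes the check although nothing was ever set.
print(require_datatype(SentinelComponent()))

# The optional variant is caught as intended.
try:
    require_datatype(OptionalComponent())
except ValueError as err:
    print("rejected:", err)
```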

Contributor Author

#1316 now uses std::optional

@franzpoeschel franzpoeschel force-pushed the topic-json-template branch 3 times, most recently from 1307f10 to b5eecbb Compare February 29, 2024 13:34
@franzpoeschel franzpoeschel force-pushed the topic-json-template branch 2 times, most recently from 2f6b7d1 to 68dc5e5 Compare March 26, 2024 11:49
@franzpoeschel franzpoeschel force-pushed the topic-json-template branch from 68dc5e5 to fc1578c Compare May 14, 2024 14:30
@franzpoeschel franzpoeschel force-pushed the topic-json-template branch from fc1578c to 2249002 Compare May 30, 2024 08:17
@franzpoeschel franzpoeschel force-pushed the topic-json-template branch from 2249002 to b11d83c Compare June 7, 2024 12:37
@@ -1560,12 +1618,13 @@
}
};

-inline void write_test(const std::string &backend)
+inline void write_test(

Check warning (Code scanning / CodeQL): Poorly documented large function [Warning, test]

Poorly documented function: fewer than 2% comments for a function of 134 lines.
@franzpoeschel franzpoeschel force-pushed the topic-json-template branch 2 times, most recently from df04b3e to 074a93d Compare August 5, 2024 09:45
Labels
api: new additions to the API backend: JSON
2 participants