Create prototype for YAML to python translation #434

Open
asewnath opened this issue Sep 25, 2024 · 4 comments
@asewnath
Contributor

Making a note to create prototype Python class(es) to replace configuration YAML files.

Recommendations:
https://docs.python.org/3/library/dataclasses.html
pydantic data model https://docs.pydantic.dev/latest/

@asewnath self-assigned this on Sep 25, 2024
@rtodling
Contributor

@ashiklom can I ask you to give a simple example of how this would replace a YAML? Can you choose a known YAML from the SWELL configuration and illustrate (cryptically is OK) how the thing would work? Looking at the sites you point to above, I am under the impression that this is more complicated than managing the YAMLs directly, but I am sure I am missing something. Sorry.

@ashiklom
Collaborator

The name here is somewhat misleading. I'm not demanding that we replace all (or even any) of our YAMLs with pure Python implementations. I'm just asking for some basic offline prototypes so we can compare the designs and see if there are any obvious advantages.

That said, I'll try to come up with some basic examples to give you a sense of how this might look.

@asewnath
Contributor Author

@ashiklom @rtodling Apologies for the misleading name! As Alexey said, it's meant to be a prototype implementation.

@asewnath changed the title from "Replace YAMLS with Python classes" to "Create prototype for YAML to python translation" on Sep 27, 2024
@ashiklom
Collaborator

ashiklom commented Sep 27, 2024

@rtodling Definitely needs more careful thought, but here are some quick sketches, based on the geos_marine and geos_ocean JEDI interfaces.

The YAMLs:

# geos_ocean.yaml
jedi_interface: soca
total_processors: {{total_processors}}
executables:
  hofx3D: soca_hofx3d.x
  hofx4D: soca_hofx.x
  variational3D: soca_var.x
  variational4D: soca_var.x
  variational4DEnsVar: soca_var.x
  explicit_diffusion: soca_error_covariance_toolbox.x
variables:
  hocn: h
  socn: Salt
  ssh: ave_ssh
  tocn: Temp
# geos_marine.yaml
jedi_interface: soca
total_processors: {{total_processors}}
executables:
  hofx3D: soca_hofx3d.x
  hofx4D: soca_hofx.x
  variational3D: soca_var.x
  variational4D: soca_var.x
  variational4DEnsVar: soca_var.x
  explicit_diffusion: soca_error_covariance_toolbox.x
  convert_state_soca2cice: soca_convertstate.x
variables:
  hocn: h
  socn: Salt
  ssh: ave_ssh
  tocn: Temp

Now, a sketch of a pure Python representation, with (basic!) usage of dataclasses.

from dataclasses import dataclass

# Define the general structure.
# Note: A nice thing here is we can use type hinting to precisely define what
# kinds of values are allowed in the config.
@dataclass
class JediInterface:
    jedi_interface: str
    executables: dict[str, str]
    variables: dict[str, str]
    # Include a default value for this field, so we don't always have to set it
    total_processors: int = 1

# Now (perhaps in a separate file), you define specific instances of that structure.

swell_config = get_swell_config(...)

geos_ocean_interface = JediInterface(
    jedi_interface = "soca",
    # Can set things directly via variables
    total_processors = swell_config.total_processors,
    executables = {
        "hofx3D": "soca_hofx3d.x",
        "hofx4D": "soca_hofx.x",
        "variational3D": "soca_var.x",
        "variational4D": "soca_var.x",
        "variational4DEnsVar": "soca_var.x",
        "explicit_diffusion": "soca_error_covariance_toolbox.x"
    },
    variables = {
        "hocn": "h",
        "socn": "Salt",
        "ssh": "ave_ssh",
        "tocn": "Temp"
    }
)

# We can dynamically update instances of the classes based on conditions.
# Maybe a bad example, but, if doing tier 1 tests, reduce the complexity...
if swell_config.is_tier1_test:
    # Only use 1 processor
    geos_ocean_interface.total_processors = 1
    # ...and only consider two variables
    geos_ocean_interface.variables = {
        key: geos_ocean_interface.variables[key]
        # Select by key ("tocn", "ssh"), not by value ("Temp", "ave_ssh")
        for key in ("tocn", "ssh")
    }

# Share information between interfaces. Less typing, less stuff to update, and
# clearer relationships between different interfaces.
ocean_vars = {
    "hocn": "h",
    "socn": "Salt",
    "ssh": "ave_ssh",
    "tocn": "Temp"
}

common_execs = {
    "hofx3D": "soca_hofx3d.x",
    "hofx4D": "soca_hofx.x",
    "variational3D": "soca_var.x",
    "variational4D": "soca_var.x",
    "variational4DEnsVar": "soca_var.x",
    "explicit_diffusion": "soca_error_covariance_toolbox.x"
}

geos_ocean_interface = JediInterface(
    jedi_interface = "soca",
    # Some dynamically-computed value, just to show off...
    # Here, cap the number of processors at 10
    total_processors = min(swell_config.total_processors, 10),
    executables = common_execs,
    variables = ocean_vars
)

geos_marine_interface = JediInterface(
    jedi_interface = "soca",
    # Don't cap this one...
    total_processors = swell_config.total_processors,
    executables = {
        # Same as geos_ocean...
        **common_execs,
        # ...except also add one more
        "convert_state_soca2cice": "soca_convertstate.x"
    },
    # Same as geos_ocean
    variables = ocean_vars
)
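
One more option for the sharing pattern above, as a rough sketch: the standard library's `dataclasses.replace` builds a new instance from an existing one, so geos_marine can be written as "geos_ocean plus one executable". The class is repeated here so the snippet runs on its own, the dictionaries are abbreviated, and `total_processors=4` is a made-up value standing in for whatever the SWELL config would provide.

```python
from dataclasses import asdict, dataclass, replace


@dataclass
class JediInterface:
    jedi_interface: str
    executables: dict[str, str]
    variables: dict[str, str]
    total_processors: int = 1


geos_ocean = JediInterface(
    jedi_interface="soca",
    executables={"hofx3D": "soca_hofx3d.x", "variational3D": "soca_var.x"},
    variables={"tocn": "Temp", "ssh": "ave_ssh"},
    total_processors=4,  # hypothetical value, for illustration only
)

# replace() returns a new instance; fields that are not overridden are
# carried over from geos_ocean unchanged.
geos_marine = replace(
    geos_ocean,
    executables={
        **geos_ocean.executables,
        "convert_state_soca2cice": "soca_convertstate.x",
    },
)

# asdict() converts the instance into plain nested dicts, which is the shape
# a YAML writer or a template renderer would expect.
marine_as_dict = asdict(geos_marine)
```

One caveat with this kind of sharing: fields that are not overridden are shared by reference, so `geos_marine.variables` and `geos_ocean.variables` are the same dict object. Editing it in place would change both interfaces; building a new dict, as done for `executables`, avoids that.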

You can also get fancier (more precise) with your class definition, if you want to, e.g., restrict specific fields to specific values. For example, the apparently Pythonic thing to do (which, I admit, still looks a bit ugly and verbose to me, but a lot of serious Python people recommend it!) is to use enums instead of strings wherever a parameter can take on one of a few specific values:

from dataclasses import dataclass
from enum import StrEnum  # StrEnum requires Python 3.11+

# Specify that there are only 2 valid types of JEDI interface...
class JediInterfaceType(StrEnum):
    SOCA = "soca"
    FV3 = "fv3-jedi"

# ...and only 8 possible keys for your executable.
class ExecutableKey(StrEnum):
    HOFX3D = "hofx3D"
    HOFX4D = "hofx4D"
    VAR3D = "variational3D"
    VAR4D = "variational4D"
    VAR4DENS = "variational4DEnsVar"
    EXPLICIT_DIFFUSION = "explicit_diffusion"
    LOCAL_ENS_DA = "localensembleda"
    ENS_VAR = "ensemble_variance"

# Now, restrict the interface and executable keys in a JediInterface to *only* the values defined above.
@dataclass
class JediInterface:
    jedi_interface: JediInterfaceType
    executables: dict[ExecutableKey, str]
    variables: dict[str, str]
    total_processors: int = 1

# Define an instance. This is code that a type checker will pass without errors.
valid_interface = JediInterface(
    jedi_interface = JediInterfaceType.SOCA,
    executables = {ExecutableKey.HOFX3D: "soca_hofx3d.x"},
    variables = {"hocn": "h"}
)

# NOTE: This will run, but a type checker will report argument-type errors for
# jedi_interface and executables.
bad_interface = JediInterface(
    jedi_interface = "fv4",
    executables = {"hofx5D": "no_such_thing.x"},
    variables = {"hocn": "h"}
)

Python itself will not enforce dataclass types at runtime, so this would rely on a type checker like mypy or pyright to catch type errors (which is still useful! We catch those errors before we even run anything!).
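
To make the runtime behavior concrete, here is a small standard-library sketch. The class names `UncheckedInterface` and `CheckedInterface` are made up for this illustration, the classes are trimmed to two fields, and `(str, Enum)` is used in place of `StrEnum` so it also runs on Python older than 3.11. The `__post_init__` check is just one hand-rolled way to get a runtime error; it is not something dataclasses do for you.

```python
from dataclasses import dataclass
from enum import Enum


class JediInterfaceType(str, Enum):
    SOCA = "soca"
    FV3 = "fv3-jedi"


@dataclass
class UncheckedInterface:
    jedi_interface: JediInterfaceType
    total_processors: int = 1


# Type hints are not checked at runtime, so this runs without any error.
unchecked = UncheckedInterface(jedi_interface="fv4", total_processors="many")


@dataclass
class CheckedInterface:
    jedi_interface: JediInterfaceType
    total_processors: int = 1

    def __post_init__(self):
        # Looking an enum up by value raises ValueError for unknown values
        # and converts a plain "soca" string into JediInterfaceType.SOCA.
        self.jedi_interface = JediInterfaceType(self.jedi_interface)
        if not isinstance(self.total_processors, int):
            raise TypeError("total_processors must be an int")


try:
    CheckedInterface(jedi_interface="fv4")
    rejected = False
except ValueError:
    rejected = True
```

Writing these checks by hand for every field gets tedious quickly, which is the main argument for a static type checker or a validation library.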

Pydantic (linked above) data classes look very similar, except that they actually will throw meaningful runtime errors if you do the wrong thing.
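
A rough sketch of what the Pydantic version might look like, assuming the `pydantic` package is installed. It uses only `pydantic.dataclasses.dataclass` and `pydantic.ValidationError`; the class is trimmed down, the bad values are illustrative, and `(str, Enum)` stands in for `StrEnum` so it also runs on Python older than 3.11.

```python
from enum import Enum

from pydantic import ValidationError
from pydantic.dataclasses import dataclass


class JediInterfaceType(str, Enum):
    SOCA = "soca"
    FV3 = "fv3-jedi"


# Same shape as a standard dataclass, but fields are validated on construction.
@dataclass
class JediInterface:
    jedi_interface: JediInterfaceType
    executables: dict[str, str]
    variables: dict[str, str]
    total_processors: int = 1


# A plain string that matches an enum value is accepted and converted.
ok = JediInterface(
    jedi_interface="soca",
    executables={"hofx3D": "soca_hofx3d.x"},
    variables={"hocn": "h"},
)

# Invalid values are rejected when the object is built, not later on.
try:
    JediInterface(
        jedi_interface="fv4",
        executables={},
        variables={},
        total_processors="many",
    )
    errors = []
except ValidationError as err:
    # errors() lists each failing field with a description of the problem.
    errors = err.errors()
```

Printing the caught exception gives a per-field report, which is what makes these errors more useful than a failure deep inside the workflow.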
