DaCe Orchestration issues #42

Open
fmalatino opened this issue May 14, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@fmalatino
Contributor

Describe the bug
DaCe orchestration fails when the pySHiELD microphysics module is parsed with unused variables. Specifically, the sedimentation and icloud methods were passed the unused variables d0_vap, lv00, and cracw, which resulted in a ValueError. These variables have since been removed in PR 15.

To Reproduce
Re-add d0_vap and lv00 to the sedimentation method and its calls, and cracw to the icloud method and its calls. Make sure to use a DaCe backend, with the FV3_DACEMODE environment variable set to BuildAndRun to enable orchestration and run the model in Pace. To start a run, use:
mpirun -n X python -m pace.run <config yaml>

Expected behavior
DaCe orchestration should succeed regardless of whether the declared variables are used.

System Environment

  • OS: RHEL 8.9
  • Backend used: dace:cpu
  • Environment variables set: FV3_DACEMODE=BuildAndRun
  • Compiler(s): gcc/12.3.0, python/3.11
  • MPI type and version: openmpi/5.0.0
  • netCDF version: netcdf/4.9.2
  • Model run that triggered the bug: baroclinic_c12_orch_cpu.yaml

fmalatino added the bug label on May 14, 2024
@FlorianDeconinck
Collaborator

Another case of the same issue has come up: unused variables cause a parsing failure under orch:dace:X.

After some digging, the issue turns out to live in the gt4py/DaCe bridge.

In orchestration, the StencilFactory uses a lazy_stencil (code) to defer the build to JIT time. This mechanism relies on DaCeLazyStencil (code).
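As a rough illustration of what "deferring the build to JIT time" means, the sketch below wraps a definition and only runs the (stand-in) build step the first time the stencil is called. `LazyStencil`, `compile_stub`, and `add_one` are hypothetical names for this example only; they are not the gt4py or Pace APIs.

```python
from typing import Any, Callable, Optional


class LazyStencil:
    """Hypothetical lazy wrapper: stores a definition, builds it on first use."""

    def __init__(
        self,
        definition: Callable[..., Any],
        build: Callable[[Callable[..., Any]], Callable[..., Any]],
    ) -> None:
        self.definition = definition
        self._build = build
        self._built: Optional[Callable[..., Any]] = None
        self.build_count = 0

    def _get_built(self) -> Callable[..., Any]:
        # Build only when the stencil is first needed, then reuse the cached result.
        if self._built is None:
            self._built = self._build(self.definition)
            self.build_count += 1
        return self._built

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        return self._get_built()(*args, **kwargs)


def compile_stub(fn: Callable[..., Any]) -> Callable[..., Any]:
    # Stand-in for a real backend compilation step; returns the function unchanged.
    return fn


def add_one(field: list) -> None:
    # Toy "stencil" body: increments every element in place.
    for i in range(len(field)):
        field[i] += 1


stencil = LazyStencil(add_one, compile_stub)
builds_before_call = stencil.build_count  # nothing is built at construction time
data = [1, 2, 3]
stencil(data)  # first call triggers the build
stencil(data)  # second call reuses the cached build
```

The point of the deferral is that construction is cheap and the expensive build happens once, at the moment the orchestrator actually needs the stencil.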

1/ When DaCe takes over to move the code into the SDFG IR, the __sdfg__ function is executed. The stencil gets packed into an unexpanded version whose signature is given here as a simple list of the declared arguments.

2/ Later, as orchestration progresses, DaCe turns the OIR (the last GT4Py IR) into an SDFG (here). The bug manifests here, where the bridge attempts to build the inputs/outputs from the node.params list.

The problem is that, by then, the GT pipeline may have culled a parameter that existed in 1/ because it is unused. But 1/ already promised this parameter to the system. Meanwhile, because 2/ reads the OIR, the parameter no longer exists there: hence the bug.
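A toy model of the mismatch may help: step 1/ advertises every declared argument, a culling pass drops the unused one, and step 2/ then fails to find it in the parameter list. All function names and the `q`/`qs` arguments are hypothetical, and the raised error only mimics the reported ValueError; this is not the actual bridge code.

```python
from typing import Dict, List, Set


def declared_signature(declared_args: List[str]) -> List[str]:
    # Step 1/: the unexpanded node advertises every declared argument.
    return list(declared_args)


def cull_unused(declared_args: List[str], used: Set[str]) -> List[str]:
    # GT pipeline pass (simplified): drop parameters the stencil body never touches.
    return [a for a in declared_args if a in used]


def build_connectors(signature: List[str], node_params: List[str]) -> Dict[str, str]:
    # Step 2/: the bridge maps each advertised argument to a parameter from the OIR.
    connectors = {}
    for name in signature:
        if name not in node_params:
            raise ValueError(f"argument {name!r} was promised but is missing from node.params")
        connectors[name] = name
    return connectors


declared = ["q", "qs", "lv00"]  # lv00 is declared but never used
signature = declared_signature(declared)
params = cull_unused(declared, used={"q", "qs"})

error_message = ""
try:
    build_connectors(signature, params)
except ValueError as err:
    error_message = str(err)
```

Step 1/ and step 2/ each behave "correctly" given the information they see; the failure only appears when the culled parameter list meets the pre-culling signature.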

To fix this, 1/ and 2/ need to agree. The difficulty is that 1/ doesn't know what operations will happen next, while 2/ is logically looking at the "truth" coming out of the pipeline; neither one is wrong per se, and neither has access to all the information.

One way to deal with this would be to deactivate the optimization pass that culls unused parameters (skip attribute here), but this might have side effects on performance. A better fix would be to see whether 1/ or 2/ can be changed without breaking either the stencil or the orchestration pipeline.
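The two directions above can be sketched on the same toy model: option A disables culling through a hypothetical `skip` flag, so both steps see every parameter, while option B keeps culling but lets the bridge leave culled arguments unconnected instead of raising. The flag, function names, and `q`/`qs` arguments are assumptions for illustration; neither option is presented as the actual fix.

```python
from typing import Dict, List, Optional, Set


def cull_unused(declared: List[str], used: Set[str], skip: bool = False) -> List[str]:
    # Option A: a hypothetical `skip` flag disables culling, so node.params keeps
    # every declared argument (possibly at some performance cost).
    if skip:
        return list(declared)
    return [a for a in declared if a in used]


def build_connectors(signature: List[str], node_params: List[str]) -> Dict[str, Optional[str]]:
    # Option B: honor every argument promised in step 1/, but map culled ones
    # to None (no data movement) instead of raising.
    return {name: (name if name in node_params else None) for name in signature}


declared = ["q", "qs", "lv00"]  # lv00 is declared but never used
used = {"q", "qs"}

# Option A: nothing is culled, so both steps see the same parameter list.
params_a = cull_unused(declared, used, skip=True)
connectors_a = build_connectors(declared, params_a)

# Option B: culling still happens; the unused argument is left unconnected.
params_b = cull_unused(declared, used)
connectors_b = build_connectors(declared, params_b)
```

Option A trades possible performance for simplicity, whereas option B keeps the optimization and moves the tolerance into the bridge, which matches the preference expressed above for changing 1/ or 2/.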

fmalatino changed the title from "Parsing of unused variables in pySHiELD" to "DaCe Orchestration issues" on Jul 2, 2024