Get the pipeline running again in some kind of scheduled job (either Glue or another framework) #99

jeancochrane · 2024-02-28T19:27:20Z

The Glue pipeline is currently not usable because it can't read the new parameters introduced to the YAML file in #90. It is also currently run from a separate script than the manual flagging, so code changes from #90 would need to be ported from manual_flagging/flagging.py to glue/flagging.py before the Glue job could even use the new parameters.

If we want to run flagging jobs on Glue going forward, we need to make the necessary updates to the Glue script (and possibly consolidate it with the manual script per #92) to get it running again. The biggest challenge here is figuring out how to represent the new parameters in a way that is easily configurable as Glue parameters. My thinking is that we can add a small script to the CI pipeline that parses the YAML file and serializes the parameters to a format that works for Glue, and then we pass those into Terraform as variables. The flagging script can then deserialize the parameters when running in a Glue context, or read them from the YAML file directly when running locally. (I recognize this is a lot of design info, so if we go in this direction I'll take some time to sketch this out in more detail.)
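
A minimal sketch of the serialize/deserialize round trip described above, written in Python since the flagging scripts are Python. It assumes the parameters have already been loaded from the YAML file into a dict (for example with PyYAML's `yaml.safe_load`) and that the whole config is passed to Glue as a single string argument. The argument name `--flagging_params`, the base64-encoded JSON format, and all function names are hypothetical, not taken from the repo. Inside a real Glue job, `awsglue.utils.getResolvedOptions` would normally parse job arguments; here the argv list is scanned by hand so the sketch runs outside Glue.

```python
"""Hypothetical sketch: pass nested YAML parameters to a Glue job as one string."""
import base64
import json
from typing import Any, Callable, Dict, List

# Hypothetical job argument name (an assumption, not from the actual repo).
PARAMS_ARG = "--flagging_params"


def serialize_params(params: Dict[str, Any]) -> str:
    """Encode a parsed YAML config as base64 JSON (CI side).

    The result contains only base64 characters, so it can be handed to
    Terraform as a variable and on to Glue without quoting problems.
    """
    raw = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return base64.b64encode(raw.encode("utf-8")).decode("ascii")


def deserialize_params(encoded: str) -> Dict[str, Any]:
    """Reverse serialize_params (Glue side)."""
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


def load_params(
    argv: List[str], load_local: Callable[[], Dict[str, Any]]
) -> Dict[str, Any]:
    """Use the serialized argument when present (Glue context).

    Otherwise fall back to load_local, e.g. a function that reads the
    YAML file directly when running on a laptop.
    """
    if PARAMS_ARG in argv:
        idx = argv.index(PARAMS_ARG)
        if idx + 1 < len(argv):
            return deserialize_params(argv[idx + 1])
        raise ValueError(f"{PARAMS_ARG} was passed without a value")
    return load_local()


if __name__ == "__main__":
    # Hypothetical nested config standing in for the parsed YAML file.
    example = {"group": {"threshold": 2, "columns": ["a", "b"]}}
    encoded = serialize_params(example)
    # Simulate the argument list a Glue job might receive.
    argv = ["flagging.py", PARAMS_ARG, encoded]
    assert load_params(argv, load_local=lambda: {}) == example
    print("round trip ok")
```

Base64 is used here only so that braces and quotes survive the trip through Terraform variables and Glue's argument handling; passing plain JSON may work just as well, and the choice would be part of the more detailed design.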

But before we fix up the Glue pipeline, we need to decide if we even want to continue using Glue to run the pipeline on a schedule. Glue's approach to parameters is finicky and it doesn't work well with our GitHub CI flow. So unfortunately this issue will be blocked until we make a decision about architecture.


jeancochrane added bug Something isn't working blocked labels Feb 28, 2024

jeancochrane added this to the Sales val improvements milestone Feb 28, 2024

jeancochrane mentioned this issue Feb 28, 2024 in Sales val flagging improvements and publication #103 (Open, 4 tasks)

dfsnow assigned wagnerlmichael and jeancochrane Apr 16, 2024

dfsnow removed this from the Sales val wrap-up and publication milestone Apr 26, 2024
