ADR 8: Orchestrator Repo + CLI #37

Closed
wants to merge 2 commits into from

cisaacstern
Member

This ADR proposes a new pangeo-forge-orchestrator repo, which aims to address two challenges: the limited visibility of the relationships between Pangeo Forge's modular components, and the lack of a single entry point from which to invoke them.

A major aim of this ADR, perhaps not yet fully articulated in the PR itself, is to improve the maintainability and extensibility of our contribution workflow. As roughly documented in flow-charts/ci-flow-with-callstack.png, the automated components of our CI are spread across a range of GitHub Actions and other repos. This ADR would bring them all under one roof (from an interface standpoint; other repos/packages may still be called deeper in the stack).

From a design perspective, I imagine the implementation building on the design patterns established by @andersy005 in pangeo-forge/pangeo-forge-recipes#69 (including the use of typer and rich.table, etc.).
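
To make the "single entry point" idea concrete, here is a rough sketch of one command with a subcommand per component. It uses the standard library's argparse purely as a stand-in so the example is self-contained; the real implementation would presumably follow the typer/rich patterns mentioned above. The subcommand names (`catalog`, `run-recipe`), the `--recipe` option, and the `build_parser`/`main` helpers are all hypothetical and not taken from any existing Pangeo Forge code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build one parser with a subcommand per orchestrated component.

    All subcommand and argument names here are hypothetical.
    """
    parser = argparse.ArgumentParser(prog="pangeo-forge")
    subcommands = parser.add_subparsers(dest="command", required=True)

    catalog = subcommands.add_parser("catalog", help="Catalog a built dataset")
    catalog.add_argument("feedstock", help="Name of the feedstock to catalog")

    run = subcommands.add_parser("run-recipe", help="Run a recipe from a feedstock")
    run.add_argument("feedstock", help="Name of the feedstock to run")
    run.add_argument("--recipe", default=None, help="Recipe object name in recipe.py")

    return parser


def main(argv=None) -> str:
    """Dispatch to the selected component and return a status message."""
    args = build_parser().parse_args(argv)
    if args.command == "catalog":
        # A real orchestrator would call into the cataloging package here.
        return f"cataloging {args.feedstock}"
    # Otherwise the command is "run-recipe"; fall back to a placeholder name.
    recipe = args.recipe or "<default>"
    return f"running {recipe} from {args.feedstock}"
```

Calling `main(["catalog", "example-feedstock"])` dispatches to the cataloging branch. The point is only that every component is reachable from one interface, while the work itself can still live in other repos/packages deeper in the stack.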

I'll start a draft of this repo today to experiment with some ideas.


## Consequences

What becomes easier or more difficult to do because of this change?
Contributor

Would be great to have a list of other repos / areas of Pangeo Forge which will be affected or subsumed by this new unified CLI.

Member Author

Yep! Made a first pass at this in 24fa37b

@cisaacstern
Member Author

pangeo-forge/pangeo-forge-orchestrator#1 illustrates some initial ideas for what the orchestrator interface might look like.

Any high-level process orchestration (of, e.g., cataloging) must be able to introspect the relationships between various components of Pangeo Forge (e.g., tie feedstocks back to their resulting datasets). Theoretically, storage paths should/will encode the feedstock names, but this is an incomplete solution because:

  1. The particular encoding strategy will inevitably be adjusted over time
  2. Even an idealized unchanging encoding strategy may not encode information such as dataset minor versions and the name of the specific Python recipe object (within the recipe.py module) used to build a dataset

An API like the one imagined in #31 may be the eventual solution to this, but a conversation yesterday with @rabernat persuaded me that a lightweight JSON object (or objects) housed at the storage location could get us up and running more quickly. As described in pangeo-forge/pangeo-forge-orchestrator#1 (comment), I'm provisionally calling this "sidecar" object (Ryan's term) build-logs.json.
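
As a rough sketch of how such a sidecar might work, the snippet below appends one record per build to a build-logs.json file and reads the latest record back. Only the file name comes from the discussion above. The record fields (`feedstock`, `recipe_object`, `version`), the helper names, and the use of a local directory in place of the real storage location are assumptions made for illustration.

```python
import json
from pathlib import Path

SIDECAR_NAME = "build-logs.json"


def write_build_log(storage_dir: Path, feedstock: str, recipe_object: str, version: str) -> Path:
    """Append one build record to the sidecar file housed next to the dataset.

    The field names are hypothetical; they cover the details that a storage
    path alone may not encode (minor version, recipe object name).
    """
    sidecar = storage_dir / SIDECAR_NAME
    records = json.loads(sidecar.read_text()) if sidecar.exists() else []
    records.append(
        {"feedstock": feedstock, "recipe_object": recipe_object, "version": version}
    )
    sidecar.write_text(json.dumps(records, indent=2))
    return sidecar


def latest_build(storage_dir: Path) -> dict:
    """Return the most recent build record, tying the dataset back to its feedstock."""
    return json.loads((storage_dir / SIDECAR_NAME).read_text())[-1]
```

Because the record travels with the dataset, an orchestrator could recover the feedstock, recipe object, and version without depending on whatever path-encoding strategy happens to be in use.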

This will require its own ADR if we move forward with it, but for now I'm going to just continue experimenting with the idea to suss out if/how it might work for us.

@abarciauskas-bgse
Contributor

closing as stale
