Merge pull request #2 from EOEPCA/develop
Merge develop into main
fabricebrito authored Jan 15, 2025
2 parents e599fde + 9359394 commit e22acac
Showing 19 changed files with 2,020 additions and 743 deletions.
47 changes: 47 additions & 0 deletions .github/workflows/package.yaml
@@ -0,0 +1,47 @@
name: Publish to Test PyPI

on:
  push:
    branches: ["develop", "main"]
  release:
    types: [published]

permissions:
  contents: read

jobs:
  deploy:

    runs-on: ubuntu-latest

    environment: release
    permissions:
      id-token: write # IMPORTANT: this permission is mandatory for trusted publishing

    steps:
    - uses: actions/checkout@v4
    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: '3.10'
        cache: 'pip'
    - name: Install dependencies
      run: |
        # python -m pip install --upgrade pip
        pip install hatch
    - name: Build package
      run: hatch build
    # - name: Test package
    #   run: hatch -e test run nose2 --verbose
    - name: Publish package distributions to Test PyPI
      if: github.ref != 'refs/heads/main'
      uses: pypa/gh-action-pypi-publish@release/v1
      with:
        skip-existing: true
        repository-url: https://test.pypi.org/legacy/
    - name: Publish package distributions to PyPI
      if: github.ref == 'refs/heads/main'
      uses: pypa/gh-action-pypi-publish@release/v1
      with:
        skip-existing: true
        repository-url: https://upload.pypi.org/legacy/
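As written, the two `if:` conditions select the upload target from `github.ref` alone. The sketch below mirrors that routing in Python to make it easy to check; `publish_target` is a hypothetical name, not part of the workflow. Note that for a `release` event GitHub sets `github.ref` to the tag ref (`refs/tags/<tag>`), so with these conditions a published release is uploaded to Test PyPI rather than PyPI.

```python
# Hypothetical helper mirroring the two `if:` conditions of the workflow above.
TEST_PYPI = "https://test.pypi.org/legacy/"
PYPI = "https://upload.pypi.org/legacy/"


def publish_target(github_ref: str) -> str:
    """Return the upload URL the workflow would use for a given github.ref."""
    # `if: github.ref == 'refs/heads/main'` selects PyPI...
    if github_ref == "refs/heads/main":
        return PYPI
    # ...and `if: github.ref != 'refs/heads/main'` selects Test PyPI for every other ref,
    # including tag refs such as those of `release` events.
    return TEST_PYPI


if __name__ == "__main__":
    for ref in ("refs/heads/main", "refs/heads/develop", "refs/tags/v0.1.0"):
        print(f"{ref:22} -> {publish_target(ref)}")
```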
135 changes: 129 additions & 6 deletions README.md
@@ -4,9 +4,132 @@ Zoo runner using Argo Workflows

## Environment variables

- `STORAGE_CLASS`: Kubernetes storage class supporting ReadWriteMany (RWX) volumes, defaults to `standard`.
- `DEFAULT_VOLUME_SIZE`: default size of the Calrissian RWX volume, defaults to `12Gi`.
- `DEFAULT_MAX_CORES`: default maximum number of cores for Calrissian, defaults to `4`.
- `DEFAULT_MAX_RAM`: default maximum RAM for Calrissian, defaults to `4Gi`.
- `ARGO_WF_ENDPOINT`: Argo Workflows API endpoint, defaults to `http://localhost:2746`.
- `ARGO_WF_TOKEN`: Argo Workflows API token. For a deployment in namespace `ns1`, it can be retrieved with: `kubectl get -n ns1 secret argo.service-account-token -o=jsonpath='{.data.token}' | base64 --decode`
- `ARGO_WF_SYNCHRONIZATION_CM`: Argo Workflows synchronization ConfigMap (read with key `workflow`). For tests, we use `semaphore-argo-cwl-runner`.
- `ARGO_CWL_RUNNER_TEMPLATE`: name of the Argo Workflows WorkflowTemplate that runs the CWL, defaults to `argo-cwl-runner`.
- `ARGO_CWL_RUNNER_ENTRYPOINT`: entrypoint of that WorkflowTemplate, defaults to `calrissian-runner`.
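The variables above can be gathered into a single settings object. The following is a minimal sketch, assuming a hypothetical `RunnerSettings` class (the runner's actual configuration code is not shown here); defaults are the ones documented above, and the token and synchronization ConfigMap have no documented default, so they resolve to `None`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RunnerSettings:
    """Hypothetical container for the documented environment variables."""
    storage_class: str
    volume_size: str
    max_cores: str
    max_ram: str
    argo_wf_endpoint: str
    argo_wf_token: Optional[str]
    synchronization_cm: Optional[str]
    cwl_runner_template: str
    cwl_runner_entrypoint: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "RunnerSettings":
        # Defaults are the ones listed in this README; values are kept as strings.
        return cls(
            storage_class=env.get("STORAGE_CLASS", "standard"),
            volume_size=env.get("DEFAULT_VOLUME_SIZE", "12Gi"),
            max_cores=env.get("DEFAULT_MAX_CORES", "4"),
            max_ram=env.get("DEFAULT_MAX_RAM", "4Gi"),
            argo_wf_endpoint=env.get("ARGO_WF_ENDPOINT", "http://localhost:2746"),
            # No default is documented for the token or the synchronization ConfigMap.
            argo_wf_token=env.get("ARGO_WF_TOKEN"),
            synchronization_cm=env.get("ARGO_WF_SYNCHRONIZATION_CM"),
            cwl_runner_template=env.get("ARGO_CWL_RUNNER_TEMPLATE", "argo-cwl-runner"),
            cwl_runner_entrypoint=env.get("ARGO_CWL_RUNNER_ENTRYPOINT", "calrissian-runner"),
        )


if __name__ == "__main__":
    # "my-rwx-class" is a placeholder storage class name.
    print(RunnerSettings.from_env({"STORAGE_CLASS": "my-rwx-class"}))
```

Passing an explicit mapping, as in the usage line, keeps the lookup testable without touching the real process environment.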

## Requirements

The Argo Workflows deployment must provide an Argo Workflows `WorkflowTemplate` or `ClusterWorkflowTemplate` that implements the execution of a Calrissian job and exposes the following interface:

**Input parameters:**

```yaml
templates:
- name: calrissian-runner
inputs:
parameters:
- name: parameters
description: Parameters in JSON format
- name: cwl
description: CWL document in JSON format
- name: max_ram
default: 8G
description: Max RAM (e.g. 8G)
- name: max_cores
default: '4'
description: Max cores (e.g. 4)
- name: entry_point
description: CWL document entry_point
```
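To call a template with this interface, a Workflow can reference it from a step via `templateRef` and pass the five input parameters as arguments. The sketch below builds such a Workflow manifest as a plain dictionary; `build_workflow` and the `main`/`run-cwl`/`zoo-cwl-` names are hypothetical, and it assumes the JSON inputs are passed as serialized strings. The optional semaphore uses Argo's single-semaphore `synchronization` form with the ConfigMap key `workflow`. Any extra volumes the template declares would also have to be added to this Workflow (see Caveats).

```python
import json
from typing import Any, Dict, Optional


def build_workflow(
    cwl: Dict[str, Any],
    parameters: Dict[str, Any],
    entry_point: str,
    max_ram: str = "8G",
    max_cores: str = "4",
    template: str = "argo-cwl-runner",
    template_entrypoint: str = "calrissian-runner",
    cluster_scope: bool = False,
    semaphore_cm: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical builder for a Workflow calling the runner template in one step."""
    arguments = {
        "parameters": [
            # Both JSON inputs are passed as serialized strings.
            {"name": "parameters", "value": json.dumps(parameters)},
            {"name": "cwl", "value": json.dumps(cwl)},
            {"name": "max_ram", "value": max_ram},
            {"name": "max_cores", "value": max_cores},
            {"name": "entry_point", "value": entry_point},
        ]
    }
    spec: Dict[str, Any] = {
        "entrypoint": "main",
        "templates": [
            {
                "name": "main",
                # `steps` is a list of step groups; this one group holds a single step.
                "steps": [[
                    {
                        "name": "run-cwl",
                        "templateRef": {
                            "name": template,
                            "template": template_entrypoint,
                            # True when the template is a ClusterWorkflowTemplate.
                            "clusterScope": cluster_scope,
                        },
                        "arguments": arguments,
                    }
                ]],
            }
        ],
    }
    if semaphore_cm:
        # Semaphore limit read from the ConfigMap key "workflow".
        spec["synchronization"] = {
            "semaphore": {"configMapKeyRef": {"name": semaphore_cm, "key": "workflow"}}
        }
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "zoo-cwl-"},
        "spec": spec,
    }
```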
**Outputs:**
```yaml
outputs:
parameters:
- name: results
valueFrom:
parameter: '{{steps.get-results.outputs.parameters.calrissian-output}}'
- name: log
valueFrom:
parameter: '{{steps.get-results.outputs.parameters.calrissian-stderr}}'
- name: usage-report
valueFrom:
parameter: '{{steps.get-results.outputs.parameters.calrissian-report}}'
- name: stac-catalog
valueFrom:
parameter: '{{steps.stage-out.outputs.parameters.stac-catalog}}'
- name: feature-collection
valueFrom:
parameter: >-
{{steps.feature-collection.outputs.parameters.feature-collection}}
artifacts:
- name: tool-logs
from: '{{steps.get-results.outputs.artifacts.tool-logs}}'
- name: calrissian-output
from: '{{steps.get-results.outputs.artifacts.calrissian-output}}'
- name: calrissian-stderr
from: '{{steps.get-results.outputs.artifacts.calrissian-stderr}}'
- name: calrissian-report
from: '{{steps.get-results.outputs.artifacts.calrissian-report}}'
```
Where:
- `results` is the Calrissian job stdout
- `log` is the Calrissian job stderr
- `usage-report` is the Calrissian usage report
- `stac-catalog` is the S3 path to the published STAC Catalog
- `feature-collection` is the Feature Collection containing the produced STAC Items
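Once the run completes, these output parameters can be read from the Workflow status. The sketch below assumes a node entry shaped like Argo's node status, with `outputs.parameters` as a list of `name`/`value` objects, and assumes `feature-collection` holds a JSON string; which node carries these outputs depends on how the calling Workflow is built. Both function names are hypothetical.

```python
import json
from typing import Any, Dict


def output_parameters(node: Dict[str, Any]) -> Dict[str, str]:
    """Map output parameter names to values for one Argo node status entry."""
    params = node.get("outputs", {}).get("parameters", [])
    # Parameters without a value are mapped to an empty string.
    return {p["name"]: p.get("value", "") for p in params}


def feature_collection(node: Dict[str, Any]) -> Dict[str, Any]:
    """Decode the `feature-collection` output, assumed to be a JSON string."""
    return json.loads(output_parameters(node)["feature-collection"])
```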

And the artifacts:

- `tool-logs` contains the logs of the CWL steps run by Calrissian, declared as:

```yaml
artifacts:
- name: tool-logs
path: /calrissian/logs
s3:
key: '{{workflow.name}}-{{workflow.uid}}-artifacts/tool-logs'
archive:
none: {}
```

- `calrissian-output` is the Calrissian job stdout
- `calrissian-stderr` is the Calrissian job stderr
- `calrissian-report` is the Calrissian usage report
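To locate the `tool-logs` artifact in the artifact repository, the S3 key template has to be resolved with the Workflow name and UID. The sketch below is a local approximation of that substitution, not Argo's templating engine; `render` and `TOOL_LOGS_KEY` are hypothetical names, and the resulting key is relative to the bucket (and any key prefix) configured for the artifact repository.

```python
import re
from typing import Mapping

# Matches placeholders such as {{workflow.name}}.
_PLACEHOLDER = re.compile(r"\{\{\s*([\w.-]+)\s*\}\}")


def render(template: str, variables: Mapping[str, str]) -> str:
    """Substitute {{name}} placeholders; unknown placeholders are left untouched."""
    return _PLACEHOLDER.sub(lambda m: variables.get(m.group(1), m.group(0)), template)


TOOL_LOGS_KEY = "{{workflow.name}}-{{workflow.uid}}-artifacts/tool-logs"

if __name__ == "__main__":
    # Placeholder name and UID for illustration.
    print(render(TOOL_LOGS_KEY, {"workflow.name": "zoo-cwl-abc12",
                                 "workflow.uid": "1234-5678"}))
```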

See the example provided in the `example` folder.

## Caveats

### Additional volumes in the Argo Workflows WorkflowTemplate that runs the CWL

Suppose you want to mount a ConfigMap in the Argo Workflows WorkflowTemplate that runs the CWL.

By design, this volume must also be declared in any Argo Workflows Workflow that runs the WorkflowTemplate in a step.

This means that if the Argo Workflows WorkflowTemplate that runs the CWL declares:

```yaml
volumes:
- name: cwl-wrapper-config-vol
configMap:
name: cwl-wrapper-config
items:
- key: main.yaml
- key: rules.yaml
- key: stage-in.cwl
- key: stage-out.cwl
```

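then the same volume must also be declared in the Workflow that runs it, for example with:

```python
# `config_map_volume` builds the ConfigMap volume declaration for the Workflow
# (helper not shown in this excerpt); keys must match those of `cwl-wrapper-config`
config_map_volume(
    name="cwl-wrapper-config-vol",
    configMapName="cwl-wrapper-config",
    items=[{"key": "main.yaml"}, {"key": "rules.yaml"}, {"key": "stage-in.cwl"}, {"key": "stage-out.cwl"}],
    defaultMode=420,  # decimal 420 == octal 0644 (rw-r--r--)
    optional=False,
)
```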
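A mismatch between the two declarations is easy to miss, so a quick check can compare volume names before submitting. The sketch below uses hypothetical helpers: `configmap_volume` emits a Kubernetes ConfigMap volume as a plain dictionary (assuming each item's `path` equals its `key`, since Kubernetes requires a `path` for every listed item), and `missing_volumes` reports template volumes that the Workflow does not declare.

```python
from typing import Any, Dict, Iterable, List, Set


def configmap_volume(name: str, config_map: str, keys: Iterable[str],
                     default_mode: int = 0o644, optional: bool = False) -> Dict[str, Any]:
    """Hypothetical builder for a Kubernetes ConfigMap volume (path = key)."""
    return {
        "name": name,
        "configMap": {
            "name": config_map,
            "items": [{"key": k, "path": k} for k in keys],
            "defaultMode": default_mode,  # 0o644 == 420
            "optional": optional,
        },
    }


def missing_volumes(template_volumes: List[Dict[str, Any]],
                    workflow_volumes: List[Dict[str, Any]]) -> Set[str]:
    """Names of volumes declared by the WorkflowTemplate but not by the Workflow."""
    declared = {v["name"] for v in workflow_volumes}
    return {v["name"] for v in template_volumes} - declared
```

Comparing by name only is deliberate: Argo matches the volume referenced by a template's `volumeMounts` by name, so a missing name is the failure this caveat describes.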
