This repository has been archived by the owner on Dec 7, 2023. It is now read-only.

Minimal CLI #1

Closed
wants to merge 229 commits into from
Changes from 205 commits
Commits
10134a0
First.
sharkinsspatial Apr 19, 2021
e9a2e79
Include pre-commit hooks.
sharkinsspatial Apr 19, 2021
7bf8fe6
Include tests.
sharkinsspatial Apr 20, 2021
4ec7901
Include tox configuration for tests.
sharkinsspatial Apr 20, 2021
e7324dd
Add README.
sharkinsspatial Apr 20, 2021
ea91ff2
README updates.
sharkinsspatial Apr 20, 2021
42c12f9
Make target storage_options optional.
sharkinsspatial Apr 20, 2021
6e5f4d8
Fix meta dataclass to align with staged-recipes.
sharkinsspatial Apr 21, 2021
0d631fd
Explicit credentials are required for storage targets.
sharkinsspatial Apr 22, 2021
f0988c0
Assume private target is always used.
sharkinsspatial Apr 22, 2021
f7be5f1
Provide list of all target secrets to flow_manager.
sharkinsspatial Apr 22, 2021
20728f7
Add credentials for flow storage at the bakery level.
sharkinsspatial Apr 23, 2021
b940764
Run tests on push.
sharkinsspatial Apr 23, 2021
12259bb
Raise errors for currently unsupported configurations.
sharkinsspatial Apr 24, 2021
a18a18f
Remove hardcoded Prefect project name.
sharkinsspatial Apr 24, 2021
29b3158
Remove legacy bucket secret retrieval approach.
sharkinsspatial Apr 24, 2021
1313767
Add docstring for register_flow.
sharkinsspatial Apr 24, 2021
ce22844
Fix typo in os.environ.
sharkinsspatial Apr 24, 2021
a1c4862
Remove README code block typo.
sharkinsspatial Apr 24, 2021
d5fc27c
Updates to handle latest refactor of pangeo_forge.
sharkinsspatial Apr 28, 2021
d1bb451
Include pangeo_forge logs in Prefect Cloud log re-direction.
sharkinsspatial Apr 28, 2021
d7ac168
Remove comments from test recipe.
sharkinsspatial Apr 28, 2021
ec43b2b
Include hardcoded number of workers for testing.
sharkinsspatial Apr 28, 2021
116145f
Merge pull request #1 from pangeo-forge/update_dependencies
sharkinsspatial Apr 28, 2021
41a3866
Implement adaptive scaling for managing Dask cluster workers.
sharkinsspatial Apr 28, 2021
74197ff
Merge pull request #2 from pangeo-forge/adaptive_scaling
sharkinsspatial Apr 28, 2021
7380e73
Remove stray worker count setting.
sharkinsspatial Apr 28, 2021
8d78c6b
Merge pull request #3 from pangeo-forge/fix/worker_num
sharkinsspatial Apr 28, 2021
9da64e4
Update dependencies to use release of pangeo-forge-recipes.
sharkinsspatial May 12, 2021
0e050a2
Merge pull request #4 from pangeo-forge/dependency_pinning
sharkinsspatial May 12, 2021
8177be2
Include version checking for registration and bakery worker environment.
sharkinsspatial May 13, 2021
77abbcb
Merge pull request #5 from pangeo-forge/version_checking
sharkinsspatial May 13, 2021
ff17cfe
Fix typo in versions argument.
sharkinsspatial May 13, 2021
c1e6368
Merge pull request #6 from pangeo-forge/version_checking
sharkinsspatial May 13, 2021
93d41e6
Update meta.yaml to support recipe import changes from ADR 0002.
sharkinsspatial May 15, 2021
7bef2d2
Remove debug comment.
sharkinsspatial May 15, 2021
63ebce5
Merge pull request #7 from pangeo-forge/update_module_find
sharkinsspatial May 15, 2021
6f5edb0
Correctly configure targets for recipes in from_dict.
sharkinsspatial May 21, 2021
2bd2f9b
Merge pull request #8 from pangeo-forge/fix/flow_naming
sharkinsspatial May 21, 2021
06906f6
Use target storage scheme specified in ADR 0003.
sharkinsspatial May 26, 2021
43ec795
Merge pull request #11 from pangeo-forge/storage_key_config
sharkinsspatial May 27, 2021
27a92fa
Begin to refactor bakery.py and flow_manager.py to accommodate azure.aks
ciaransweet Jun 25, 2021
05a1355
Refactor configure_run_config for aks
ciaransweet Jun 25, 2021
2ef7ba6
Refactor configure_targets for azure
ciaransweet Jun 25, 2021
2994442
Refactor configure_dask_executor for aks
ciaransweet Jun 25, 2021
2ea1f4a
Add secrets to dask_executor and add PR template
ciaransweet Jun 25, 2021
32d3d5b
Set K8s worker units to be in Mi for memory and m for cpu
ciaransweet Jun 29, 2021
9694fc2
Add default values for k8s dask executor
ciaransweet Jun 29, 2021
c15d662
Merge pull request #12 from pangeo-forge/add-azure-bakery
sharkinsspatial Jul 8, 2021
39e7ce8
Increase minimum resource limits when using Fargate cluster type.
sharkinsspatial Jul 8, 2021
098acee
Merge pull request #13 from pangeo-forge/resource_updates
sharkinsspatial Jul 8, 2021
b5e688d
Move test secrets structure to a fixture.
sharkinsspatial Jul 9, 2021
dd09c39
Update tests to verify pangeo-forge-recipes refactoring api change.
sharkinsspatial Jul 16, 2021
4dc3a82
Update bakery meta and tests to align with bakeries-database.
sharkinsspatial Jul 18, 2021
6d4efeb
Remove commented file patterns.
sharkinsspatial Jul 18, 2021
83fb6be
Merge pull request #15 from pangeo-forge/recipes_upgrade
sharkinsspatial Jul 18, 2021
ec75b06
Include support for recipe dimension pruning before registration.
sharkinsspatial Jul 19, 2021
ac290bb
Merge pull request #16 from pangeo-forge/recipes_upgrade
sharkinsspatial Jul 19, 2021
585c971
Use correct environment source of GITHUB_REPOSITORY value.
sharkinsspatial Jul 20, 2021
e2b575d
Merge pull request #17 from pangeo-forge/target_path
sharkinsspatial Jul 20, 2021
020e612
Create prefect notification hook when registering test flows.
sharkinsspatial Jul 22, 2021
92dded1
Only create Prefect hook when it does not already exist.
sharkinsspatial Jul 23, 2021
3965b9a
Merge pull request #18 from pangeo-forge/prefect_hooks
sharkinsspatial Jul 23, 2021
c0c3e01
Use flow_run_name rather than id for comment link.
sharkinsspatial Jul 23, 2021
867d692
Merge pull request #19 from pangeo-forge/prefect_hooks
sharkinsspatial Jul 23, 2021
3f78cb8
Use new MetadataTarget class from pangeo-forge/pangeo-forge-recipes/p…
sharkinsspatial Aug 12, 2021
56324e1
Merge pull request #20 from pangeo-forge/metadata_target
sharkinsspatial Aug 12, 2021
7098191
Remove and re-add automation hook to support feedstock notifications.
sharkinsspatial Aug 16, 2021
615151b
Merge pull request #22 from pangeo-forge/replace_automations
sharkinsspatial Aug 16, 2021
bcda6c5
Use MALLOC_TRIM_THRESHOLD_ setting to reduce unmanaged memory usage.
sharkinsspatial Aug 16, 2021
3640058
Merge pull request #23 from pangeo-forge/unmanaged_memory
sharkinsspatial Aug 16, 2021
8842e4f
Add task timeout for fsspec hanging issue.
sharkinsspatial Sep 14, 2021
9ad68ee
Merge pull request #26 from pangeo-forge/task_timeout
sharkinsspatial Sep 14, 2021
671bf9f
Revert "Add task timeout for fsspec hanging issue."
sharkinsspatial Sep 14, 2021
39866c8
Merge pull request #27 from pangeo-forge/revert-26-task_timeout
sharkinsspatial Sep 14, 2021
c0b3bb5
Expand size of dask scheduler for AKS clusters.
sharkinsspatial Sep 17, 2021
3b39b32
Merge pull request #28 from pangeo-forge/scheduler_pod_spec
sharkinsspatial Sep 17, 2021
0704595
Revert "Expand size of dask scheduler for AKS clusters."
sharkinsspatial Sep 17, 2021
1e18298
Merge pull request #29 from pangeo-forge/revert-28-scheduler_pod_spec
sharkinsspatial Sep 17, 2021
fbd6a83
Include clean_pod_template for dask_kubernetes.
sharkinsspatial Sep 17, 2021
7eb3d54
Remove errant flow timeout.
sharkinsspatial Sep 17, 2021
45764a3
Merge branch 'master' into scheduler_pod_spec
sharkinsspatial Sep 17, 2021
0d8d086
Merge pull request #30 from pangeo-forge/scheduler_pod_spec
sharkinsspatial Sep 17, 2021
a62ad94
Increase job pod memory request.
sharkinsspatial Sep 17, 2021
87249a0
Merge pull request #31 from pangeo-forge/increase_job_pod_memory
sharkinsspatial Sep 17, 2021
4462bdf
Set prefect task retries.
sharkinsspatial Sep 17, 2021
a82d172
Merge pull request #32 from pangeo-forge/task_retries
sharkinsspatial Sep 17, 2021
47955d6
Include retry_delay.
sharkinsspatial Sep 17, 2021
a637779
Merge pull request #33 from pangeo-forge/task_retries
sharkinsspatial Sep 17, 2021
d6ce749
poetry init
cisaacstern Oct 4, 2021
35f40cd
ls command first draft
cisaacstern Oct 5, 2021
7775a1c
move catalog module to top level
cisaacstern Oct 5, 2021
c7e439e
add path to built-logs table
cisaacstern Oct 5, 2021
56ac98a
refactor bakery.py to use common utils.py
cisaacstern Oct 5, 2021
8afa0e3
add FeedstockMetadata dataclass & BakeryMetadata.filter_logs
cisaacstern Oct 5, 2021
0979c79
refactor cli/bakery.py
cisaacstern Oct 5, 2021
a686d5f
add BakeryMetadata.get_mapper method
cisaacstern Oct 5, 2021
742e599
catalog/generate.py opens ds from run_id passed to cli/catalog.py
cisaacstern Oct 5, 2021
0089538
frankenstein STAC Item created based on xstac example template
cisaacstern Oct 6, 2021
ec668cb
bbox from xarray dataset
cisaacstern Oct 6, 2021
caadae0
restructure top level commands
cisaacstern Oct 6, 2021
2f5f011
move item_template to json, more updates to generate.py
cisaacstern Oct 7, 2021
cc2ea2a
factor bbox and timebound creation out of generate func
cisaacstern Oct 7, 2021
78b6fbe
restructure/refactor
cisaacstern Oct 7, 2021
ca12c2d
linting
cisaacstern Oct 7, 2021
eab44a2
refactor path logic
cisaacstern Oct 7, 2021
1b26af9
make all deps installable from poetry
cisaacstern Oct 7, 2021
1b3027e
correct https path in metadata.py
cisaacstern Oct 8, 2021
337fd6a
add pathlib absolute paths to catalog.py
cisaacstern Oct 8, 2021
4d9051a
minimal locally working version of papermill
cisaacstern Oct 8, 2021
61b9a2a
add ExecuteNotebook dataclass
cisaacstern Oct 8, 2021
fe4a243
catalog updates
cisaacstern Oct 8, 2021
c8b401e
template updates
cisaacstern Oct 8, 2021
270391c
handle item output in separate function
cisaacstern Oct 8, 2021
b29d64f
add credentialed_fs to BakeryMetadata
cisaacstern Oct 8, 2021
fde1904
programmatically write STAC Items to S3
cisaacstern Oct 8, 2021
a109d21
big update for nbviewer links in STAC Item
cisaacstern Oct 8, 2021
872c01d
wow, reproducible nbviewer links reading stac from s3; binder links next
cisaacstern Oct 9, 2021
5b6b008
add tests dir + pytest to deps
cisaacstern Oct 12, 2021
9acd5cf
add bakery cli tests
cisaacstern Oct 12, 2021
308dc5a
refactor test funcs into shared module
cisaacstern Oct 12, 2021
c438c40
add recipe cli test
cisaacstern Oct 12, 2021
d7a4950
add feedstock test (placeholder)
cisaacstern Oct 12, 2021
1b121ad
add test bakery http server
cisaacstern Oct 13, 2021
8a05e81
refactor BakeryMetadata to remove hardcoding in favor of extra yaml arg
cisaacstern Oct 13, 2021
bdc385a
add and test mock bakery yaml
cisaacstern Oct 13, 2021
c8a202d
add architecture.md placeholder
cisaacstern Oct 13, 2021
f518588
add first draft catalog test
cisaacstern Oct 13, 2021
d0a91eb
add test bakery with --extra-bakery-yaml
cisaacstern Oct 13, 2021
bdda1e8
test build logs
cisaacstern Oct 13, 2021
3689fe4
test mock-feedstock --view build-logs
cisaacstern Oct 13, 2021
3fa1fd6
fix local import in feedstock test
cisaacstern Oct 13, 2021
9d28da9
correct import in recipe test module
cisaacstern Oct 13, 2021
073e081
first draft metadata test
cisaacstern Oct 13, 2021
c4fd499
beginning of pydantic refactor
cisaacstern Oct 14, 2021
b48c53e
prep meta types only for orchestrator merge
cisaacstern Oct 14, 2021
de36c30
Merge remote-tracking branch 'prefect-merge/meta-types-only' into min…
cisaacstern Oct 14, 2021
201c70e
rename metadata.py -> components.py
cisaacstern Oct 14, 2021
a8c6eef
big rewrite of Bakery component (formerly BakeryMetadata class) to wo…
cisaacstern Oct 15, 2021
0f28ca5
first test_meta_types commit
cisaacstern Oct 15, 2021
caffafe
BakeryMeta as pydantic.dataclass and associated tests
cisaacstern Oct 15, 2021
85894ec
getting closer on meta_types tests; just need to tweak StorageOptions
cisaacstern Oct 15, 2021
a955924
test key invalidation of StorageOptions
cisaacstern Oct 16, 2021
e1590a7
test invalid StorageOptions values
cisaacstern Oct 18, 2021
8aeecb0
StorageOptions as BaseModel, w/ extra=forbid and constr types
cisaacstern Oct 26, 2021
c70afca
restructure test dir + test StorageOptions
cisaacstern Oct 26, 2021
c92f50f
add placeholder dirs
cisaacstern Oct 26, 2021
d8d778d
remove unused imports
cisaacstern Oct 26, 2021
5d9aeab
add BakeryName model, move BakeryDatabase into meta_types.bakery
cisaacstern Oct 26, 2021
295a0ae
validate BakeryName in Bakery component
cisaacstern Oct 26, 2021
939a44d
add RunRecord and BuildLogs meta types
cisaacstern Oct 27, 2021
87a546c
make Bakery.build_logs attr a BuildLogs instance
cisaacstern Oct 27, 2021
cca7933
more bakery component testing
cisaacstern Oct 27, 2021
8381d08
implement and test POSTing to local http server
cisaacstern Oct 28, 2021
7fa25f4
credentialed POSTs to test server
cisaacstern Oct 28, 2021
a31e5f0
add test_catalog.py
cisaacstern Oct 28, 2021
4aee117
make write_test_file DRY
cisaacstern Oct 28, 2021
11004b6
test catalog generate (basic)
cisaacstern Oct 28, 2021
86bd82c
test catalog (to_file)
cisaacstern Oct 28, 2021
48e6e4b
oops, remove test stac item
cisaacstern Oct 28, 2021
6f70e79
clear template notebook outputs
cisaacstern Oct 28, 2021
bb125f3
bakery component edits including default http headers
cisaacstern Oct 29, 2021
df235d3
haha! should probably write files on POST
cisaacstern Oct 29, 2021
4168b16
test stac item + notebook execution to_file
cisaacstern Oct 29, 2021
731d9ab
conditional os.remove calls in test_catalog
cisaacstern Oct 29, 2021
fd22f4a
stricter type checking for FeedstockMetadata object
cisaacstern Oct 29, 2021
d45ebcf
update + test bakery cli
cisaacstern Oct 29, 2021
cda5823
add bakery database validator
cisaacstern Oct 29, 2021
8eceefe
add bakery database validator + tests
cisaacstern Oct 30, 2021
c48d6b5
add TODO comment for bakery cli
cisaacstern Oct 30, 2021
d402546
update feedstock cli placeholder
cisaacstern Oct 30, 2021
39fc27d
update recipe cli placeholder test
cisaacstern Oct 30, 2021
4f96722
wow. all. tests. pass.
cisaacstern Oct 30, 2021
e5ccf53
lint
cisaacstern Oct 30, 2021
7ff972e
add pre-commit
cisaacstern Oct 31, 2021
2700d47
workflows first draft
cisaacstern Oct 31, 2021
6137fc8
Merge branch 'add-workflows' into minimal-cli
cisaacstern Oct 31, 2021
1fe483d
conda install poetry
cisaacstern Oct 31, 2021
bc1b181
install ipykernel for papermill notebook execution
cisaacstern Oct 31, 2021
1292bb4
poetry add jupyterlab
cisaacstern Oct 31, 2021
05dcf4f
fix path issue in test_catalog
cisaacstern Oct 31, 2021
46b2008
pre-commit run --all-files
cisaacstern Oct 31, 2021
0284af5
add codecov ci
cisaacstern Oct 31, 2021
e4d953b
poetry add codecov
cisaacstern Oct 31, 2021
6209892
poetry add pytest-cov
cisaacstern Oct 31, 2021
d7de6e1
add docs
cisaacstern Nov 2, 2021
0638a26
arbitrary change to see if Read the Docs works
cisaacstern Nov 2, 2021
b71c8b1
readthedocs can't find pangeo-forge-recipes 0.6.0 via pip; try from g…
cisaacstern Nov 2, 2021
7ffb2bc
oops wrong branch for pangeo-forge-recipes install
cisaacstern Nov 2, 2021
c25e5eb
readthedocs: spec python version in build.tools
cisaacstern Nov 2, 2021
f476c24
maybe no build.os?
cisaacstern Nov 2, 2021
cb14444
maybe build comes first
cisaacstern Nov 2, 2021
6269cf7
try legacy .readthedocs.yml?
cisaacstern Nov 2, 2021
d638a0a
go back to pangeo-forge-recipes readthedocs config
cisaacstern Nov 2, 2021
228484b
maybe just downgrade python req for now
cisaacstern Nov 2, 2021
2646d1b
docs tweaks
cisaacstern Nov 2, 2021
7040f85
oooh readthedocs config wasn't in root
cisaacstern Nov 2, 2021
cf0576c
switch packaging to match pangeo-forge-recipes
cisaacstern Nov 5, 2021
43c86f2
delete notebook module + tests
cisaacstern Nov 5, 2021
5e59706
update workflows for new packaging style
cisaacstern Nov 5, 2021
b86182c
Update pangeo_forge_orchestrator/cli/bakery.py
cisaacstern Nov 5, 2021
701a77f
Update pangeo_forge_orchestrator/cli/bakery.py
cisaacstern Nov 5, 2021
07d6393
remove notebook module dependencies
cisaacstern Nov 5, 2021
5212885
move helper funcs out of Bakery class
cisaacstern Nov 5, 2021
e9c8100
Merge remote-tracking branch 'origin/minimal-cli' into minimal-cli
cisaacstern Nov 5, 2021
242769a
add mypy pydantic plugin
cisaacstern Nov 5, 2021
04a8eb6
add _version.py to gitignore
cisaacstern Nov 5, 2021
0e149d3
add pydantic dep for mypy hook
cisaacstern Nov 5, 2021
f110c7e
refactor BakeryDatabase type + tests
cisaacstern Nov 8, 2021
0585fc4
refactor components
cisaacstern Nov 8, 2021
b6d238f
update bakery database validation
cisaacstern Nov 8, 2021
2978d52
update Bakery kwargs in catalog module
cisaacstern Nov 8, 2021
c437a92
comment out pytest timout config
cisaacstern Nov 8, 2021
2402893
update bakery cli + tests
cisaacstern Nov 8, 2021
c645918
remove extraneous prints in tests
cisaacstern Nov 8, 2021
68f534b
freeze all dataclasses
cisaacstern Nov 8, 2021
bc82f04
run_id as int, store build_logs as csv
cisaacstern Nov 8, 2021
2529779
use MetaDotYaml type in components.FeedstockMetadata
cisaacstern Nov 9, 2021
67ef8d4
rename components -> interfaces
cisaacstern Nov 9, 2021
08f377d
working on bakery interface methods
cisaacstern Nov 10, 2021
102de6d
working towards catalog api generalization
cisaacstern Nov 10, 2021
7cb9467
cf_xarray first pass
cisaacstern Nov 11, 2021
2f63644
fix bakery database naming in tests
cisaacstern Nov 11, 2021
69b4a6b
CLI: optionally load bakery database from envvar
cisaacstern Nov 11, 2021
5da8378
build_logs as property; addl methods work
cisaacstern Nov 11, 2021
3078845
support unmerged feedstock metadata
cisaacstern Nov 11, 2021
fbf4cc1
some stac edge case handling
cisaacstern Nov 13, 2021
5138d58
add cf_xarray req
cisaacstern Nov 13, 2021
0987701
fix catalog to match latest xstac PR
cisaacstern Nov 13, 2021
3 changes: 3 additions & 0 deletions .coveragerc
@@ -0,0 +1,3 @@
[run]
omit =
    */tests/*
72 changes: 72 additions & 0 deletions .github/workflows/main.yaml
@@ -0,0 +1,72 @@
# Compare: https://github.com/pangeo-forge/pangeo-forge-recipes/blob/master/.github/workflows/main.yaml

name: Tests

on:
  push:
    branches: "*"
    paths-ignore:
      - 'docs/**'
  pull_request:
    branches: master
    paths-ignore:
      - 'docs/**'

env:
  PYTEST_ADDOPTS: "--color=yes"

jobs:
  test:
    name: ${{ matrix.python-version }}-build
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: [3.8, 3.9]
    steps:
      - uses: actions/checkout@v2
      - name: Setup Python
        uses: actions/setup-python@v1
        with:
          python-version: ${{ matrix.python-version }}
          architecture: x64
      # - name: Cache conda
rm dead config -- is this CL still WIP? Should I come back later?

      # uses: actions/cache@v1
      # env:
      # Increase this value to reset cache if ci/py${{ matrix.python-version }}.yml has not changed
      # CACHE_NUMBER: 0
      # with:
      # path: ~/conda_pkgs_dir
      # key: ${{ runner.os }}-conda-${{ env.CACHE_NUMBER }}-${{ hashFiles('ci/py${{ matrix.python-version }}.yml') }}
      - name: setup miniconda
        uses: conda-incubator/setup-miniconda@v2
        with:
          activate-environment: pfo
          # environment-file: ci/py${{ matrix.python-version }}.yml
          python-version: ${{ matrix.python-version }}
          auto-activate-base: false
          # use-only-tar-bz2: true
      - name: install pangeo-forge-orchestrator plus deps
        shell: bash -l {0}
        run: |
          pip install -e '.[dev]'
      - name: print conda env
        shell: bash -l {0}
        run: |
          conda info
          conda list
      - name: Run Tests
        shell: bash -l {0}
        run: |
          pytest tests -v --cov=pangeo_forge_orchestrator \
            --cov-config .coveragerc \
            --cov-report term-missing \
            --cov-report xml \
            --durations=10 --durations-min=1.0
      - name: Codecov
        uses: codecov/[email protected]
        with:
          file: ./coverage.xml
          env_vars: OS,PYTHON
          name: codecov-umbrella
          fail_ci_if_error: false
14 changes: 14 additions & 0 deletions .github/workflows/pre-commit.yaml
@@ -0,0 +1,14 @@
name: pre-commit

on:
  pull_request:
  push:
    branches: [main]

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
      - uses: pre-commit/[email protected]
49 changes: 49 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,49 @@
# See: https://github.com/pangeo-forge/pangeo-forge-recipes/blob/master/.pre-commit-config.yaml

repos:

  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v3.1.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-docstring-first
      - id: check-json
      - id: check-yaml
      - id: pretty-format-json
        args: ["--autofix", "--indent=2", "--no-sort-keys"]

  - repo: https://github.com/ambv/black
    rev: 19.10b0
    hooks:
      - id: black
        args: ["--line-length", "100"]

  - repo: https://gitlab.com/pycqa/flake8
    rev: 3.8.3
    hooks:
      - id: flake8
        args:
          - "--max-line-length=100"

  - repo: https://github.com/asottile/seed-isort-config
    rev: v2.2.0
    hooks:
      - id: seed-isort-config

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: 'v0.910'
    hooks:
      - id: mypy
        exclude: tests

  - repo: https://github.com/pycqa/isort
    rev: 5.5.4
    hooks:
      - id: isort
        args: ["--profile", "black"]

  - repo: https://github.com/myint/rstcheck
    rev: 3f92957478422df87bd730abde66f089cc1ee19b
    hooks:
      - id: rstcheck
10 changes: 10 additions & 0 deletions .readthedocs.yml
@@ -0,0 +1,10 @@
version: 2

sphinx:
  configuration: docs/conf.py

# Optionally set the version of Python and requirements required to build your docs
python:
  version: 3.8
  install:
    - requirements: docs/requirements.txt
Empty file added ARCHITECTURE.md
Empty file.
20 changes: 20 additions & 0 deletions codecov.yml
@@ -0,0 +1,20 @@
codecov:
  require_ci_to_pass: no
  max_report_age: off

comment: false

coverage:
  precision: 2
  round: down
  status:
    project:
      default:
        target: 95
        informational: true
    patch: off
    changes: off

ignore:
  - "tests/*"
  - "**/__init__.py"
22 changes: 22 additions & 0 deletions docs/Makefile
@@ -0,0 +1,22 @@
# Compare: https://github.com/pangeo-forge/pangeo-forge-recipes/blob/master/docs/Makefile

# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
Binary file added docs/_static/pangeo-forge-logo-blue.png
47 changes: 47 additions & 0 deletions docs/conf.py
@@ -0,0 +1,47 @@
# Compare: https://github.com/pangeo-forge/pangeo-forge-recipes/blob/master/docs/conf.py

# -- Project information -----------------------------------------------------

project = "Pangeo Forge Orchestrator"
copyright = "2021, Pangeo Community"
author = "Pangeo Community"

# -- General configuration ---------------------------------------------------

extensions = [
    "myst_nb",
    "sphinx.ext.autodoc",
    "sphinx.ext.extlinks",
    # "numpydoc",
    "sphinx_autodoc_typehints",
    "sphinx_copybutton",
]

extlinks = {
    "issue": ("https://github.com/pangeo-forge/pangeo-forge-orchestrator/issues/%s", "GH issue "),
    "pull": ("https://github.com/pangeo-forge/pangeo-forge-orchestrator/pull/%s", "GH PR "),
}

exclude_patterns = ["_build", "**.ipynb_checkpoints"]
master_doc = "index"

# we always have to manually run the notebooks because they are slow / expensive
# jupyter_execute_notebooks = "auto"
# execution_excludepatterns = ["tutorials/xarray_zarr/*", "tutorials/hdf_reference/*"]

# -- Options for HTML output -------------------------------------------------

# https://sphinx-book-theme.readthedocs.io/en/latest/configure.html
html_theme = "pangeo_sphinx_book_theme"
html_theme_options = {
    "repository_url": "https://github.com/pangeo-forge/pangeo-forge-orchestrator",
    "repository_branch": "main",
    "path_to_docs": "docs",
    "use_repository_button": True,
    "use_issues_button": True,
    "use_edit_page_button": True,
}
html_logo = "_static/pangeo-forge-logo-blue.png"
html_static_path = ["_static"]

myst_heading_anchors = 2
19 changes: 19 additions & 0 deletions docs/index.md
@@ -0,0 +1,19 @@
# Pangeo Forge Orchestrator

```{warning}
The official Pangeo Forge docs can be found at [https://pangeo-forge.readthedocs.io/](https://pangeo-forge.readthedocs.io/en/latest/).

You have found the documentation for `pangeo-forge-orchestrator`, an unreleased package.
```

```{toctree}
:maxdepth: 1

motivation
quick_start
structural_view
new_pydantic_types
use_guide
testing_strategy
next_steps
```
35 changes: 35 additions & 0 deletions docs/make.bat
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
	set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=.
set BUILDDIR=_build

if "%1" == "" goto help

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
	echo.
	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
	echo.installed, then set the SPHINXBUILD environment variable to point
	echo.to the full path of the 'sphinx-build' executable. Alternatively you
	echo.may add the Sphinx directory to PATH.
	echo.
	echo.If you don't have Sphinx installed, grab it from
	echo.http://sphinx-doc.org/
	exit /b 1
)

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
13 changes: 13 additions & 0 deletions docs/motivation.md
@@ -0,0 +1,13 @@
# Motivation

The first practical application of this package (even before we refactor other automation repos into it) is to orchestrate the process of generating STAC Items for datasets which have already been built with Pangeo Forge. All of the metadata required for cataloging already exists _somewhere_ in Pangeo Forge:
Maybe add a link for the words STAC Items, to help the uninitiated reader understand what you mean.


- the datasets themselves are in a Bakery target somewhere
- other metadata is available in the Feedstock's `meta.yaml`

...so a generalizable cataloging approach needs to know:

- how to access a given Bakery target and open a dataset therein
- which datasets reside at which paths at the given Bakery target
- which Feedstocks (including Feedstock [version](https://github.com/pangeo-forge/roadmap/pull/34)) those paths were built from
- how to read/parse a `meta.yaml` from a given versioned Feedstock
Empty file added docs/new_pydantic_types
Empty file.
66 changes: 66 additions & 0 deletions docs/new_pydantic_types.md
@@ -0,0 +1,66 @@
# New pydantic types

In updating `meta_types/bakery.py` (as mentioned in {doc}`structural_view`), `pangeo-forge-orchestrator` aims to implement Pydantic-based input validation with the lightest touch possible for each input type. The types were initially implemented as Python dataclasses, so for many of them, the edit was simply to use `pydantic.dataclasses` as a drop-in replacement (perhaps in combination with stricter type hints). In addition, three new dataclasses, two new Models, and some regex-constrained type functions are defined; these are described below.

## `BakeryName`
Contributor

Why do we have BakeryName but not Bakery?


This dataclass validates a string against the [Bakery naming scheme defined in the Bakery database ADR](https://github.com/pangeo-forge/roadmap/blob/master/doc/adr/0004-use-yaml-file-for-bakery-database.md#bakery-name). As I understand it, the stipulation to "follow [java package syntax](https://docs.oracle.com/javase/tutorial/java/package/namingpkgs.html) to ensure unique bakery names" means that these names should all begin with a reversed organization URL. Therefore, I'm making the assumption (reflected in this dataclass's type hinting) that un-reversing this portion of the Bakery name should yield a valid `pydantic.HttpUrl`. (More on this in {doc}`use_guide`.) In addition, following the ADR, an acceptable Bakery name input string must conclude with `".bakery.{region}"`, where `region` conforms to a `"{provider}.{region}"` format.

```{eval-rst}
.. autoclass:: pangeo_forge_orchestrator.meta_types.bakery.BakeryName
    :members:
```
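
As a rough illustration of the naming rule described above, the check could be sketched with the standard library alone. This is not the implementation in this PR (which relies on `pydantic.HttpUrl`); the function name `parse_bakery_name`, the regular expression, and the example name `org.example.bakery.aws.us-west-2` are all assumptions made for the sketch.

```python
import re
from urllib.parse import urlparse

# Assumed shape: "{reversed.org.url}.bakery.{provider}.{region}"
_BAKERY_NAME = re.compile(
    r"^(?P<org>[a-z0-9-]+(?:\.[a-z0-9-]+)+)"  # reversed organization URL
    r"\.bakery\."                              # literal marker from the ADR
    r"(?P<provider>[a-z0-9-]+)\.(?P<region>[a-z0-9-]+)$"
)


def parse_bakery_name(name: str) -> dict:
    """Split a Bakery name into its parts, raising ValueError if malformed."""
    match = _BAKERY_NAME.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow the Bakery naming scheme")
    # Un-reverse the organization portion, e.g. "org.example" -> "example.org".
    host = ".".join(reversed(match["org"].split(".")))
    url = f"https://{host}"
    # Stand-in for the pydantic.HttpUrl check: the URL must parse to that host.
    if urlparse(url).netloc != host:
        raise ValueError(f"{url!r} is not a usable organization URL")
    return {
        "organization_url": url,
        "provider": match["provider"],
        "region": match["region"],
    }


parts = parse_bakery_name("org.example.bakery.aws.us-west-2")
```

Returning the un-reversed URL alongside the provider and region keeps the three pieces the ADR cares about available to callers without re-parsing the name.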

## `RunRecord`

> This type (along with the next one, `BuildLogs`) provides the necessary link, mentioned in {doc}`motivation`, between datasets in a Bakery target and the Feedstocks from which they were built. These are new concepts for Pangeo Forge and certainly merit their own ADR. After we move through this PR (assuming we agree that some version of these concepts is useful), we can work on an ADR for them.

This is a container for metadata describing the execution of a Pangeo Forge Recipe, including:
- **timestamp**: The datetime at which the execution took place (not sure if this should be the start, the end, or maybe a tuple of both).
- **feedstock**: The name of the feedstock (including version). For this PR, I'm provisionally using the format `"{feedstock_name}@{major_version}.{minor_version}"`.
- **recipe**: The name of the Python object within the Feedstock's `recipe.py` module which was used to build the dataset. In the case of a single-recipe module, this is the name of a `pangeo_forge_recipes.recipes` class instance (and therefore needs to be a valid Python identifier). In the case of a [`dict_object`](https://github.com/pangeo-forge/roadmap/blob/master/doc/adr/0002-use-meta-yaml-to-track-feedstock-metadata.md#recipes-section), this would follow the established convention for the `meta.yaml`: i.e., `"{dict_name}:{dict_key}"`.
- **path**: The relative path to the dataset within the Bakery storage target.

```{eval-rst}
.. autoclass:: pangeo_forge_orchestrator.meta_types.bakery.RunRecord
    :members:
```
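
The four fields listed above could be sketched as a frozen standard-library dataclass with light validation. This is an illustration only, not the `RunRecord` defined in this PR: the class name `RunRecordSketch`, the exact regular expression for the provisional feedstock format, and the example values are assumptions.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Provisional format from the text: "{feedstock_name}@{major_version}.{minor_version}"
_FEEDSTOCK = re.compile(r"^[A-Za-z0-9_-]+@\d+\.\d+$")


@dataclass(frozen=True)
class RunRecordSketch:
    timestamp: datetime
    feedstock: str
    recipe: str
    path: str

    def __post_init__(self):
        if not _FEEDSTOCK.match(self.feedstock):
            raise ValueError(f"bad feedstock identifier: {self.feedstock!r}")
        # Either a bare Python identifier, or "{dict_name}:{dict_key}".
        parts = self.recipe.split(":")
        if len(parts) == 1:
            ok = parts[0].isidentifier()
        elif len(parts) == 2:
            ok = parts[0].isidentifier() and parts[1] != ""
        else:
            ok = False
        if not ok:
            raise ValueError(f"bad recipe name: {self.recipe!r}")


record = RunRecordSketch(
    timestamp=datetime(2021, 11, 8),
    feedstock="example-feedstock@1.0",  # hypothetical feedstock name
    recipe="recipes:daily",             # hypothetical dict_object entry
    path="example-feedstock/daily.zarr",
)
```

Validating in `__post_init__` means a malformed record fails at construction time rather than when it is later written to the build logs.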


## `BuildLogs`

This is a mapping between an execution run identifier and a `RunRecord`. Provisionally, I'm specifying a run identifier as a five-digit integer string, e.g. `"00000"`, `"00001"`, etc. These values would be sequentially assigned to each dataset as it is built to a given Bakery storage target, and each Bakery target would keep its own tally. There are other identifiers (e.g. the dataset path) which will be unique within a given Bakery storage target, but the idea here is to provide a short string for passing as a command line argument (an example of this is provided in the {doc}`use_guide`).
Contributor

Let's use either an integer or a uuid. Forget about the 5-digit string business.

Member Author

👍 bc82f04 (Docs updates to follow)


> Currently, the `pangeo_forge_orchestrator.components.Bakery` object assumes that a `build_logs.json` (with entries parsable into `BuildLogs` objects) will exist at the Bakery target's root path. By opening and parsing this JSON, the `Bakery` instance knows exactly what dataset paths exist at the target and what Feedstocks they are tied to. In the future, these records could be ingested into a database instead of, or in addition to, keeping a copy in the Bakery target.
Contributor

Appending build log records to a .json file is not going to be sustainable for long. json is just not an appendable format! Given the simplicity of the build log, a simple CSV would suffice better.

In the long run, we really need to think about the right architecture here... Should the bakeries have their own REST endpoint? Sqlite database? Log to a central service?

Member Author

> a simple CSV would suffice better

👍 bc82f04

> In the long run, we really need to think about the right architecture here

Definitely -> #4


```{eval-rst}
.. autoclass:: pangeo_forge_orchestrator.meta_types.bakery.BuildLogs
   :members:
```
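
As a rough illustration of the mapping described above, the following sketch parses a JSON document into a dictionary of run identifier to record. It is not the package's implementation: the `RunRecord` stand-in here is a plain dataclass carrying only the `recipe` and `path` fields, the JSON contents and dataset paths are invented, and `next_run_id` is a hypothetical helper showing one way the provisional five-digit identifiers could be assigned.

```python
import json
from dataclasses import dataclass
from typing import Dict


@dataclass
class RunRecord:
    # Stand-in with only two of the documented fields; the real type has more.
    recipe: str
    path: str


# Invented stand-in for a `build_logs.json` at the Bakery target's root path.
raw = """
{
  "00000": {"recipe": "recipe", "path": "example-feedstock/dataset-a.zarr"},
  "00001": {"recipe": "recipes:daily", "path": "example-feedstock/dataset-b.zarr"}
}
"""

# BuildLogs: run identifier -> RunRecord
build_logs: Dict[str, RunRecord] = {
    run_id: RunRecord(**record) for run_id, record in json.loads(raw).items()
}


def next_run_id(logs: Dict[str, RunRecord]) -> str:
    """Hypothetical helper: next sequential identifier, zero-padded to five digits."""
    return f"{len(logs):05d}"


print(build_logs["00001"].path)
print(next_run_id(build_logs))
```

Note that adding a record this way means rewriting the whole JSON document, which is the concern raised in the review thread above.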

## Why is `StorageOptions` a Model?

The Bakery database ADR [defines a `storage_options` field](https://github.com/pangeo-forge/roadmap/blob/master/doc/adr/0004-use-yaml-file-for-bakery-database.md#storage-options) for Bakery target access parameters. The reason we're defining the Python container for these options as a pydantic Model, rather than a dataclass, is for the [`.dict(exclude_none=True)` method](https://pydantic-docs.helpmanual.io/usage/exporting_models/#modeldict), which allows us to define arbitrary numbers of optional type-checked fields for this object, but also succinctly export kwargs dictionaries representing only those fields which have been set on a given instance.

```{eval-rst}
.. autoclass:: pangeo_forge_orchestrator.meta_types.bakery.StorageOptions
   :members:
```
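
To make the export behaviour concrete, here is a minimal sketch of a Model with several optional fields. The class name and the field names (`key`, `secret`, `anon`) are assumptions chosen for illustration and are not taken from the actual `StorageOptions` definition. The `.dict(...)` call follows the pydantic documentation linked above; newer pydantic releases offer `model_dump` as its successor.

```python
from typing import Optional

from pydantic import BaseModel


class StorageOptionsSketch(BaseModel):
    # Hypothetical fields; each is optional and type-checked when set.
    key: Optional[str] = None
    secret: Optional[str] = None
    anon: Optional[bool] = None


opts = StorageOptionsSketch(anon=True)

# Only the fields that were actually set appear in the exported kwargs.
kwargs = opts.dict(exclude_none=True)
print(kwargs)
```

The resulting dictionary can then be unpacked as keyword arguments wherever storage access parameters are expected, without passing along a `None` for every unset option.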


## ... and what about `BakeryDatabase`?

The new `BakeryDatabase` object is a Model (instead of a dataclass) because we want to take advantage of Model features for `pangeo_forge_orchestrator.components.Bakery`, and it seemed to make sense to have `Bakery` inherit from `BakeryDatabase`.

```{eval-rst}
.. autoclass:: pangeo_forge_orchestrator.meta_types.bakery.BakeryDatabase
   :members:
```

## What's with those regexes?

There are certain string values (such as the Feedstock name, etc.) which we definitely want to ensure conform to a specified format, but for which it seemed excessive to define an entire class. For [these cases](https://github.com/pangeo-forge/pangeo-forge-orchestrator/blob/620989215c8d191d55c3080d403d6454a895230b/pangeo_forge_orchestrator/meta_types/bakery.py#L110-L121), I opted to use pydantic's [Constrained Type function, `constr`](https://pydantic-docs.helpmanual.io/usage/types/#constrained-types). I (and I think most people) don't find regular expressions especially readable, so I wrote explanatory comments for each of these cases.
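
The sketch below shows the general idea of a commented pattern, using only the standard library `re` module as a stand-in for `constr`. The pattern itself is hypothetical (it assumes a Feedstock name of the form `<words>-feedstock@<major>.<minor>`) and is not copied from the linked source; `is_valid_feedstock_name` is likewise an illustrative helper.

```python
import re

# Hypothetical pattern, written in verbose mode so each part can be commented.
FEEDSTOCK_NAME = re.compile(
    r"""
    [a-z0-9]+(?:-[a-z0-9]+)*   # lowercase alphanumeric words joined by hyphens
    -feedstock                 # literal suffix
    @\d+\.\d+                  # version tag: major.minor
    """,
    re.VERBOSE,
)


def is_valid_feedstock_name(name: str) -> bool:
    """Return True only if the whole string matches the pattern."""
    return FEEDSTOCK_NAME.fullmatch(name) is not None


print(is_valid_feedstock_name("noaa-oisst-feedstock@1.0"))
print(is_valid_feedstock_name("NOAA feedstock"))
```

A constrained string type applies the same kind of check at validation time, so a malformed value is rejected when the object is constructed rather than later in a run.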

## Full diff

And finally, here's [the full diff](https://github.com/pangeo-forge/pangeo-forge-orchestrator/compare/de36c30070f249136a5eb3c0f54144f3eaafb428..620989215c8d191d55c3080d403d6454a895230b#diff-374b3112607d6019e80fa96dff3aec0f9159e803faf62a96ef35330f308bff9b) between Sean's existing types and those proposed in this PR.
6 changes: 6 additions & 0 deletions docs/next_steps.md
@@ -0,0 +1,6 @@
# Next steps

- Circle back to Orchestrator and BuildLogs ADR proposals
- Bring the official Bakery database within spec, as described in the {doc}`use_guide`
- Update the `catalog` CLI (and related pydantic types) to make it possible to build STAC Items for datasets built from un-merged Feedstocks
- Begin to merge automation repositories (i.e. unmerged portions of `pangeo-forge-prefect`) and refactor related GitHub Actions accordingly
10 changes: 10 additions & 0 deletions docs/quick_start.md
@@ -0,0 +1,10 @@
# Quick start

`pangeo-forge-orchestrator` uses Poetry for dependencies/packaging, because this is [recommended by Typer](https://typer.tiangolo.com/tutorial/package/). For a local install, you can follow [the steps defined by the CI tests](https://github.com/pangeo-forge/pangeo-forge-orchestrator/blob/620989215c8d191d55c3080d403d6454a895230b/.github/workflows/main.yaml#L41-L57), namely:

1. Create & activate a new conda env named `pfo-poetry` with Python >= 3.8
> "pfo" for "pangeo-forge-orchestrator"; the environment name doesn't matter if you don't plan to execute example notebooks with Papermill
2. From the repo root run: `conda install -c conda-forge poetry && poetry install`
3. (Optionally) Run `python -m ipykernel install --user --name pfo-poetry` if you want to execute example notebooks with Papermill

The CLI entrypoint is `pangeo-forge`, so you can then start exploring with `pangeo-forge --help`. Only a limited number of commands actually work at present; they are detailed in {doc}`use_guide`.