FE-349 data clean up for hca ingest manifest.py #340

Merged
merged 44 commits into from
Nov 15, 2024
Commits
2de5281
update poetry install action to fix issue with poetry install and set…
bahill Nov 12, 2024
cc14986
Adding data sanitizing - strip white space from rows and make institu…
bahill Nov 12, 2024
90dd634
adding location of scala image for transparency
bahill Nov 12, 2024
b21d631
Merge 90dd63424b77c429bbb6cd0c1d4bcb06d5d0ebaa into a3f357257c259c6fc…
bahill Nov 12, 2024
966f757
Update requirements.txt
Nov 12, 2024
ad90fa0
updating lock file...
bahill Nov 12, 2024
0877a5c
Merge remote-tracking branch 'origin/FE-349-data-clean-up-for-hca-ing…
bahill Nov 12, 2024
71b51b2
Merge 0877a5c80435123328c151a4f6c794390121a114 into a3f357257c259c6fc…
bahill Nov 12, 2024
7d2849c
Update requirements.txt
Nov 12, 2024
91184bd
matching poetry versions between user code env and GH actions
bahill Nov 12, 2024
7a9d1ce
Merge remote-tracking branch 'origin/FE-349-data-clean-up-for-hca-ing…
bahill Nov 12, 2024
32eadee
Merge 7a9d1ceb430102afa07b16494ba057f7986a9f7b into a3f357257c259c6fc…
bahill Nov 12, 2024
ef37b7b
Update requirements.txt
Nov 12, 2024
65cffaa
attempt to clear cache #1
bahill Nov 14, 2024
61ed79b
Merge remote-tracking branch 'origin/FE-349-data-clean-up-for-hca-ing…
bahill Nov 14, 2024
2b71145
Merge 61ed79bbbfa0749699d63f3b067c4d4f6b860572 into a3f357257c259c6fc…
bahill Nov 14, 2024
72ae5b3
Update requirements.txt
Nov 14, 2024
2552fc7
attempt to clear cache #2
bahill Nov 14, 2024
ba4e903
Merge remote-tracking branch 'origin/FE-349-data-clean-up-for-hca-ing…
bahill Nov 14, 2024
941de4d
Merge ba4e9039de6f7d5481d7cfa574fe6fcfbd5a7584 into a3f357257c259c6fc…
bahill Nov 14, 2024
a6343e4
Update requirements.txt
Nov 14, 2024
0416e6c
attempt to clear cache #3
bahill Nov 14, 2024
d68cdd3
Merge remote-tracking branch 'origin/FE-349-data-clean-up-for-hca-ing…
bahill Nov 14, 2024
3738780
Merge d68cdd3297165892e9b636b1bfd4f2237a18e105 into a3f357257c259c6fc…
bahill Nov 14, 2024
0d6d091
Update requirements.txt
Nov 14, 2024
0c181f8
attempt to clear cache #4
bahill Nov 14, 2024
11119b6
Merge remote-tracking branch 'origin/FE-349-data-clean-up-for-hca-ing…
bahill Nov 14, 2024
3c7d52c
Merge 11119b6eaaa0b8fbc4d5f38e0952582d1e8205ce into a3f357257c259c6fc…
bahill Nov 14, 2024
ea1364e
Update requirements.txt
Nov 14, 2024
1b1593f
attempt to clear cache #5 and moving out action noise for now
bahill Nov 14, 2024
25b3c4f
Merge remote-tracking branch 'origin/FE-349-data-clean-up-for-hca-ing…
bahill Nov 14, 2024
b4e4cf5
attempt to clear cache #5 - typo fix
bahill Nov 14, 2024
86b23e0
attempt to clear cache #5 - are there no caches?
bahill Nov 14, 2024
b2e28a4
attempt to clear cache #6
bahill Nov 14, 2024
61f8578
attempt to clear cache #6 - part 2
bahill Nov 14, 2024
fce8610
attempt to clear cache #6 - part 3
bahill Nov 14, 2024
45e84de
attempt to clear cache #6
bahill Nov 14, 2024
29394ee
attempt to clear cache #7
bahill Nov 14, 2024
5469493
attempt to clear cache #8
bahill Nov 14, 2024
14fd107
NEVERMIND let's update poetry
bahill Nov 14, 2024
a64eead
NEVERMIND let's update poetry
bahill Nov 14, 2024
e9de7ed
putting my workflows back
bahill Nov 14, 2024
334afbb
Merge e9de7ed640fb648b43174fe320012b175c021561 into a3f357257c259c6fc…
bahill Nov 14, 2024
a437966
Update requirements.txt
Nov 14, 2024
1 change: 1 addition & 0 deletions .github/workflows/build_and_publish_dev.yaml
@@ -25,6 +25,7 @@ jobs:
java-version: [email protected]
- name: Push Scala Dataflow Docker image
run: sbt publish
+ # us.gcr.io/broad-dsp-gcr-public/hca-transformation-pipeline
- name: Get artifact slug
id: get-artifact-slug
run: 'echo ::set-output name=slug::$(git rev-parse --short "$GITHUB_SHA")'
1 change: 1 addition & 0 deletions .github/workflows/build_and_publish_main.yaml
@@ -29,6 +29,7 @@ jobs:
run: gcloud auth configure-docker --quiet us.gcr.io,us-east4-docker.pkg.dev
- name: Push Scala Dataflow Docker image
run: sbt publish
+ # us.gcr.io/broad-dsp-gcr-public/hca-transformation-pipeline
- name: Get artifact slug
id: get-artifact-slug
run: 'echo ::set-output name=slug::$(git rev-parse --short "$GITHUB_SHA")'
2 changes: 1 addition & 1 deletion .github/workflows/generate-requirements-file.yaml
@@ -22,7 +22,7 @@ jobs:
with:
python-version: 3.9.16
- name: Install Poetry
- uses: snok/install-poetry@v1.2
+ uses: snok/install-poetry@v1
with:
version: 1.1.9
virtualenvs-create: true
5 changes: 3 additions & 2 deletions .github/workflows/validate_pull_request_main.yaml
@@ -23,15 +23,16 @@ jobs:
with:
python-version: 3.9.16
- name: Install Poetry
- uses: snok/install-poetry@v1.2
+ uses: snok/install-poetry@v1
with:
- version: 1.1.9
+ version: 1.8.0
- name: Restore cache dependencies
uses: actions/cache@v2
env:
cache-name: cache-poetry-v2
with:
path: ~/.cache/pypoetry
# key uses pyproject.toml hash, so it's unique to each version of pyproject.toml
key: ${{ runner.os }}-build-${{ env.cache-name }}-${{ hashFiles('./orchestration/pyproject.toml') }}
restore-keys: |
${{ runner.os }}-build-${{ env.cache-name }}-
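Reviewer note: the `key` in the cache step above ends in a hash of `pyproject.toml`, so any edit to that file yields a new key, while `restore-keys` falls back to any older cache sharing the prefix. The idea can be sketched in Python as below. This is an illustration only: `cache_key` and `matches_restore_prefix` are hypothetical names, and the digest is a plain SHA-256 of the file bytes, which is not claimed to reproduce the exact value GitHub's `hashFiles` emits.

```python
import hashlib


def cache_key(runner_os: str, cache_name: str, manifest_bytes: bytes) -> str:
    """Build a key shaped like '<os>-build-<cache-name>-<digest>' (hypothetical helper)."""
    # Assumption: a plain SHA-256 of the file contents stands in for hashFiles().
    digest = hashlib.sha256(manifest_bytes).hexdigest()
    return f"{runner_os}-build-{cache_name}-{digest}"


def matches_restore_prefix(key: str, runner_os: str, cache_name: str) -> bool:
    """Return True when an older key shares the restore-keys prefix (hypothetical helper)."""
    return key.startswith(f"{runner_os}-build-{cache_name}-")


old_key = cache_key("Linux", "cache-poetry-v2", b'dagster-gcp = "^0.12.14"\n')
new_key = cache_key("Linux", "cache-poetry-v2", b'dagster-gcp = "0.12.14"\n')

# Changing pyproject.toml changes the exact key, so the exact-match lookup misses...
print(old_key != new_key)
# ...but the old cache is still eligible through the prefix fallback.
print(matches_restore_prefix(old_key, "Linux", "cache-poetry-v2"))
```

Under this scheme, changing the `cache-name` value is the only way to stop an old cache from being restored through the prefix, since a `pyproject.toml` edit changes only the trailing hash.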
2 changes: 1 addition & 1 deletion .github/workflows/validate_python.yaml
@@ -28,7 +28,7 @@ jobs:
with:
python-version: 3.9.16
- name: Install Poetry
- uses: snok/install-poetry@v1.2
+ uses: snok/install-poetry@v1
with:
version: 1.1.9
- name: Cache dependencies
2 changes: 1 addition & 1 deletion orchestration/Dockerfile
@@ -9,7 +9,7 @@ ENV PYTHONFAULTHANDLER=1 \
PIP_NO_CACHE_DIR=off \
PIP_DISABLE_PIP_VERSION_CHECK=on \
PIP_DEFAULT_TIMEOUT=100 \
- POETRY_VERSION=1.1.8 \
+ POETRY_VERSION=1.1.9 \
SENTRY_DSN=https://[email protected]/4506559533088768

RUN pip install "poetry==$POETRY_VERSION"
6 changes: 3 additions & 3 deletions orchestration/hca_manage/manifest.py
@@ -54,6 +54,7 @@
"dev": {
"EBI": "gs://broad-dsp-monster-hca-dev-ebi-staging/dev",
"UCSC": "gs://broad-dsp-monster-hca-dev-ebi-staging/dev",
+ "TEST": "gs://broad-dsp-monster-hca-prod-ebi-storage/broad_test_dataset"
}
}
ENV_PIPELINE_ENDINGS = {
@@ -101,15 +102,15 @@ def _parse_csv(csv_path: str, env: str, project_id_only: bool = False,
continue

assert len(row) == 2
- institution = row[0]
+ row = [x.strip() for x in row]
+ institution = row[0].upper()
project_id = find_project_id_in_str(row[1])

key = None
if project_id_only:
project_id = row[1]
key = project_id
else:
- # TODO check for all caps - change to all caps if not, then match
if institution not in STAGING_AREA_BUCKETS[env]:
raise Exception(f"Unknown institution {institution} found")

@@ -178,7 +179,6 @@ def _enumerate_manifests(env: str) -> None:


def load(args: argparse.Namespace) -> None:
parse_and_load_manifest(args.env, args.csv_path, args.release_tag, "load_hca")
parse_and_load_manifest(args.env, args.csv_path, args.release_tag, "per_project_load_hca")
parse_and_load_manifest(args.env, args.csv_path, args.release_tag, "validate_ingress")
parse_and_load_manifest(
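Reviewer note: the `_parse_csv` change above strips whitespace from every cell and upper-cases the institution before looking it up in `STAGING_AREA_BUCKETS[env]`. A minimal standalone sketch of that sanitizing step follows. `sanitize_row` and `KNOWN_INSTITUTIONS` are hypothetical names used only for illustration; the real function also extracts the project ID with `find_project_id_in_str`, which is left out here.

```python
from typing import List, Tuple

# Hypothetical stand-in for the keys of STAGING_AREA_BUCKETS["dev"] in manifest.py.
KNOWN_INSTITUTIONS = {"EBI", "UCSC", "TEST"}


def sanitize_row(row: List[str]) -> Tuple[str, str]:
    """Strip whitespace from both cells and upper-case the institution."""
    if len(row) != 2:
        raise ValueError(f"expected 2 columns, got {len(row)}")
    # Mirrors `row = [x.strip() for x in row]` from the diff.
    institution, project = (cell.strip() for cell in row)
    # Mirrors `institution = row[0].upper()` from the diff.
    institution = institution.upper()
    if institution not in KNOWN_INSTITUTIONS:
        raise ValueError(f"Unknown institution {institution} found")
    return institution, project


# A row with stray spaces and a lower-case institution now passes the lookup.
print(sanitize_row([" ebi ", " abc-123 \n"]))
```

Normalizing before the membership check means a manifest row such as ` ebi ` no longer fails with "Unknown institution", which appears to be what the removed TODO comment was asking for.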
7 changes: 4 additions & 3 deletions orchestration/poetry.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions orchestration/pyproject.toml
@@ -14,7 +14,7 @@ cffi = "1.16.0"
# TODO: we'll probably want to use just the dagster version here and not the API versions as well
# https://github.com/dagster-io/dagster/blob/master/MIGRATION.md#migrating-to-10
dagster = "0.12.14"
- dagster-gcp = "^0.12.14"
+ dagster-gcp = "0.12.14"
dagster-k8s = "0.12.14"
dagster-postgres = "0.12.14"
dagster-slack = "0.12.14"
@@ -58,7 +58,7 @@ soft_delete = "hca_manage.soft_delete:run"
job = "hca_manage.job:fetch_job_info"

[build-system]
- requires = ["poetry-core=^1.1.8"]
+ requires = ["poetry-core<=1.1.9"]
build-backend = "poetry.core.masonry.api"

[tool.autopep8]
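Reviewer note: replacing `^0.12.14` with `0.12.14` for `dagster-gcp` turns a range into an exact pin, matching the other `dagster-*` entries. For a `0.x` version, Poetry's caret constraint allows patch-level upgrades only, so the old range was `>=0.12.14,<0.13.0`. The sketch below works those bounds out; `caret_bounds` is a hypothetical helper written for this note, not part of Poetry, and it handles only plain three-part versions.

```python
from typing import Tuple


def caret_bounds(version: str) -> Tuple[str, str]:
    """Return (inclusive lower, exclusive upper) for a caret constraint on X.Y.Z."""
    major, minor, patch = (int(part) for part in version.split("."))
    # The caret keeps the left-most non-zero component fixed.
    if major > 0:
        upper = (major + 1, 0, 0)
    elif minor > 0:
        upper = (0, minor + 1, 0)
    else:
        upper = (0, 0, patch + 1)
    return version, ".".join(str(n) for n in upper)


low, high = caret_bounds("0.12.14")
print(f">={low},<{high}")
```

With the exact pin, a later `0.12.x` release of `dagster-gcp` can no longer be resolved alongside `dagster = "0.12.14"`.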
2 changes: 1 addition & 1 deletion orchestration/requirements.txt
@@ -24,7 +24,7 @@ dagster==0.12.14
data-repo-client==1.542.0
docstring-parser==0.15; python_version >= "3.9" and python_version < "3.10"
frozenlist==1.4.0; python_version >= "3.9" and python_version < "3.10" and python_full_version >= "3.6.0"
- google-api-core==2.19.0; python_version >= "3.9" and python_version < "3.10" and (python_version >= "3.9" and python_full_version < "3.0.0" and python_version < "3.10" or python_full_version >= "3.6.0" and python_version >= "3.9" and python_version < "3.10") and (python_version >= "3.7" and python_full_version < "3.0.0" or python_full_version >= "3.4.0" and python_version >= "3.7")
+ google-api-core==2.23.0; python_version >= "3.9" and python_version < "3.10" and (python_version >= "3.9" and python_full_version < "3.0.0" and python_version < "3.10" or python_full_version >= "3.6.0" and python_version >= "3.9" and python_version < "3.10") and (python_version >= "3.7" and python_full_version < "3.0.0" or python_full_version >= "3.4.0" and python_version >= "3.7")
google-api-python-client==1.12.11; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.4.0"
google-auth-httplib2==0.1.1; python_version >= "2.7" and python_full_version < "3.0.0" or python_full_version >= "3.4.0"
google-auth==2.23.3; python_version >= "3.9" and python_full_version < "3.0.0" and python_version < "3.10" or python_full_version >= "3.6.0" and python_version >= "3.9" and python_version < "3.10"