Containerization of workflow #112

Open
wants to merge 61 commits into
base: main
61 commits
5380c6b
style: formatting with isort & black
florianzwagemaker May 3, 2024
836daa3
build: add base container definitions
florianzwagemaker May 3, 2024
347317b
build(dependencies): Add a separate conda recipe and container for su…
florianzwagemaker May 3, 2024
f455445
feat: add container support to main workflow
florianzwagemaker May 3, 2024
c29495f
feat: add containerization support to workflow
florianzwagemaker May 3, 2024
d4e1fc9
feat: add primary container management functions to viroconstrictor w…
florianzwagemaker May 3, 2024
41bd864
ci: add github actions workflow for automatic building of containers
florianzwagemaker May 3, 2024
aa5aec6
fix: change the pathcompleter to a list comprehension instead of usin…
florianzwagemaker May 3, 2024
e67bb27
feat: add new global userprofile section for reproducibility method (…
florianzwagemaker May 3, 2024
7a23f92
refactor: capture all custom snakemake configuration settings in a th…
florianzwagemaker May 3, 2024
46ff906
refactor: pass all the snakemake configuration settings through the s…
florianzwagemaker May 3, 2024
bd9ac91
refactor: add a singularity container activation log message to the l…
florianzwagemaker May 3, 2024
e7907b1
refactor: pass all the snakemake configuration settings through the s…
florianzwagemaker May 3, 2024
3a7d02b
ci: add step in container build/test workflow to download already exi…
florianzwagemaker May 6, 2024
28e3333
ci: temporarily add listing of directory contents for debugging purposes
florianzwagemaker May 6, 2024
ac8a3ac
ci: change upstream registry
florianzwagemaker May 6, 2024
7216e7b
Merge pull request #95 from RIVM-bioinformatics/main
florianzwagemaker May 6, 2024
9341f95
ci: Update GitHub Actions workflows for container publishing
florianzwagemaker May 6, 2024
0992600
fix: add an __init__.py file to the viroconstrictor.workflow dir
florianzwagemaker May 6, 2024
2e4f183
fix: change the fastqc wrapper script to its own dedicated dir
florianzwagemaker May 6, 2024
66ec41c
refactor: add verbose option to `download_containers` function
florianzwagemaker May 6, 2024
b5fed9f
ci: add listing of pip modules as a test. force the usage of the micr…
florianzwagemaker May 6, 2024
ea17e19
refactor: Improve container download error handling and logging
florianzwagemaker May 6, 2024
7679e8e
Merge pull request #97 from RIVM-bioinformatics/containers
florianzwagemaker May 6, 2024
cd2189b
ci: change the unpack statement in download artifact step of upload G…
florianzwagemaker May 6, 2024
9392de9
ci: remove listing of directory contents
florianzwagemaker May 6, 2024
79a914d
Merge pull request #98 from RIVM-bioinformatics/containers
florianzwagemaker May 6, 2024
4907ed3
ci: update github actions workflow with correct triggers
florianzwagemaker May 6, 2024
b2a165d
fix: add checks to not perform container-specific tasks if conda is u…
florianzwagemaker May 6, 2024
f4410bf
fix: properly handle the floating point comparison to check the PRESE…
florianzwagemaker May 10, 2024
1c329dc
style: formatting with isort & black
florianzwagemaker May 10, 2024
8a5fdfe
docs: add docstrings to all functions in containers.py
florianzwagemaker May 10, 2024
a883cd0
refactor: remove a matching group in split paragraph argformatter fun…
florianzwagemaker Jun 19, 2024
9ce0263
fix: ensure proper spliced-section length filtering
florianzwagemaker Jun 19, 2024
2a3c3e9
refactor: change type into a Callable instead of Any
florianzwagemaker Jun 19, 2024
de05dca
refactor: merge nested if statement
florianzwagemaker Jun 19, 2024
0e277a5
refactor: slightly simplify custom log handler
florianzwagemaker Jun 19, 2024
1e73647
refactor: move preset parameters into their own json file instead of …
florianzwagemaker Jun 19, 2024
d276d02
fix: include wrappers dir and preset_parameters.json during package i…
florianzwagemaker Jun 19, 2024
f0b93d5
Merge pull request #100 from RIVM-bioinformatics/main
florianzwagemaker Jul 3, 2024
f16437c
refactor: Remove 'intel' channel from self-update repo_channels
florianzwagemaker Jul 3, 2024
527ad85
deps; Ensure conda strict channel compatibility with the Scripts.yaml…
florianzwagemaker Sep 17, 2024
bf87d9b
Merge pull request #103 from RIVM-bioinformatics/main
florianzwagemaker Sep 17, 2024
6435965
chore: update docstrings and comments in build_containers.py script t…
florianzwagemaker Sep 17, 2024
9e3cc0f
refactor: fix typing mismatch
florianzwagemaker Sep 17, 2024
5311b32
ci: add workflow_dispatch triggers to container building/publishing w…
florianzwagemaker Sep 17, 2024
5637908
chore: add quick script for local building (and moving) of containers
florianzwagemaker Sep 17, 2024
6553973
Merge pull request #106 from RIVM-bioinformatics/main
florianzwagemaker Sep 25, 2024
379ffc9
Merge pull request #108 from RIVM-bioinformatics/main
florianzwagemaker Oct 8, 2024
4f6e45d
fix: do not download containers in dryrun mode
florianzwagemaker Oct 21, 2024
54f6cf9
refactor: split out calculation of file hashes into separate function
florianzwagemaker Oct 21, 2024
f7c90e3
fix: create an empty {sample}_primers.bed file when no input primers …
florianzwagemaker Oct 28, 2024
941c19b
deps: limit mamba version to <2.0.0
florianzwagemaker Nov 18, 2024
5cf5fbd
fix: resolve snakemake AmbiguousRuleException for certain combination…
florianzwagemaker Nov 18, 2024
bb3b16f
fix: always regenerate incomplete files when a previous analysis was …
florianzwagemaker Nov 18, 2024
8180c56
fix: #111 - Do not change python working dir to requested output dire…
florianzwagemaker Nov 18, 2024
0c593b3
refactor: move reference filtering and preparation rules to dedicated…
florianzwagemaker Nov 21, 2024
5140811
deps: update pysam to version 0.21 in downstream env and pin package …
florianzwagemaker Nov 21, 2024
0dc3d58
deps: set flexible pyopenssl dependency to version 24.x.x
florianzwagemaker Dec 18, 2024
253a601
fix: ensure proper parsing and assigning of input data when handling …
florianzwagemaker Dec 18, 2024
df8b9ee
fix: add required dryrun option to container cache configuration for …
florianzwagemaker Dec 18, 2024
93 changes: 93 additions & 0 deletions .github/workflows/build_and_test.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,93 @@
name: build containers and run tests

on:
  pull_request:
    branches:
      - 'main'
  workflow_dispatch:


jobs:
  Setup_and_build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Mamba
        uses: mamba-org/setup-micromamba@v1
        with:
          cache-environment: true
          post-cleanup: 'all'
          environment-file: env.yml
          init-shell: bash

      - name: Install local python package
        run: |
          pip install . --no-deps
        shell: micromamba-shell {0}

      - name: build containers
        run: |
          python containers/build_containers.py
        env:
          TOKEN: ${{ secrets.GITHUB_TOKEN }}
        shell: micromamba-shell {0}

      - name: zip built containers
        run: |
          cd ./containers/
          tar -czvf containers.tar.gz builtcontainers.json $(find . -type f -name "*.tar" -printf '%f ')


      - name: Upload container artifacts
        uses: actions/upload-artifact@v3
        with:
          name: built_containers
          path: ./containers/containers.tar.gz

  Test:
    runs-on: ubuntu-latest
    needs: Setup_and_build
    steps:
      - uses: actions/checkout@v3

      - uses: actions/download-artifact@v3
        with:
          name: built_containers

      - name: move artifact
        run: |
          mv ./containers.tar.gz ./containers/containers.tar.gz

      - name: unzip built containers
        run: |
          cd ./containers/
          tar -xzvf containers.tar.gz
          cd ..

      - name: Setup Apptainer
        uses: eWaterCycle/setup-apptainer@v2

      - name: Setup Mamba
        uses: mamba-org/setup-micromamba@v1
        with:
          cache-environment: true
          post-cleanup: 'all'
          environment-file: env.yml
          init-shell: bash

      - name: Install local python package
        run: |
          pip install . --no-deps
        shell: micromamba-shell {0}

      - name: convert containers
        run: |
          python containers/convert_artifact_containers_for_apptainer.py

      - name: download existing containers
        run: |
          python containers/pull_published_containers.py
        shell: micromamba-shell {0}

## rest of the testing suite here
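Note on the `zip built containers` step: it bundles `builtcontainers.json` with every `*.tar` file that `find` reports, passing bare file names to `tar`. A minimal Python sketch of the same packing step follows, assuming the tar files sit directly in the directory being archived. The function name `pack_containers` is hypothetical and not part of this PR.

```python
import tarfile
from pathlib import Path


def pack_containers(container_dir: Path, archive_name: str = "containers.tar.gz") -> Path:
    """Bundle builtcontainers.json and all top-level *.tar files into one gzip archive."""
    archive_path = container_dir / archive_name
    members = [container_dir / "builtcontainers.json"]
    # "*.tar" does not match the ".tar.gz" archive itself, so reruns stay clean.
    members += sorted(container_dir.glob("*.tar"))
    with tarfile.open(archive_path, "w:gz") as archive:
        for member in members:
            # Store bare file names so unpacking recreates the files in place.
            archive.add(member, arcname=member.name)
    return archive_path
```

Unpacking the result inside `containers/` mirrors the `tar -xzvf containers.tar.gz` step used by the `Test` job.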
63 changes: 63 additions & 0 deletions .github/workflows/publish_containers.yaml
@@ -0,0 +1,63 @@
name: Publish containers

on:
  release:
    types:
      - published
  workflow_dispatch:


jobs:
  Upload:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Download artifact
        id: download-artifact
        uses: dawidd6/action-download-artifact@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          workflow: build_and_test.yml
          name: built_containers
          skip_unpack: false

      - name: move artifact
        run: |
          mv ./containers.tar.gz ./containers/containers.tar.gz

      - name: unzip built containers
        run: |
          cd ./containers/
          tar -xzvf containers.tar.gz
          cd ..

      - name: Setup Mamba
        uses: mamba-org/setup-micromamba@v1
        with:
          cache-environment: true
          post-cleanup: 'all'
          environment-file: env.yml
          init-shell: bash

      - name: Install local python package
        run: |
          pip install . --no-deps
        shell: micromamba-shell {0}

      - name: Login to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Add artifacted containers to docker daemon
        run: |
          python containers/add_OCI_to_docker_engine.py
        shell: micromamba-shell {0}

      - name: tag and push containers
        run: |
          python containers/tag_and_push_containers.py
        shell: micromamba-shell {0}
59 changes: 50 additions & 9 deletions ViroConstrictor/__main__.py
@@ -23,6 +23,10 @@
 from ViroConstrictor.runconfigs import GetSnakemakeRunDetails, WriteYaml
 from ViroConstrictor.runreport import WriteReport
 from ViroConstrictor.update import update
+from ViroConstrictor.workflow.containers import (
+    construct_container_bind_args,
+    download_containers,
+)


 def get_preset_warning_list(
@@ -67,7 +71,13 @@ def get_preset_warning_list(
 This applies to the following samples:\n{''.join(samples)}"""
         preset_score_warnings.append(warn)

-    p_fallbackwarning_df = sample_info_df.loc[sample_info_df["PRESET_SCORE"] == 0.0]
+    # Check whether the preset score is greater than or equal to 0.0 and smaller than 0.000001 (1e-6).
+    # The preset score is a float, and exact floating-point equality checks are not reliable, so we test against a small range instead.
+    p_fallbackwarning_df = sample_info_df.loc[
+        (sample_info_df["PRESET_SCORE"] >= 0.0)
+        & (sample_info_df["PRESET_SCORE"] < 1e-6)
+    ]
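
Note on the range check: it replaces an exact `== 0.0` comparison. A small sketch of why that matters, using plain Python floats instead of a pandas column. The names `EPSILON` and `is_fallback_score` are illustrative; only the `1e-6` threshold comes from the diff.

```python
EPSILON = 1e-6  # threshold taken from the diff; the constant name is an assumption


def is_fallback_score(score: float) -> bool:
    """Treat any score in [0.0, EPSILON) as effectively zero."""
    return 0.0 <= score < EPSILON


# Mathematically zero, but binary rounding leaves a tiny positive residue.
residual = 0.1 + 0.2 - 0.3

print(residual == 0.0)              # False: the exact comparison misses it
print(is_fallback_score(residual))  # True: the range check catches it
```

Scores that are genuinely above the threshold, such as `0.5`, are still rejected by the range check.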

     targets, presets = (
         (
             list(x)
@@ -151,7 +161,19 @@ def main() -> NoReturn:
         inputs_obj=parsed_input, samplesheetfilename="samples_main"
     )

+    # if configured to use containers, check if they are available and download them if necessary
+    # TODO: add the verbosity flag to the download_containers function and update log message to reflect this
+    if (
+        snakemake_run_details.snakemake_run_conf["use-singularity"]
+        and download_containers(snakemake_run_details.snakemake_run_conf) != 0
+    ):
+        log.error(
+            "Failed to download containers required for workflow.\nPlease check the logs and your settings for more information and try again later."
+        )
+        sys.exit(1)
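
Note on the guard: it relies on `and` short-circuiting, so `download_containers` is only called when `use-singularity` is truthy, and a non-zero return value aborts the run. A minimal sketch of that control flow with a stand-in downloader follows. `containers_ready` and `fake_download` are hypothetical names, and the real `download_containers` may behave differently.

```python
from typing import Callable


def containers_ready(run_conf: dict, downloader: Callable[[dict], int]) -> bool:
    """Return True when the workflow may proceed."""
    if not run_conf.get("use-singularity", False):
        return True  # conda mode: the downloader is never touched
    return downloader(run_conf) == 0  # a non-zero code signals failure


calls: list = []


def fake_download(conf: dict) -> int:
    """Stand-in for the real downloader; records each call and returns an exit code."""
    calls.append(conf)
    return 0 if conf.get("reachable", True) else 1


print(containers_ready({"use-singularity": False}, fake_download), len(calls))
```

Passing the downloader in as an argument keeps the decision logic testable without network access.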

     log.info(f"{'='*20} [bold yellow] Starting Main Workflow [/bold yellow] {'='*20}")

     status: bool = False
     if parsed_input.user_config["COMPUTING"]["compmode"] == "local":
         status = snakemake.snakemake(
@@ -160,22 +182,29 @@ def main() -> NoReturn:
             cores=snakemake_run_details.snakemake_run_conf["cores"],
             use_conda=snakemake_run_details.snakemake_run_conf["use-conda"],
             conda_frontend="mamba",
+            use_singularity=snakemake_run_details.snakemake_run_conf["use-singularity"],
+            singularity_args=construct_container_bind_args(parsed_input.samples_dict),
             jobname=snakemake_run_details.snakemake_run_conf["jobname"],
             latency_wait=snakemake_run_details.snakemake_run_conf["latency-wait"],
             dryrun=snakemake_run_details.snakemake_run_conf["dryrun"],
+            force_incomplete=snakemake_run_details.snakemake_run_conf["force-incomplete"],
             configfiles=[
                 WriteYaml(
                     snakemake_run_details.snakemake_run_parameters,
                     f"{parsed_input.workdir}/config/run_params.yaml",
-                )
+                ),
+                WriteYaml(
+                    snakemake_run_details.snakemake_run_conf,
+                    f"{parsed_input.workdir}/config/run_configs.yaml",
+                ),
             ],
-            restart_times=3,
-            keepgoing=True,
+            restart_times=snakemake_run_details.snakemake_run_conf["restart-times"],
+            keepgoing=snakemake_run_details.snakemake_run_conf["keep-going"],
             quiet=["all"],  # type: ignore
             log_handler=[
                 ViroConstrictor.logging.snakemake_logger(logfile=parsed_input.logfile)
             ],
-            printshellcmds=False,
+            printshellcmds=snakemake_run_details.snakemake_run_conf["printshellcmds"],
         )
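
Note on `singularity_args`: it receives the return value of `construct_container_bind_args`, whose body is not part of this excerpt. As a rough illustration only, a helper of this kind could collect the directories that hold the input files and expose each one to the container through Apptainer's `--bind` option. The function name `sketch_bind_args` and the flat `{sample: path}` input shape are assumptions, not the PR's implementation.

```python
from pathlib import Path


def sketch_bind_args(sample_files: dict) -> str:
    """Build one '--bind <dir>' argument per distinct parent directory (hypothetical helper)."""
    parents = sorted({str(Path(path).parent) for path in sample_files.values()})
    return " ".join(f"--bind {directory}" for directory in parents)


print(sketch_bind_args({"s1": "/data/run1/s1.fastq", "s2": "/data/run1/s2.fastq"}))
```

Deduplicating the parent directories keeps the argument string short when many samples share a folder. Paths containing spaces would need quoting, which this sketch does not handle.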
     if parsed_input.user_config["COMPUTING"]["compmode"] == "grid":
         status = snakemake.snakemake(
@@ -185,23 +214,31 @@ def main() -> NoReturn:
             nodes=snakemake_run_details.snakemake_run_conf["cores"],
             use_conda=snakemake_run_details.snakemake_run_conf["use-conda"],
             conda_frontend="mamba",
+            use_singularity=snakemake_run_details.snakemake_run_conf["use-singularity"],
+            singularity_args=construct_container_bind_args(parsed_input.samples_dict),
             jobname=snakemake_run_details.snakemake_run_conf["jobname"],
             latency_wait=snakemake_run_details.snakemake_run_conf["latency-wait"],
             drmaa=snakemake_run_details.snakemake_run_conf["drmaa"],
             drmaa_log_dir=snakemake_run_details.snakemake_run_conf["drmaa-log-dir"],
             dryrun=snakemake_run_details.snakemake_run_conf["dryrun"],
+            force_incomplete=snakemake_run_details.snakemake_run_conf["force-incomplete"],
             configfiles=[
                 WriteYaml(
                     snakemake_run_details.snakemake_run_parameters,
                     f"{parsed_input.workdir}/config/run_params.yaml",
-                )
+                ),
+                WriteYaml(
+                    snakemake_run_details.snakemake_run_conf,
+                    f"{parsed_input.workdir}/config/run_configs.yaml",
+                ),
             ],
-            restart_times=3,
-            keepgoing=True,
+            restart_times=snakemake_run_details.snakemake_run_conf["restart-times"],
+            keepgoing=snakemake_run_details.snakemake_run_conf["keep-going"],
             quiet=["all"],  # type: ignore
             log_handler=[
                 ViroConstrictor.logging.snakemake_logger(logfile=parsed_input.logfile)
             ],
+            printshellcmds=snakemake_run_details.snakemake_run_conf["printshellcmds"],
         )

     if snakemake_run_details.snakemake_run_conf["dryrun"] is False and status is True:
@@ -213,7 +250,11 @@ def main() -> NoReturn:
                 WriteYaml(
                     snakemake_run_details.snakemake_run_parameters,
                     f"{parsed_input.workdir}/config/run_params.yaml",
-                )
+                ),
+                WriteYaml(
+                    snakemake_run_details.snakemake_run_conf,
+                    f"{parsed_input.workdir}/config/run_configs.yaml",
+                ),
             ],
             quiet=["all"],  # type: ignore
             log_handler=[
5 changes: 3 additions & 2 deletions ViroConstrictor/functions.py
@@ -83,7 +83,7 @@ def _split_paragraphs(self, text: str) -> list[str]:
         """Split text in to paragraphs of like-indented lines."""

         text = textwrap.dedent(text).strip()
-        text = re.sub("\n\n[\n]+", "\n\n", text)
+        text = re.sub("\n\n\n+", "\n\n", text)
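
Note on the pattern change: the old pattern `\n\n[\n]+` and the new `\n\n\n+` match the same strings (two newlines followed by one or more newlines), so this change only drops a redundant single-character class. A quick standalone check, with sample strings made up for illustration:

```python
import re

OLD_PATTERN = "\n\n[\n]+"
NEW_PATTERN = "\n\n\n+"

samples = ["a\n\nb", "a\n\n\nb", "a\n\n\n\n\nb", "a\nb\n\n\n"]
for text in samples:
    # Both patterns collapse runs of three or more newlines down to two.
    assert re.sub(OLD_PATTERN, "\n\n", text) == re.sub(NEW_PATTERN, "\n\n", text)

print(repr(re.sub(NEW_PATTERN, "\n\n", "a\n\n\n\nb")))  # 'a\n\nb'
```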

         last_sub_indent: Optional[int] = None
         paragraphs: list[str] = []
@@ -157,7 +157,8 @@ def pathCompleter(self, text: str, state: int) -> str:
         if os.path.isdir(text):
             text += "/"

-        return list(glob.glob(f"{text}*"))[state]
+        # We explicitly use a list comprehension here instead of a call to the list constructor, as the latter would otherwise break the autocompletion of paths.
+        return [x for x in glob.glob(f"{text}*")][state]
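
Note on the completer protocol: `readline`-style completers are called repeatedly with `state` set to 0, 1, 2, and so on. Each call is expected to return the next candidate, then a non-string value such as `None` once the candidates run out. A minimal standalone sketch of a glob-based completer that makes the end-of-list case explicit follows. `complete_path` is a hypothetical function, not the PR's `pathCompleter`.

```python
import glob
import os
from typing import Optional


def complete_path(text: str, state: int) -> Optional[str]:
    """Return the state-th filesystem match for text, or None when exhausted."""
    if os.path.isdir(text) and not text.endswith("/"):
        text += "/"  # descend into the directory instead of matching its name
    matches = sorted(glob.glob(f"{text}*"))
    return matches[state] if state < len(matches) else None
```

Sorting the matches keeps the candidate order stable between successive calls, which plain `glob.glob` does not guarantee.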

     def createListCompleter(self, ll: list[str]) -> None:
         """
13 changes: 5 additions & 8 deletions ViroConstrictor/logging.py
@@ -3,7 +3,7 @@
 import os
 import pathlib
 import re
-from typing import Any
+from typing import Any, Callable

 from rich.color import ANSI_COLOR_NAMES
 from rich.default_styles import DEFAULT_STYLES
@@ -219,8 +219,9 @@ def print_jobstatistics_logmessage(msg: dict) -> None:
     log.info(f"Job statistics:\n[yellow]{logmessage}[/yellow]")


-logmessage_strings_info: dict[str, Any] = {
+logmessage_strings_info: dict[str, Callable] = {
     "Activating conda environment": ColorizeLogMessagePath,
+    "Activating singularity image": ColorizeLogMessagePath,
     "Building DAG of jobs": BaseLogMessage,
     "Creating conda environment": ColorizeLogMessagePath,
     "Removing incomplete Conda environment": ColorizeLogMessagePath,
@@ -262,15 +263,11 @@ def log_handler(msg: dict) -> None:
         loglevel = msg.get("level")
         logmessage = msg.get("msg")

-        if loglevel == "dag_debug":
-            return None
-        if loglevel == "debug":
-            return None
-        if loglevel == "shellcmd":
+        if loglevel in ["dag_debug", "debug", "shellcmd"]:
             return None

         if logmessage is not None and any(
-            x for x in logmessage_suppressed_strings_warning if x in logmessage
+            x in logmessage for x in logmessage_suppressed_strings_warning
         ):
             return None
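
Note on the rewritten `any(...)`: it tests membership directly rather than the truthiness of the matching strings. For non-empty entries the two forms agree; they differ only when a suppressed entry is falsy (an empty string). The sketch below demonstrates that edge case with made-up values; the list contents are hypothetical and not taken from `logmessage_suppressed_strings_warning`.

```python
suppressed = ["", "some warning text to hide"]  # hypothetical entries
message = "Building DAG of jobs"

# Old form: yields the matching strings themselves, so a matching "" counts as False.
old_result = any(x for x in suppressed if x in message)

# New form: yields the booleans produced by the membership test.
new_result = any(x in message for x in suppressed)

print(old_result, new_result)  # False True
```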
