Rearrange processes and modules, add SV workflow for Delly2 #8

Merged: 42 commits merged on Aug 7, 2024
42 commits
- `3d1f16d` Rename workflow to snv_annotations, absorb Funcotator (nwiltsie, Jul 24, 2024)
- `7b773ab` s/RepeatMasker-v3.0.1/RepeatMasker-3.0.1/ (nwiltsie, Jul 24, 2024)
- `ad98a45` Use stablelift image from main (nwiltsie, Jul 24, 2024)
- `dae4905` Add original copy of extract-vcf-features-SV.R (nwiltsie, Jul 24, 2024)
- `5df8c97` Add --output-rds argument (nwiltsie, Jul 24, 2024)
- `4716ac5` Add workflow for SV (nwiltsie, Jul 24, 2024)
- `8bbcfe2` Refactor, support SV and SNV (nwiltsie, Jul 24, 2024)
- `0e499b3` Add stubs to all processes (nwiltsie, Jul 24, 2024)
- `1f18dfe` Bugfix, need leading params (nwiltsie, Jul 24, 2024)
- `83cd1e7` Bugfix, remove module/ from relative path (nwiltsie, Jul 24, 2024)
- `966788b` Remove redundant process (nwiltsie, Jul 24, 2024)
- `645f154` Bugfix, clean up an undefined stub variable (nwiltsie, Jul 24, 2024)
- `88ae361` Bugfix, clean up more undefined stub variables (nwiltsie, Jul 24, 2024)
- `04e7f26` Get rid of variables in utils module (nwiltsie, Jul 24, 2024)
- `dfcb703` Clean up variables in sv_workflow.nf (nwiltsie, Jul 24, 2024)
- `257eb74` Clean up variables in snv_workflow.nf (nwiltsie, Jul 24, 2024)
- `01d3df7` Clean up variables in snv_annotations.nf (nwiltsie, Jul 24, 2024)
- `3359f08` Replace colons with slashes (nwiltsie, Jul 24, 2024)
- `162f22e` Combine intermediate files (nwiltsie, Jul 24, 2024)
- `92a73a5` Rename NFTest case as SNV-specific (nwiltsie, Jul 24, 2024)
- `68c7085` Add SV-specific NFTest, bugfix for parameters (nwiltsie, Jul 24, 2024)
- `4ae07bf` Bundle rtracklayer into Docker (nwiltsie, Jul 25, 2024)
- `8b87584` Group arguments in Dockerfile (nwiltsie, Jul 25, 2024)
- `5a2745a` Small bugfixes (nwiltsie, Jul 25, 2024)
- `7a48523` Pre-copy folder to standard path (nwiltsie, Jul 25, 2024)
- `2c4953f` Remove quotes (nwiltsie, Jul 25, 2024)
- `f03a5af` Try a different mechanism to get library paths (nwiltsie, Jul 26, 2024)
- `98750e9` Use branch version of image (nwiltsie, Jul 25, 2024)
- `642aa56` Bugfixes, test cleanup for SV case (nwiltsie, Jul 26, 2024)
- `6df89f0` Add mermaid flow diagram (nwiltsie, Jul 29, 2024)
- `971a713` Add output at end of pipeline (nwiltsie, Jul 29, 2024)
- `9291ff0` Pull in latest changes to predict-liftover-stability.R (nwiltsie, Jul 26, 2024)
- `729a970` Bugfix, channel mis-match (nwiltsie, Jul 26, 2024)
- `5977e7e` Update CHANGELOG (nwiltsie, Jul 29, 2024)
- `8e10b3b` Fix lints (nwiltsie, Jul 29, 2024)
- `22ccbc2` Sort VCF after liftover in SV branch (nwiltsie, Jul 29, 2024)
- `4701b01` Reword 'Variant Caller' to 'Variant Type' (nwiltsie, Aug 2, 2024)
- `a5570f4` Remove unused R function (nwiltsie, Aug 2, 2024)
- `b12a428` s/run_sv_liftover/liftover_SV_StableLift/ (nwiltsie, Aug 2, 2024)
- `eb8983a` s/run_intersect_gnomad/annotate_gnomAD_StableLift/ (nwiltsie, Aug 2, 2024)
- `5f27a63` Add 'StableLift-${manifest.version}' to output_dir_base (nwiltsie, Aug 6, 2024)
- `c9fde8c` Use wildcards for aptitude package build versions (nwiltsie, Aug 6, 2024)
57 changes: 6 additions & 51 deletions CHANGELOG.md
@@ -1,5 +1,5 @@
# Changelog
All notable changes to the pipeline-name pipeline.
All notable changes to the StableLift pipeline.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

@@ -8,58 +8,13 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
---

## [Unreleased]
### Added
- Add `sample_id` extraction from BAM
- Add template input YAMLs
- Add pipeline-Nexflow-config as submodule and redirect set_resources_allocation
- Add pipeline-Nextflow-module as submodule
- Additional out of memory exit code
- Pipeline release action
- Template for NFTest testing results in PR template
- Enable dependabot
- Add example PlantUML image to README
- Add workflow to build documentation
- Add workflows to run Nextflow configuration tests

### Changed
- Switch resource limit checks to external scripts
- Update links in on-prem Confluence to point to cloud-based Confluence
- Fix `CODEOWNERS` file
- Use `schema.check_path` for `workDir` validation
- Add `Discussions` and `Contributors` to the Table of Contents in `README.md`
- Update from DSL1 to DSL2
- Standardize config structure
- Restructure repo so main script is main.nf
- Reorganize contributors and metadata
- Reorganize PR template so description is at top
- Update automatic node detection to allow for F2 detection
- Update Issue Template
- Standardize input/output/parameter structure in README
- Avoid modification of input parameter `output_dir`
- Create default docker container registry parameter for tools
- Use `methods.setup_process_afterscript()` to capture log files

---

## [1.0.0] - YYYY-MM-DD
### Added
- For new features.
- Added item 1.

### Changed
- For changes in existing functionality.
- Changed item 1.

### Deprecated
- For soon-to-be removed features.
- Add workflow for SNV callers (Mutect2, HaplotypeCaller, Strelka2, Muse2, SomaticSniper)
- Add workflow for SV caller (Delly2)
- Add pipeline diagram

### Removed
- For now removed features.
- Removed item 1.

### Fixed
- For any bug fixes.
- Fixed item 1.
### Changed

### Security
- In case of vulnerabilities.
- Sort VCF after liftover in SV branch
51 changes: 46 additions & 5 deletions Dockerfile
@@ -1,19 +1,60 @@
ARG R_VERSION=4.3.1
ARG R_VERSION="4.3.1"

ARG LIBBZ2_VERSION="1.0.8-*"
ARG LIBCURL_VERSION="7.81.0-*"
ARG LIBLZMA_VERSION="5.2.5-*"
ARG LIBXML2_VERSION="2.9.13+dfsg-*"
ARG PYTHON_VERSION="3.10.6-*"
ARG ZLIB_VERSION="1:1.2.11.dfsg-*"
ARG RLIBDIR="/usr/local/stablelift-R"

FROM rocker/r-ver:${R_VERSION} AS build

ARG LIBBZ2_VERSION
ARG LIBCURL_VERSION
ARG LIBLZMA_VERSION
ARG LIBXML2_VERSION
ARG ZLIB_VERSION

# Install build-time dependencies
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
libbz2-dev=${LIBBZ2_VERSION} \
libcurl4-openssl-dev=${LIBCURL_VERSION} \
liblzma-dev=${LIBLZMA_VERSION} \
libxml2-dev=${LIBXML2_VERSION} \
zlib1g-dev=${ZLIB_VERSION} \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

ARG BIOC_VERSION="3.18"
ENV BIOC_VERSION=${BIOC_VERSION}

ARG RLIBDIR
ENV RENV_PATHS_CACHE ${RLIBDIR}/.cache

RUN mkdir -p ${RENV_PATHS_CACHE}

WORKDIR ${RLIBDIR}

COPY docker/install-stablelift.R /tmp
RUN Rscript /tmp/install-stablelift.R

# renv prints to stdout, so we need to change directories
WORKDIR /
RUN echo ".libPaths( c( .libPaths(), \"/usr/local/stablelift-R/renv/library/R-4.3/$(Rscript -e "cat(unname(unlist(R.version['platform'])))")\" ) )" >> /usr/local/lib/R/etc/Rprofile.site

FROM rocker/r-ver:${R_VERSION}

# Overwrite the site library with just the desired packages. By default rocker
# only bundles docopt and littler in that directory.
COPY --from=build /tmp/userlib /usr/local/lib/R/site-library
ARG RLIBDIR
COPY --from=build ${RLIBDIR} ${RLIBDIR}
COPY --from=build \
/usr/local/lib/R/etc/Rprofile.site \
/usr/local/lib/R/etc/Rprofile.site

# Install python (required for argparse). The version is not important, but
# let's pin it for stability.
ARG PYTHON_VERSION=3.10.6-1~22.04
ARG PYTHON_VERSION

RUN apt-get update \
&& apt-get install -y --no-install-recommends \
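In the build stage above, the `RUN echo ... >> Rprofile.site` step appends the renv library directory to R's library search path, so the runtime stage (which copies both `${RLIBDIR}` and `Rprofile.site`) can load the installed packages. A minimal Python sketch of the string that step writes, assuming the `R-4.3` component hardcoded in the Dockerfile; `rprofile_line` is a hypothetical helper, and `x86_64-pc-linux-gnu` is only an illustrative value for what `R.version['platform']` reports.

```python
def rprofile_line(rlibdir: str, r_minor: str, platform: str) -> str:
    """Build the .libPaths() line appended to Rprofile.site (hypothetical helper)."""
    # Same layout as the path hardcoded in the Dockerfile's RUN echo
    libdir = f"{rlibdir}/renv/library/R-{r_minor}/{platform}"
    # c(.libPaths(), libdir) appends, so existing library paths are searched first
    return f'.libPaths( c( .libPaths(), "{libdir}" ) )'


line = rprofile_line("/usr/local/stablelift-R", "4.3", "x86_64-pc-linux-gnu")
print(line)
```

Appending rather than prepending keeps R's default libraries ahead of the renv library in the search order.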
4 changes: 1 addition & 3 deletions README.md
@@ -39,9 +39,7 @@ If you are using the UCLA Azure cluster, please use the [submission script](http

## Flow Diagram

A directed acyclic graph of your pipeline. The [PlantUML](https://plantuml.com/) code defining this diagram is version-controlled in the [docs/](./docs/) folder, and a [GitHub Action](https://github.com/uclahs-cds/tool-PlantUML-action) automatically regenerates the SVG image when that file is changed.

![Pipeline Graph](./docs/pipeline-flow.svg)
![Pipeline Graph](./docs/pipeline.mmd.svg)

---

2 changes: 1 addition & 1 deletion config/default.config
@@ -22,7 +22,7 @@ params {
gatk_version = '4.2.4.1'
pipeval_version = '5.0.0-rc.3'
samtools_version = '1.20'
stablelift_version = 'branch-nwiltsie-bootstrap' // FIXME
stablelift_version = 'branch-nwiltsie-regroup-modules' // FIXME

docker_image_bcftools = "${-> params.docker_container_registry}/bcftools-score:${params.bcftools_version}"
docker_image_bedtools = "${-> params.docker_container_registry}/bedtools:${params.bedtools_version}"
2 changes: 1 addition & 1 deletion config/methods.config
@@ -9,7 +9,7 @@ methods {
set_output_dir = {
def date = new Date().format("yyyyMMdd'T'HHmmss'Z'", TimeZone.getTimeZone('UTC'))

params.output_dir_base = "${params.output_dir}/${manifest.name}-${manifest.version}/${params.sample_id.replace(' ', '_')}"
params.output_dir_base = "${params.output_dir}/${manifest.name}-${manifest.version}/${params.sample_id.replace(' ', '_')}/StableLift-${manifest.version}"
params.log_output_dir = "${params.output_dir_base}/log-${manifest.name}-${manifest.version}-${date}"
}
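The `set_output_dir` closure builds a per-sample output directory and a timestamped log directory beneath it. The Python sketch below mirrors that string construction for illustration only; it assumes `manifest.name` and `manifest.version` arrive as plain strings, and the `output_dirs` helper and example values are hypothetical. The Java date pattern `yyyyMMdd'T'HHmmss'Z'` in UTC corresponds to `strftime("%Y%m%dT%H%M%SZ")`.

```python
from datetime import datetime, timezone


def output_dirs(output_dir, name, version, sample_id, now=None):
    """Mirror set_output_dir from config/methods.config (illustrative only)."""
    now = now or datetime.now(timezone.utc)
    # Java pattern yyyyMMdd'T'HHmmss'Z' in UTC
    date = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # Spaces in the sample ID become underscores, as in the Groovy replace(' ', '_')
    base = f"{output_dir}/{name}-{version}/{sample_id.replace(' ', '_')}/StableLift-{version}"
    log = f"{base}/log-{name}-{version}-{date}"
    return base, log


base, log = output_dirs(
    "/out", "stablelift", "1.0.0", "my sample",
    now=datetime(2024, 8, 6, 12, 0, 0, tzinfo=timezone.utc),
)
print(base)
print(log)
```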

11 changes: 11 additions & 0 deletions config/schema.yaml
@@ -3,6 +3,17 @@ sample_id:
type: 'String'
required: true
help: 'sample id supplied from input yaml'
variant_caller:
type: 'String'
required: true
help: 'Tool used to call structural or somatic variants'
choices:
- Mutect2
- HaplotypeCaller
- Strelka2
- Muse2
- SomaticSniper
- Delly2
save_intermediate_files:
type: 'Bool'
required: true
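Each schema entry above declares a `type`, whether the parameter is `required`, and, for `variant_caller`, a list of allowed `choices`. The pipeline presumably enforces these through its shared Nextflow schema tooling; the Python sketch below only illustrates the kind of check the entries describe. The dict-based `SCHEMA` and the `validate` function are hypothetical, and only the three parameters visible in this hunk are modeled.

```python
SCHEMA = {
    "sample_id": {"type": str, "required": True},
    "variant_caller": {
        "type": str,
        "required": True,
        "choices": ["Mutect2", "HaplotypeCaller", "Strelka2",
                    "Muse2", "SomaticSniper", "Delly2"],
    },
    "save_intermediate_files": {"type": bool, "required": True},
}


def validate(params: dict) -> list:
    """Return error messages; an empty list means params satisfy SCHEMA."""
    errors = []
    for key, rule in SCHEMA.items():
        if key not in params:
            if rule["required"]:
                errors.append(f"missing required parameter: {key}")
            continue
        value = params[key]
        if not isinstance(value, rule["type"]):
            errors.append(f"{key}: expected {rule['type'].__name__}")
        elif "choices" in rule and value not in rule["choices"]:
            errors.append(f"{key}: {value!r} not in {rule['choices']}")
    return errors


print(validate({"sample_id": "S1", "variant_caller": "Delly2",
                "save_intermediate_files": False}))
```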
9 changes: 8 additions & 1 deletion config/template.config
@@ -36,7 +36,14 @@ params {
chain_file = "/hot/ref/tool-specific-input/liftOver/hg19ToHg38.over.chain"

// FIXME How to describe this file?
repeat_bed = "/hot/ref/database/RepeatMasker-v3.0.1/processed/GRCh38/GRCh38_RepeatMasker_intervals.bed"
repeat_bed = "/hot/ref/database/RepeatMasker-3.0.1/processed/GRCh38/GRCh38_RepeatMasker_intervals.bed"

// SV files
// FIXME Should this be bundled?
Contributor

clarification: Is this a question of whether the files should be bundled into the Docker?

Member Author

Either into the Docker image, with the pipeline, or to a given reference path on disk. Put another way, is a user expected to (1) provide this file for each pipeline run, (2) have a standard copy locally, or (3) have it automatically provided for them by the pipeline?

Collaborator

I'm thinking we can divide the various input files into the 3 categories you listed:

(1) RF models (6 tools x 2 conversion directions = 12 total, ~10 MB to 1 GB) hosted separately for the user to download
(2) Expect the user to have standard resource files such as reference FASTAs, chain files, and Funcotator sources
(3) Bundle the non-standard resource files (repeat_bed, header_contigs, gnomad_rds) into the Docker image

Is this what you had in mind?

Member Author

Cool - so I think that works out as:

  1. We upload RF models as attachments on pipeline releases.
  2. Users handle standard resource files.
  3. We bundle the non-standard resource files with the pipeline (the repeat_bed file is used outside of the docker image). That means they get checked into this repository and version-controlled.

Contributor

For the non-standard resource files, it may be better to include them as release attachments rather than version-control them.

Member Author

@yashpatel6 so you assert that there should be two categories of files?

  1. User-provided standard files
  2. Everything else, distributed as release attachments

Contributor

That's what I would suggest, yes. My concern with bundling the non-standard files into the Docker image is the case where a user wants to modify them or provide a different file. Having them distributed separately, with the user providing the paths in the config like other resources, seems more consistent and allows for that.
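The model count proposed in this thread (6 tools x 2 conversion directions = 12 RF models) follows directly from the six `variant_caller` choices in `config/schema.yaml`. A small Python sketch enumerating them; the direction labels and the `RF-<tool>-<direction>` naming scheme are hypothetical and are not the names of any actual release attachment.

```python
from itertools import product

TOOLS = ["Mutect2", "HaplotypeCaller", "Strelka2", "Muse2", "SomaticSniper", "Delly2"]
# Assumed labels for the two liftover directions
DIRECTIONS = ["GRCh37ToGRCh38", "GRCh38ToGRCh37"]

# Hypothetical naming scheme: one RF model per (tool, direction) pair
models = [f"RF-{tool}-{direction}" for tool, direction in product(TOOLS, DIRECTIONS)]
print(len(models))  # 12
```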

header_contigs = "/hot/code/nkwang/GitHub/uclahs-cds/project-method-AlgorithmEvaluation-BNCH-000142-GRCh37v38/report/manuscript/publish/GRCh38-vcf-header-contigs.txt"

// FIXME Should this be bundled?
gnomad_rds = "/hot/code/nkwang/GitHub/uclahs-cds/project-method-AlgorithmEvaluation-BNCH-000142-GRCh37v38/report/manuscript/publish/data/gnomad.v4.0.sv.Rds"
}

// Setup the pipeline config. DO NOT REMOVE THIS LINE!
40 changes: 18 additions & 22 deletions docker/install-stablelift.R
@@ -1,26 +1,22 @@
# Install the remotes package to the library
install.packages('remotes', lib = .Library)
install.packages('renv', lib = .Library)

# Make a temporary directory to hold all of the installed packages
localdir <- '/tmp/userlib'
dir.create(localdir)
options(
renv.settings.bioconductor.version = Sys.getenv('BIOC_VERSION')
)

dependencies <- c(
'ROCR' = '1.0-11',
'argparse' = '2.2.2',
'caret' = '6.0-94',
'data.table' = '1.14.8',
'doParallel' = '1.0.17',
'foreach' = '1.5.2',
'ranger' = '0.15.1',
'vcfR' = '1.14.0'
renv::init(
bare = TRUE,
bioconductor = Sys.getenv('BIOC_VERSION')
)

# Unfortunately, this will install the dependencies multiple times
for (name in names(dependencies)) {
remotes::install_version(
name,
unname(dependencies[name]),
lib = localdir
)
}
renv::install(c(
'[email protected]',
'[email protected]',
'[email protected]',
'[email protected]',
'[email protected]',
'[email protected]',
'[email protected]',
'[email protected]',
'bioc::[email protected]'
))
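The rewritten script replaces the per-package `remotes::install_version()` loop with a single `renv::install()` call that takes `package@version` strings, using a `bioc::` prefix for the Bioconductor package. The Python sketch below shows how a name-to-version mapping like the removed `dependencies` vector translates into that string form; `to_renv_specs` is a hypothetical helper, not part of the pipeline.

```python
def to_renv_specs(pins, bioc=()):
    """Convert {name: version} pins into 'name@version' strings.

    Names listed in `bioc` get a 'bioc::' prefix (hypothetical helper).
    """
    specs = []
    for name, version in pins.items():
        prefix = "bioc::" if name in bioc else ""
        specs.append(f"{prefix}{name}@{version}")
    return specs


# Pins from the removed `dependencies` vector
old_pins = {
    "ROCR": "1.0-11", "argparse": "2.2.2", "caret": "6.0-94",
    "data.table": "1.14.8", "doParallel": "1.0.17", "foreach": "1.5.2",
    "ranger": "0.15.1", "vcfR": "1.14.0",
}
specs = to_renv_specs(old_pins)
print(specs[0])  # ROCR@1.0-11
```

Passing every spec in one call presumably lets renv resolve shared dependencies once, which addresses the removed comment about the loop installing dependencies multiple times.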
79 changes: 79 additions & 0 deletions docs/pipeline.mmd
@@ -0,0 +1,79 @@
%%{init: {"flowchart": {"htmlLabels": false}} }%%

flowchart TD

classDef input fill:#ffffb3
classDef output fill:#b3de69
classDef gatk fill:#bebada
classDef bcftools fill:#fdb462
classDef R fill:#8dd3c7
classDef linux fill:#fb8072

subgraph legend ["`**Legend**`"]
direction RL
subgraph nodes ["`**Nodes**`"]
input[["Input File"]]:::input
input_node(["Parameterized Input"]):::input
output[["Output file"]]:::output
end

subgraph processes ["`**Processes**`"]
gatk_docker[GATK]:::gatk
bcftools_docker[bcftools]:::bcftools
r_docker[Rscript]:::R
linux_docker[Generic Linux]:::linux
end
end

legend
~~~ input_vcf[["Input VCF"]]:::input
--> pipeval:::linux
--> sv_vs_snv{{Variant Type?}}

sv_vs_snv ------> r_liftover
header_contigs .-> r_liftover
chain_file2 ..-> r_liftover
gnomad_rds .-> r_extract_sv

subgraph SV ["`**SV**`"]
%% Other input files
header_contigs([header_contigs]):::input
chain_file2([chain_file]):::input
gnomad_rds([gnomad_rds]):::input

r_liftover[liftover-Delly2-vcf.R]:::R
---> r_extract_sv[extract-VCF-features-SV.R]:::R

end

chain_file .-> bcftools_liftover
sv_vs_snv --> bcftools_liftover

subgraph SNV ["`**SNV**`"]
funcotator_sources([funcotator_sources]):::input
chain_file([chain_file]):::input
repeat_bed([repeat_bed]):::input

bcftools_liftover[bcftools +liftover]:::bcftools
---> gatk_func[gatk Funcotator]:::gatk
--> bcftools_annotate["`bcftools annotate*RepeatMasker*`"]:::bcftools
--> bcftools_annotate2["`bcftools annotate*Trinucleotide*`"]:::bcftools
--> r_extract_snv[extract-VCF-features.R]:::R
end

funcotator_sources .-> gatk_func
repeat_bed .-> bcftools_annotate

joinpaths{ }
r_extract_snv --> joinpaths
r_extract_sv --> joinpaths
joinpaths ---> r_predict_stability

subgraph Predict Stability ["`&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**Predict Stability**`"]
r_predict_stability[predict-liftover-stability.R]:::R
--> bcftools_annotate3["`bcftools annotate*Stability*`"]:::bcftools

rf_model([rf_model]):::input .-> r_predict_stability
end

bcftools_annotate3 --> output_vcfs[["Output VCFs"]]:::output
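The diagram routes each validated input VCF by variant type: Delly2 calls take the SV branch, the five SNV callers take the bcftools/Funcotator branch, and both branches join at `predict-liftover-stability.R`. The Python sketch below encodes that routing as ordered step lists taken from the diagram's node labels; the constants and `steps_for` function are illustrative only and are not part of the pipeline.

```python
SV_CALLERS = {"Delly2"}
SNV_CALLERS = {"Mutect2", "HaplotypeCaller", "Strelka2", "Muse2", "SomaticSniper"}

SV_STEPS = ["liftover-Delly2-vcf.R", "extract-VCF-features-SV.R"]
SNV_STEPS = [
    "bcftools +liftover",
    "gatk Funcotator",
    "bcftools annotate (RepeatMasker)",
    "bcftools annotate (Trinucleotide)",
    "extract-VCF-features.R",
]
# Shared tail after the SV and SNV branches join
PREDICT_STEPS = ["predict-liftover-stability.R", "bcftools annotate (Stability)"]


def steps_for(variant_caller: str) -> list:
    """Return the ordered processing steps for one caller, starting at validation."""
    if variant_caller in SV_CALLERS:
        branch = SV_STEPS
    elif variant_caller in SNV_CALLERS:
        branch = SNV_STEPS
    else:
        raise ValueError(f"unsupported variant_caller: {variant_caller}")
    return ["pipeval"] + branch + PREDICT_STEPS


print(steps_for("Delly2"))
```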
1 change: 1 addition & 0 deletions docs/pipeline.mmd.svg