Partially fill-in README
nwiltsie committed Jul 16, 2024
1 parent 5d5b371 commit f2dc7fb
Showing 1 changed file with 79 additions and 41 deletions.
120 changes: 79 additions & 41 deletions README.md
@@ -1,4 +1,4 @@
# Pipeline Name
# StableLift

- [StableLift](#stablelift)
- [Overview](#overview)
@@ -17,7 +17,9 @@
- [References](#references)
- [Discussions](#discussions)
- [Contributors](#contributors)
- [License](#license)
- [License](#license)


## Overview

A 3-4 sentence summary of the pipeline, including the pipeline's purpose, the type of expected scientific inputs/outputs of the pipeline (e.g: FASTQs and BAMs), and a list of tools/steps in the pipeline.
@@ -26,11 +28,12 @@ A 3-4 sentence summary of the pipeline, including the pipeline's purpose, the ty

## How To Run

1. Update the params section of the .config file

2. Update the input yaml
1. Copy [`./config/template.config`](./config/template.config) (e.g. `project.config`) and fill in all required parameters.
2. For each input sample:
   1. Copy [`./input/template.yaml`](./input/template.yaml) (e.g. `sample-002.yaml`) and update with the sample ID and VCF path.
   2. Start the sample-specific pipeline run with `nextflow run -c project.config -params-file sample-002.yaml main.nf`

3. See the submission script, [here](https://github.com/uclahs-cds/tool-submit-nf), to submit your pipeline
If you are using the UCLA Azure cluster, please use the [submission script](https://github.com/uclahs-cds/tool-submit-nf) to submit your pipeline rather than calling `nextflow` directly.
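
The per-sample steps above can be scripted when there are many samples. The following is a minimal sketch, not part of the pipeline: the helper names `build_run_command` and `commands_for_samples` are hypothetical, and the file names are the examples used above. It only assembles the `nextflow` command lines; it does not launch them.

```python
import shlex


def build_run_command(project_config, sample_yaml, script="main.nf"):
    """Return the argument list for one sample-specific pipeline run.

    Mirrors the command shown in the README:
    nextflow run -c <project config> -params-file <sample yaml> main.nf
    """
    return [
        "nextflow", "run",
        "-c", str(project_config),
        "-params-file", str(sample_yaml),
        script,
    ]


def commands_for_samples(project_config, sample_yamls):
    """Build one command per sample YAML, in sorted order for reproducibility."""
    return [build_run_command(project_config, y) for y in sorted(sample_yamls, key=str)]


if __name__ == "__main__":
    # Hypothetical sample files; replace with your own copies of template.yaml.
    samples = ["sample-002.yaml", "sample-001.yaml"]
    for cmd in commands_for_samples("project.config", samples):
        print(shlex.join(cmd))
```

Each printed line can be run directly in a shell, or the argument lists can be passed to `subprocess.run`. On the UCLA Azure cluster, use the submission script instead, as noted above.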

---

@@ -46,56 +49,93 @@ A directed acyclic graph of your pipeline. The [PlantUML](https://plantuml.com/)

### 1. Step/Process 1

> A 2-3 sentence description of each step/proccess in your pipeline that includes the purpose of the step/process, the tool(s) being used and their version, and the expected scientific inputs/outputs (e.g: FASTQs and BAMs) of the pipeline.
> A 2-3 sentence description of each step/process in your pipeline that includes the purpose of the step/process, the tool(s) being used and their version, and the expected scientific inputs/outputs (e.g: FASTQs and BAMs) of the pipeline.
### 2. Step/Process 2

> A 2-3 sentence description of each step/proccess in your pipeline that includes the purpose of the step/process, the tool(s) being used and their version, and the expected scientific inputs/outputs (e.g: FASTQs and BAMs) of the pipeline.
> A 2-3 sentence description of each step/process in your pipeline that includes the purpose of the step/process, the tool(s) being used and their version, and the expected scientific inputs/outputs (e.g: FASTQs and BAMs) of the pipeline.
### 3. Step/Process n

> A 2-3 sentence description of each step/proccess in your pipeline that includes the purpose of the step/process, the tool(s) being used and their version, and the expected scientific inputs/outputs (e.g: FASTQs and BAMs) of the pipeline.
> A 2-3 sentence description of each step/process in your pipeline that includes the purpose of the step/process, the tool(s) being used and their version, and the expected scientific inputs/outputs (e.g: FASTQs and BAMs) of the pipeline.
---

## Inputs

UCLA pipelines have a hierarchical configuration structure to reduce code repetition:

* `config/default.config`: Parameters with sensible defaults that may be overridden in `myconfig.config`.
* `config/template.config -> myconfig.config`: Required sample-agnostic parameters. Often shared for many samples.
* `input/template.yaml -> mysample.yaml`: Required sample-specific parameters.
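
The layering described above can be pictured as successive dictionary merges, where later layers win. This is an illustrative sketch only: Nextflow performs the actual resolution, `resolve_params` is a hypothetical name, and the values are placeholders. It also assumes a shallow merge, so nested scopes such as `funcotator_data` are not handled here.

```python
def resolve_params(defaults, project, sample):
    """Merge three parameter layers; later layers override earlier ones.

    defaults -> config/default.config
    project  -> myconfig.config (copied from config/template.config)
    sample   -> mysample.yaml   (copied from input/template.yaml)
    """
    merged = dict(defaults)   # start from the sensible defaults
    merged.update(project)    # sample-agnostic settings override defaults
    merged.update(sample)     # sample-specific settings are applied last
    return merged


if __name__ == "__main__":
    # Placeholder values for illustration only.
    defaults = {"save_intermediate_files": False, "min_cpus": 1}
    project = {"output_dir": "/path/to/output", "save_intermediate_files": True}
    sample = {"sample_id": "sample-002", "vcf": "/path/to/sample-002.vcf.gz"}
    print(resolve_params(defaults, project, sample))
```

The practical consequence is that a project config only needs to list the values that differ from the defaults, and a sample YAML only the values that differ between samples.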

### Input YAML

> include an example of the organization structure within the YAML. Example:
```yaml
input 1: 'patient_id'
---
sample_id: "" # Identifying string for the input sample
input:
normal:
- id: <normal id>
BAM: </path/to/normal.bam>
tumor:
- id: <tumor id>
BAM: </path/to/tumor.bam>
vcf: "" # Path to the sample's VCF file
```
### Config
| Field | Type | Required | Description |
| ----- | ---- | ------------ | ------------------------ |
| param 1 | _type_ | yes/no | 1-2 sentence description of the parameter, including any defaults if any. |
| param 2 | _type_ | yes/no | 1-2 sentence description of the parameter, including any defaults if any. |
| param n | _type_ | yes/no | 1-2 sentence description of the parameter, including any defaults if any. |
| `work_dir` | path | no | Path of working directory for Nextflow. When included in the sample config file, Nextflow intermediate files and logs will be saved to this directory. With ucla_cds, the default is `/scratch` and should only be changed for testing/development. Changing this directory to `/hot` or `/tmp` can lead to high server latency and potential disk space limitations, respectively. |

> Include the optional param `work_dir` in the inputs accompanied by a warning of the potentials dangers of using the param. Update the warning if necessary.
### Input Configuration
| Required Parameter | Type | Description |
| ----------------------------------- | ------ | ------------------------------------------------------------------------------------------------------------------------------- |
| `output_dir` | path | Absolute path to the directory where the output files are to be saved. |
| `variant_caller` | string | ??? |
| `rf_model` | path | ??? |
| `funcotator_data.data_source` | path | ??? |
| `funcotator_data.src_reference_id` | string | ??? |
| `funcotator_data.dest_reference_id` | string | ??? |
| `src_fasta_ref` | path | Absolute path to the source reference sequence in FASTA format. Must correspond with `funcotator_data.src_reference_id`. |
| `dest_fasta_ref` | path | Absolute path to the destination reference sequence in FASTA format. Must correspond with `funcotator_data.dest_reference_id`. |
| `chain_file` | path | LiftOver chain file between the source and destination sequences. |
| `repeat_bed` | path | ??? |


| Optional Parameter | Type | Default | Description |
| --------------------------- | ----------------------------------------------------------------------------------------- | ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `work_dir` | path | `/scratch/$SLURM_JOB_ID` | Path of working directory for Nextflow. When included in the sample config file, Nextflow intermediate files and logs will be saved to this directory. With `ucla_cds`, the default is `/scratch` and should only be changed for testing/development. Changing this directory to `/hot` or `/tmp` can lead to high server latency and potential disk space limitations, respectively. |
| `save_intermediate_files` | boolean | false | If set, save output files from intermediate pipeline processes. |
| `min_cpus` | int | 1 | Minimum number of CPUs that can be assigned to each process. |
| `max_cpus` | int | `SysHelper.getAvailCpus()` | Maximum number of CPUs that can be assigned to each process. |
| `min_memory` | [MemoryUnit](https://www.nextflow.io/docs/latest/script.html#implicit-classes-memoryunit) | `1.MB` | Minimum amount of memory that can be assigned to each process. |
| `max_memory` | [MemoryUnit](https://www.nextflow.io/docs/latest/script.html#implicit-classes-memoryunit) | `SysHelper.getAvailMemory()` | Maximum amount of memory that can be assigned to each process. |
| `dataset_id` | string | `""` | ??? |
| `blcds_registered_dataset` | boolean | false | Set to true when using BLCDS folder structure; use false for now. |
| `ucla_cds` | boolean | true | If set, override the default memory and CPU values with UCLA cluster-specific configs. |
| `src_fasta_fai` | path | Relative to `src_fasta_ref` | Index for source reference sequence. |
| `src_fasta_dict` | path | Relative to `src_fasta_ref` | Dictionary for source reference sequence. |
| `dest_fasta_fai` | path | Relative to `dest_fasta_ref` | Index for destination reference sequence. |
| `dest_fasta_dict` | path | Relative to `dest_fasta_ref` | Dictionary for destination reference sequence. |
| `docker_container_registry` | string | `ghcr.io/uclahs-cds` | Container registry for the docker images in the following table. |
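
The table does not spell out how the index and dictionary defaults are derived from the FASTA path. The sketch below shows one plausible reading, assuming the usual tool conventions: `samtools faidx` writes `<fasta>.fai` next to the FASTA, and GATK `CreateSequenceDictionary` replaces the final extension with `.dict`. Whether the pipeline uses exactly these rules is an assumption, and the function names and paths are hypothetical.

```python
from pathlib import Path


def default_fasta_index(fasta_ref):
    """Assumed default: append '.fai' to the full FASTA file name."""
    return Path(str(fasta_ref) + ".fai")


def default_fasta_dict(fasta_ref):
    """Assumed default: replace the final extension with '.dict'.

    Only the last suffix is replaced, so this sketch expects an
    uncompressed FASTA such as 'genome.fa' or 'genome.fasta'.
    """
    return Path(fasta_ref).with_suffix(".dict")


if __name__ == "__main__":
    ref = "/path/to/source.fa"  # hypothetical reference path
    print(default_fasta_index(ref))
    print(default_fasta_dict(ref))
```

If your reference files follow a different layout, set `src_fasta_fai`, `src_fasta_dict`, `dest_fasta_fai`, and `dest_fasta_dict` explicitly instead of relying on the defaults.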

The docker images in the following table are generally defined like `docker_image_pipeval = "${-> params.docker_container_registry}/pipeval:${params.pipeval_version}"`. As such, there are three ways to modify each image:

* Change `params.docker_container_registry`. This will affect all of the images (except for GATK).
* Change `params.<tool>_version`. This will pull a different version of the same image from the registry.
* Change `params.docker_image_<tool>`. This will explicitly set the image to use, ignoring `docker_container_registry` and `<tool>_version`, and thus requires that the docker tag be explicitly set (e.g. `broadinstitute/gatk:4.2.4.1`).

| Tool Parameter | Version Parameter | Default | Notes |
| ------------------------ | -------------------- | ------------------------------------------------------------ | ------------------------------------------------------------------- |
| `docker_image_bcftools` | `bcftools_version` | `ghcr.io/uclahs-cds/bcftools-score:1.20_score-1.20-20240505` | This image must have both BCFtools and the score plugins available. |
| `docker_image_bedtools` | `bedtools_version` | `ghcr.io/uclahs-cds/bedtools:2.31.0` | |
| `docker_image_gatk` | `gatk_version` | `broadinstitute/gatk:4.2.4.1` | |
| `docker_image_pipeval` | `pipeval_version` | `ghcr.io/uclahs-cds/pipeval:5.0.0-rc.3` | |
| `docker_image_samtools` | `samtools_version` | `ghcr.io/uclahs-cds/samtools:1.20` | |
| `docker_image_stablelift` | `stablelift_version` | `ghcr.io/uclahs-cds/stablelift:FIXME` | This image is built and maintained via this repository. |
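
The three ways of modifying an image interact as sketched below. This is a simplified model of the `"${registry}/<image>:${version}"` pattern quoted above, not the pipeline's Groovy code; `resolve_image` and its arguments are hypothetical names.

```python
DEFAULT_REGISTRY = "ghcr.io/uclahs-cds"


def resolve_image(image_name, version, registry=DEFAULT_REGISTRY, explicit_image=None):
    """Return the docker image reference that would be used for a tool.

    explicit_image models params.docker_image_<tool>: when set, it wins
    outright and must already include its tag. Otherwise the image is
    composed from the registry, the image name, and the version.
    """
    if explicit_image:
        return explicit_image
    return f"{registry}/{image_name}:{version}"


if __name__ == "__main__":
    # Default registry and version, as listed in the table.
    print(resolve_image("pipeval", "5.0.0-rc.3"))
    # Changing only the registry (hypothetical mirror).
    print(resolve_image("pipeval", "5.0.0-rc.3", registry="registry.example.org/mirror"))
    # Setting the image explicitly ignores registry and version.
    print(resolve_image("gatk", "ignored", explicit_image="broadinstitute/gatk:4.2.4.1"))
```

The GATK default in the table is a fully explicit image, which is consistent with the note that changing `docker_container_registry` does not affect it.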

---

## Outputs

<!-- List and describe the final output(s) of the pipeline. -->

| Output | Description |
| ------------ | ------------------------ |
| ouput 1 | 1 - 2 sentence description of the output. |
| ouput 2 | 1 - 2 sentence description of the output. |
| ouput n | 1 - 2 sentence description of the output. |
| `*_stability.vcf.gz` | ??? |
| `*_stability.vcf.gz.tbi` | ??? |
| `*_filtered.vcf.gz` | ??? |
| `*_filtered.vcf.gz.tbi` | ??? |

---

@@ -107,8 +147,8 @@ A 2-3 sentence description of the test data set(s) used to validate and test thi

### Validation <version number\>

Input/Output | Description | Result
| ------------ | ------------------------ | ------------------------ |
| Input/Output | Description | Result |
| ------------ | ------------------------ | ------------------------ |
| metric 1 | 1 - 2 sentence description of the metric | quantifiable result |
| metric 2 | 1 - 2 sentence description of the metric | quantifiable result |
| metric n | 1 - 2 sentence description of the metric | quantifiable result |
@@ -133,23 +173,21 @@ Included is a template for validating your input files. For more information on

## Discussions

- [Issue tracker](<link-to-repo-issues-page>) to report errors and enhancement ideas.
- Discussions can take place in [<pipeline> Discussions](<link-to-repo-discussions-page>)
- [<pipeline> pull requests](<link-to-repo-pull-requests>) are also open for discussion
- [Issue tracker](https://github.com/uclahs-cds/pipeline-StableLift/issues) to report errors and enhancement ideas.
- Discussions can take place in [pipeline-StableLift Discussions](https://github.com/uclahs-cds/pipeline-StableLift/discussions)
- [pipeline-StableLift pull requests](https://github.com/uclahs-cds/pipeline-StableLift/pulls) are also open for discussion

---

## Contributors

> Update link to repo-specific URL for GitHub Insights Contributors page.

Please see list of [Contributors](https://github.com/uclahs-cds/template-NextflowPipeline/graphs/contributors) at GitHub.
Please see list of [Contributors](https://github.com/uclahs-cds/pipeline-StableLift/graphs/contributors) at GitHub.

---

## License

[pipeline name] is licensed under the GNU General Public License version 2. See the file LICENSE for the terms of the GNU GPL license.
pipeline-StableLift is licensed under the GNU General Public License version 2. See the file LICENSE for the terms of the GNU GPL license.

<one line to give the program's name and a brief idea of what it does.>
