Commit

update inputs and documentation
nkwang24 committed Oct 23, 2024
1 parent d325b1f commit 4506e34
Showing 5 changed files with 27 additions and 32 deletions.
28 changes: 12 additions & 16 deletions README.md
@@ -91,35 +91,31 @@ input:
### Input Configuration
- | Required Parameter | Type | Description |
- | ----------------------------------- | ------ | -------------------------------------------------------------------------------------------------------------------------------------------------- |
- | `output_dir` | path | Path to the directory where the output files are to be saved. |
- | `variant_caller` | string | Variant calling algorithm used to generate input VCF {HaplotypeCaller, Mutect2, Strelka2, SomaticSniper, Muse2, Delly2}. |
- | `rf_model` | path | Path to corresponding pre-trained random forest model. |
- | `liftover_direction` | string | Conversion direction {GRCh37ToGRCh38, GRCh38ToGRCh37}. |
- | `fasta_ref_37` | path | Path to the GRCh37 reference sequence (FASTA). |
- | `fasta_ref_38` | path | Path to the GRCh38 reference sequence (FASTA). |
- | `chain_file` | path | Path to LiftOver chain file between the source and target genome builds (included in resource-bundle.zip). |
- | `funcotator_data_source` | path | Path to [Funcotator data source](https://gatk.broadinstitute.org/hc/en-us/articles/360050815792-FuncotatorDataSourceDownloader) directory. |
- | `repeat_bed` | path | Path to bundled RepeatMasker annotation file (included in resource-bundle.zip). |
- | `header_contigs` | path | Path to header contigs file corresponding to target genome build (included in resource-bundle.zip). |
- | `gnomad_rds` | path | Path to gnomAD SV data.table for annotation (included in resource-bundle.zip). |

+ | Required Parameter | Type | Description |
+ | --------------------------- | ------ | -------------------------------------------------------------------------------------------------------------------------------------------------- |
+ | `output_dir` | path | Path to the directory where the output files are to be saved. |
+ | `variant_caller` | string | Variant calling algorithm used to generate input VCF: [HaplotypeCaller, Mutect2, Strelka2, SomaticSniper, Muse2, Delly2-gSV, Delly2-sSV]. |
+ | `rf_model` | path | Path to corresponding pre-trained random forest model. |
+ | `liftover_direction` | string | Conversion direction: [GRCh37ToGRCh38, GRCh38ToGRCh37]. |
+ | `fasta_ref_37` | path | Path to the GRCh37 reference sequence (FASTA). |
+ | `fasta_ref_38` | path | Path to the GRCh38 reference sequence (FASTA). |
+ | `funcotator_data_source` | path | Path to [Funcotator data source](https://gatk.broadinstitute.org/hc/en-us/articles/360050815792-FuncotatorDataSourceDownloader) directory containing dbSNP, GENCODE and HGNC sources for SNV annotation. |
+ | `resource_bundle_path` | path | Path to unpacked [resource-bundle.zip](https://github.com/uclahs-cds/pipeline-StableLift/releases/download/v1.1.0/resource-bundle.zip). |
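
For illustration, a sample config that fills in the required parameters from the updated table might look like the sketch below. Every path is a hypothetical placeholder, and the `variant_caller` and `liftover_direction` values are arbitrary picks from the listed choices. Only the parameter names and allowed values come from the table.

```groovy
// Hypothetical sample config: all paths are placeholders, not real locations.
params {
    output_dir = "/path/to/output"

    // One of the listed choices; "Mutect2" is an arbitrary example.
    variant_caller = "Mutect2"

    // Pre-trained random forest model matching the chosen variant caller.
    rf_model = "/path/to/rf_model_file"

    // One of: "GRCh37ToGRCh38", "GRCh38ToGRCh37".
    liftover_direction = "GRCh37ToGRCh38"

    fasta_ref_37 = "/path/to/hs37d5.fa"
    fasta_ref_38 = "/path/to/Homo_sapiens_assembly38.fasta"

    // Directory produced by the Funcotator data source downloader.
    funcotator_data_source = "/path/to/funcotator_data_sources"

    // Directory where resource-bundle.zip was unpacked; this single path
    // replaces the separate bundled-file parameters from the old table.
    resource_bundle_path = "/path/to/resource-bundle"
}
```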

| Optional Parameter | Type | Default | Description |
| --------------------------- | ----------------------------------------------------------------------------------------- | ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `target_threshold` | numeric | `""` | Target Stability Score threshold for variant filtering: [0, 1]. |
| `target_specificity` | numeric | `""` | Target specificity based on whole genome validation set for variant filtering: [0, 1]. |
| `extract_features_cpus` | int | `4` | Number of cpus to use for parallel parsing of large VCFs (>1GB). |
- | `work_dir` | path | `/scratch/$SLURM_JOB_ID` | Path of working directory for Nextflow. When included in the sample config file, Nextflow intermediate files and logs will be saved to this directory. With `ucla_cds`, the default is `/scratch` and should only be changed for testing/development. Changing this directory to `/hot` or `/tmp` can lead to high server latency and potential disk space limitations, respectively. |
+ | `work_dir` | path | `System.getenv("NXF_WORK")` | Path of working directory for Nextflow. When included in the sample config file, Nextflow intermediate files and logs will be saved to this directory. With `ucla_cds`, the default is `/scratch` and should only be changed for testing/development. Changing this directory to `/hot` or `/tmp` can lead to high server latency and potential disk space limitations, respectively. |
| `save_intermediate_files` | boolean | false | If set, save output files from intermediate pipeline processes. |
| `min_cpus` | int | 1 | Minimum number of CPUs that can be assigned to each process. |
| `max_cpus` | int | `SysHelper.getAvailCpus()` | Maximum number of CPUs that can be assigned to each process. |
| `min_memory` | [MemoryUnit](https://www.nextflow.io/docs/latest/script.html#implicit-classes-memoryunit) | `1.MB` | Minimum amount of memory that can be assigned to each process. |
| `max_memory` | [MemoryUnit](https://www.nextflow.io/docs/latest/script.html#implicit-classes-memoryunit) | `SysHelper.getAvailMemory()` | Maximum amount of memory that can be assigned to each process. |
| `dataset_id` | string | `""` | Dataset ID to be used as output filename prefix. |
| `blcds_registered_dataset` | boolean | false | Set to true when using BLCDS folder structure; use false for now. |
- | `ucla_cds` | boolean | true | If set, overwrite default memory and CPU values by UCLA cluster-specific configs. |
+ | `ucla_cds` | boolean | false | If set, overwrite default memory and CPU values by UCLA cluster-specific configs. |
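
As a hedged example of overriding the optional parameters, the fragment below sets a few of them in a sample config. The numeric values are arbitrary illustrations. The table does not say whether `target_threshold` and `target_specificity` may be set together, so this sketch sets only one and leaves the other at its default.

```groovy
// Hypothetical overrides; the values are examples, not recommendations.
params {
    // Arbitrary example value in [0, 1]; target_specificity stays at its default ("").
    target_threshold = 0.9

    // More CPUs for parallel parsing; the table describes this as relevant for large VCFs (>1GB).
    extract_features_cpus = 8

    // Keep output files from intermediate pipeline processes.
    save_intermediate_files = true

    // The default is false as of this commit; set to true to apply the
    // UCLA cluster-specific memory and CPU configs.
    ucla_cds = true
}
```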

---

2 changes: 1 addition & 1 deletion config/default.config
@@ -13,7 +13,7 @@ params {
dataset_id = ''
blcds_registered_dataset = false

- ucla_cds = true
+ ucla_cds = false
docker_container_registry = "ghcr.io/uclahs-cds"

// Docker images
1 change: 0 additions & 1 deletion config/methods.config
@@ -103,7 +103,6 @@ methods {
}
}


setup = {
methods.expand_parameters()

9 changes: 6 additions & 3 deletions config/template.config
@@ -10,19 +10,22 @@ params {
// Output location
output_dir = ""

- // Choices: ["HaplotypeCaller", "Mutect2", "Strelka2", "SomaticSniper", "Muse2", "Delly2-gSV", "Delly2-sSV"]
- variant_caller = ""

// Choices: ["GRCh37ToGRCh38", "GRCh38ToGRCh37"]
liftover_direction = ""

+ // Choices: ["HaplotypeCaller", "Mutect2", "Strelka2", "SomaticSniper", "Muse2", "Delly2-gSV", "Delly2-sSV"]
+ variant_caller = ""

// Path to pre-trained random forest model
rf_model = ""

// Path to reference fasta files
fasta_ref_37 = "" // GRCh37-EBI-hs37d5/hs37d5.fa
fasta_ref_38 = "" // GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta

+ // Path to Funcotator data source directory containing dbSNP, GENCODE and HGNC sources for SNV annotation
funcotator_data_source = ""

+ // Path to unpacked resource-bundle.zip
resource_bundle_path = ""

19 changes: 8 additions & 11 deletions main.nf
@@ -30,32 +30,32 @@ log.info """\
dataset_id: ${params.dataset_id}
liftover_direction: ${params.liftover_direction}
variant_caller: ${params.variant_caller}
rf_model: ${params.rf_model}
src_fasta_id: ${params.src_fasta_id}
src_fasta_ref: ${params.src_fasta_ref}
src_fasta_fai: ${params.src_fasta_fai}
src_fasta_dict: ${params.src_fasta_dict}
dest_fasta_id: ${params.dest_fasta_id}
dest_fasta_ref: ${params.dest_fasta_ref}
dest_fasta_fai: ${params.dest_fasta_fai}
dest_fasta_dict: ${params.dest_fasta_dict}
chain_file: ${params.chain_file}
- SV only:
header_contigs: ${params.getOrDefault('header_contigs', null)}
gnomad_rds: ${params.getOrDefault('gnomad_rds', null)}
funcotator_data_source: ${params.getOrDefault('funcotator_data_source', null)}
- SNV only:
resource_bundle_path: ${params.resource_bundle_path}
chain_file: ${params.chain_file}
repeat_bed: ${params.getOrDefault('repeat_bed', null)}
funcotator_data_source: ${params.getOrDefault('funcotator_data_source', null)}
gnomad_rds: ${params.getOrDefault('gnomad_rds', null)}
header_contigs: ${params.getOrDefault('header_contigs', null)}
- output:
output_dir_base: ${params.output_dir_base}
- options:
save_intermediate_files: ${params.save_intermediate_files}
blcds_registered_dataset: ${params.blcds_registered_dataset}
ucla_cds: ${params.ucla_cds}
@@ -111,9 +111,6 @@ Channel

// Main workflow here
workflow {

- // Currently this is written for a single sample_id and VCF file, but
- // abstract that away
Channel.of ([
vcf: params.input.vcf,
index: indexFile(params.input.vcf),