update inputs and documentation

uclahs-cds · Oct 23, 2024 · 4506e34
1 parent d325b1f
commit 4506e34

Showing 5 changed files with 27 additions and 32 deletions.
diff --git a/README.md b/README.md
@@ -91,35 +91,31 @@ input:
 
 ### Input Configuration
 
-| Required Parameter                  | Type   | Description                                                                                                                                        |
-| ----------------------------------- | ------ | -------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `output_dir`                        | path   | Path to the directory where the output files are to be saved.                                                                                      |
-| `variant_caller`                    | string | Variant calling algorithm used to generate input VCF {HaplotypeCaller, Mutect2, Strelka2, SomaticSniper, Muse2, Delly2}.                           |
-| `rf_model`                          | path   | Path to corresponding pre-trained random forest model.                                                                                             |
-| `liftover_direction`                | string | Conversion direction {GRCh37ToGRCh38, GRCh38ToGRCh37}.                                                                                             |
-| `fasta_ref_37`                      | path   | Path to the GRCh37 reference sequence (FASTA).                                                                                                     |
-| `fasta_ref_38`                      | path   | Path to the GRCh38 reference sequence (FASTA).                                                                                                     |
-| `chain_file`                        | path   | Path to LiftOver chain file between the source and target genome builds (included in resource-bundle.zip).                                         |
-| `funcotator_data_source`            | path   | Path to [Funcotator data source](https://gatk.broadinstitute.org/hc/en-us/articles/360050815792-FuncotatorDataSourceDownloader) directory.         |
-| `repeat_bed`                        | path   | Path to bundled RepeatMasker annotation file (included in resource-bundle.zip).                                                                    |
-| `header_contigs`                    | path   | Path to header contigs file corresponding to target genome build (included in resource-bundle.zip).                                                |
-| `gnomad_rds`                        | path   | Path to gnomAD SV data.table for annotation (included in resource-bundle.zip).                                                                     |
-
+| Required Parameter          | Type   | Description                                                                                                                                        |
+| --------------------------- | ------ | -------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `output_dir`                | path   | Path to the directory where the output files are to be saved.                                                                                      |
+| `variant_caller`            | string | Variant calling algorithm used to generate input VCF: [HaplotypeCaller, Mutect2, Strelka2, SomaticSniper, Muse2, Delly2-gSV, Delly2-sSV].          |
+| `rf_model`                  | path   | Path to corresponding pre-trained random forest model.                                                                                             |
+| `liftover_direction`        | string | Conversion direction: [GRCh37ToGRCh38, GRCh38ToGRCh37].                                                                                            |
+| `fasta_ref_37`              | path   | Path to the GRCh37 reference sequence (FASTA).                                                                                                     |
+| `fasta_ref_38`              | path   | Path to the GRCh38 reference sequence (FASTA).                                                                                                     |
+| `funcotator_data_source`    | path   | Path to [Funcotator data source](https://gatk.broadinstitute.org/hc/en-us/articles/360050815792-FuncotatorDataSourceDownloader) directory containing dbSNP, GENCODE and HGNC sources for SNV annotation.         |
+| `resource_bundle_path`      | path   | Path to unpacked [resource-bundle.zip](https://github.com/uclahs-cds/pipeline-StableLift/releases/download/v1.1.0/resource-bundle.zip).            |
 
 | Optional Parameter          | Type                                                                                      | Default                      | Description                                                                                                                                                                                                                                                                                                                                                                           |
 | --------------------------- | ----------------------------------------------------------------------------------------- | ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `target_threshold`          | numeric                                                                                   | `""`                         | Target Stability Score threshold for variant filtering: [0, 1]. |
 | `target_specificity`        | numeric                                                                                   | `""`                         | Target specificity based on whole genome validation set for variant filtering: [0, 1]. |
 | `extract_features_cpus`     | int                                                                                       | `4`                          | Number of cpus to use for parallel parsing of large VCFs (>1GB). |
-| `work_dir`                  | path                                                                                      | `/scratch/$SLURM_JOB_ID`     | Path of working directory for Nextflow. When included in the sample config file, Nextflow intermediate files and logs will be saved to this directory. With `ucla_cds`, the default is `/scratch` and should only be changed for testing/development. Changing this directory to `/hot` or `/tmp` can lead to high server latency and potential disk space limitations, respectively. |
+| `work_dir`                  | path                                                                                      | `System.getenv("NXF_WORK")`  | Path of working directory for Nextflow. When included in the sample config file, Nextflow intermediate files and logs will be saved to this directory. With `ucla_cds`, the default is `/scratch` and should only be changed for testing/development. Changing this directory to `/hot` or `/tmp` can lead to high server latency and potential disk space limitations, respectively. |
 | `save_intermediate_files`   | boolean                                                                                   | false                        | If set, save output files from intermediate pipeline processes.                                                                                                                                                                                                                                                                                                                       |
 | `min_cpus`                  | int                                                                                       | 1                            | Minimum number of CPUs that can be assigned to each process.                                                                                                                                                                                                                                                                                                                          |
 | `max_cpus`                  | int                                                                                       | `SysHelper.getAvailCpus()`   | Maximum number of CPUs that can be assigned to each process.                                                                                                                                                                                                                                                                                                                          |
 | `min_memory`                | [MemoryUnit](https://www.nextflow.io/docs/latest/script.html#implicit-classes-memoryunit) | `1.MB`                       | Minimum amount of memory that can be assigned to each process.                                                                                                                                                                                                                                                                                                                        |
 | `max_memory`                | [MemoryUnit](https://www.nextflow.io/docs/latest/script.html#implicit-classes-memoryunit) | `SysHelper.getAvailMemory()` | Maximum amount of memory that can be assigned to each process.                                                                                                                                                                                                                                                                                                                        |
 | `dataset_id`                | string                                                                                    | `""`                         | Dataset ID to be used as output filename prefix.                                                                                                                                                                                                                                                                                                                                      |
 | `blcds_registered_dataset`  | boolean                                                                                   | false                        | Set to true when using BLCDS folder structure; use false for now.                                                                                                                                                                                                                                                                                                                     |
-| `ucla_cds`                  | boolean                                                                                   | true                         | If set, overwrite default memory and CPU values by UCLA cluster-specific configs.                                                                                                                                                                                                                                                                                                     |
+| `ucla_cds`                  | boolean                                                                                   | false                        | If set, overwrite default memory and CPU values by UCLA cluster-specific configs.                                                                                                                                                                                                                                                                                                     |
 
 ---
 

diff --git a/config/default.config b/config/default.config
@@ -13,7 +13,7 @@ params {
     dataset_id = ''
     blcds_registered_dataset = false
 
-    ucla_cds = true
+    ucla_cds = false
     docker_container_registry = "ghcr.io/uclahs-cds"
 
     // Docker images

diff --git a/config/methods.config b/config/methods.config
@@ -103,7 +103,6 @@ methods {
         }
     }
 
-
     setup = {
         methods.expand_parameters()
 

diff --git a/config/template.config b/config/template.config
@@ -10,19 +10,22 @@ params {
     // Output location
     output_dir = ""
 
-    // Choices: ["HaplotypeCaller", "Mutect2", "Strelka2", "SomaticSniper", "Muse2", "Delly2-gSV", "Delly2-sSV"]
-    variant_caller = ""
-
     // Choices: ["GRCh37ToGRCh38", "GRCh38ToGRCh37"]
     liftover_direction = ""
 
+    // Choices: ["HaplotypeCaller", "Mutect2", "Strelka2", "SomaticSniper", "Muse2", "Delly2-gSV", "Delly2-sSV"]
+    variant_caller = ""
+
     // Path to pre-trained random forest model
     rf_model = ""
 
     // Path to reference fasta files
     fasta_ref_37 = "" // GRCh37-EBI-hs37d5/hs37d5.fa
     fasta_ref_38 = "" // GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta
 
+    // Path to Funcotator data source directory containing dbSNP, GENCODE and HGNC sources for SNV annotation
+    funcotator_data_source = ""
+
     // Path to unpacked resource-bundle.zip
     resource_bundle_path = ""
 

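For orientation, a filled-in `params` block following the updated template might look like the sketch below. Every directory path is a hypothetical placeholder, and the chosen `liftover_direction` and `variant_caller` values are just one combination taken from the choices listed in the template comments.

```groovy
// Illustrative sketch only: all paths below are hypothetical placeholders.
params {
    // Output location
    output_dir = "/path/to/output"

    // One of the two documented directions
    liftover_direction = "GRCh37ToGRCh38"

    // One of the documented variant callers
    variant_caller = "Mutect2"

    // Pre-trained random forest model corresponding to the variant caller
    rf_model = "/path/to/rf_model"

    // Reference FASTA files for both genome builds
    fasta_ref_37 = "/path/to/GRCh37-EBI-hs37d5/hs37d5.fa"
    fasta_ref_38 = "/path/to/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta"

    // Funcotator data source directory (dbSNP, GENCODE and HGNC sources for SNV annotation)
    funcotator_data_source = "/path/to/funcotator_data_source"

    // Directory containing the unpacked resource-bundle.zip
    resource_bundle_path = "/path/to/resource-bundle"
}
```

With this layout, `chain_file`, `repeat_bed`, `gnomad_rds` and `header_contigs` no longer appear in the user-facing template; the updated `main.nf` log groups them under `resource_bundle_path`, which suggests they are resolved from the unpacked bundle.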
diff --git a/main.nf b/main.nf
@@ -30,32 +30,32 @@ log.info """\
         dataset_id: ${params.dataset_id}
 
         liftover_direction: ${params.liftover_direction}
-
         variant_caller: ${params.variant_caller}
         rf_model: ${params.rf_model}
 
+        src_fasta_id:    ${params.src_fasta_id}
         src_fasta_ref:   ${params.src_fasta_ref}
         src_fasta_fai:   ${params.src_fasta_fai}
         src_fasta_dict:  ${params.src_fasta_dict}
 
+        dest_fasta_id:   ${params.dest_fasta_id}
         dest_fasta_ref:  ${params.dest_fasta_ref}
         dest_fasta_fai:  ${params.dest_fasta_fai}
         dest_fasta_dict: ${params.dest_fasta_dict}
 
-        chain_file: ${params.chain_file}
-
-    - SV only:
-        header_contigs: ${params.getOrDefault('header_contigs', null)}
-        gnomad_rds: ${params.getOrDefault('gnomad_rds', null)}
+        funcotator_data_source: ${params.getOrDefault('funcotator_data_source', null)}
 
-    - SNV only:
+        resource_bundle_path: ${params.resource_bundle_path}
+        chain_file: ${params.chain_file}
         repeat_bed: ${params.getOrDefault('repeat_bed', null)}
-        funcotator_data_source: ${params.getOrDefault('funcotator_data_source', null)}
+        gnomad_rds: ${params.getOrDefault('gnomad_rds', null)}
+        header_contigs: ${params.getOrDefault('header_contigs', null)}
 
     - output:
         output_dir_base: ${params.output_dir_base}
 
     - options:
+        save_intermediate_files: ${params.save_intermediate_files}
         blcds_registered_dataset: ${params.blcds_registered_dataset}
         ucla_cds: ${params.ucla_cds}
 
@@ -111,9 +111,6 @@ Channel
 
 // Main workflow here
 workflow {
-
-    // Currently this is written for a single sample_id and VCF file, but
-    // abstract that away
     Channel.of ([
             vcf: params.input.vcf,
             index: indexFile(params.input.vcf),