add extract_features_cpus to README
nkwang24 committed Oct 18, 2024
1 parent df1bce5 commit 6de91b9
Showing 1 changed file with 9 additions and 8 deletions.
17 changes: 9 additions & 8 deletions README.md
@@ -96,15 +96,16 @@ input:

| Optional Parameter | Type | Default | Description |
| --------------------------- | ----------------------------------------------------------------------------------------- | ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `target_threshold` | numeric | `""` | Target Stability Score threshold for variant filtering: [0, 1]. |
-| `target_specificity` | numeric | `""` | Target specificity based on whole genome validation set for variant filtering: [0, 1]. |
+| `target_threshold` | numeric | `""` | Target Stability Score threshold for variant filtering: [0, 1]. |
+| `target_specificity` | numeric | `""` | Target specificity based on whole genome validation set for variant filtering: [0, 1]. |
+| `extract_features_cpus` | int | `4` | Number of cpus to use for parallel parsing of large VCFs (>1GB). |
| `work_dir` | path | `/scratch/$SLURM_JOB_ID` | Path of working directory for Nextflow. When included in the sample config file, Nextflow intermediate files and logs will be saved to this directory. With `ucla_cds`, the default is `/scratch` and should only be changed for testing/development. Changing this directory to `/hot` or `/tmp` can lead to high server latency and potential disk space limitations, respectively. |
| `save_intermediate_files` | boolean | false | If set, save output files from intermediate pipeline processes. |
| `min_cpus` | int | 1 | Minimum number of CPUs that can be assigned to each process. |
| `max_cpus` | int | `SysHelper.getAvailCpus()` | Maximum number of CPUs that can be assigned to each process. |
| `min_memory` | [MemoryUnit](https://www.nextflow.io/docs/latest/script.html#implicit-classes-memoryunit) | `1.MB` | Minimum amount of memory that can be assigned to each process. |
| `max_memory` | [MemoryUnit](https://www.nextflow.io/docs/latest/script.html#implicit-classes-memoryunit) | `SysHelper.getAvailMemory()` | Maximum amount of memory that can be assigned to each process. |
-| `dataset_id` | string | `""` | Dataset ID to be used as output filename prefix. |
+| `dataset_id` | string | `""` | Dataset ID to be used as output filename prefix. |
| `blcds_registered_dataset` | boolean | false | Set to true when using BLCDS folder structure; use false for now. |
| `ucla_cds` | boolean | true | If set, overwrite default memory and CPU values by UCLA cluster-specific configs. |
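
As a rough illustration of the constraints listed in the table above, the following Python sketch checks a few of the optional parameters before a run. It is not part of the pipeline: `check_optional_params` and the plain-dictionary input are hypothetical, and the only rules it encodes are the documented `[0, 1]` range for `target_threshold` and `target_specificity`, the documented default of `4` for `extract_features_cpus`, and the assumption that a CPU count must be a positive integer.

```python
def check_optional_params(params: dict) -> list[str]:
    """Return a list of problems found; an empty list means the values look sane.

    Hypothetical helper for illustration only; not part of the pipeline.
    """
    problems = []

    # Both filtering targets are documented as numeric values in [0, 1],
    # with an empty string as the "unset" default.
    for name in ("target_threshold", "target_specificity"):
        value = params.get(name, "")
        if value == "":
            continue
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0 <= value <= 1:
            problems.append(f"{name} must be a number in [0, 1], got {value!r}")

    # Documented default is 4; requiring a positive integer is an assumption.
    cpus = params.get("extract_features_cpus", 4)
    if isinstance(cpus, bool) or not isinstance(cpus, int) or cpus < 1:
        problems.append(f"extract_features_cpus must be a positive int, got {cpus!r}")

    return problems
```

Calling `check_optional_params({"target_specificity": 1.5})` would report one problem, while an empty dictionary passes because every parameter falls back to its documented default.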

@@ -114,18 +115,18 @@ input:

| Output | Description |
| ------------ | ------------------------ |
-| `*_StableLift.vcf.gz` | Output VCF in target build coordinates with variant annotations and predicted Stability Scores. |
-| `*_StableLift.vcf.gz.tbi` | Output VCF tabix index. |
-| `*_StableLift-filtered.vcf.gz` | Filtered output VCF with predicted "Unstable" variants removed. |
-| `*_StableLift-filtered.vcf.gz.tbi` | Filtered output VCF tabix index. |
+| `*_StableLift-${target_build}.vcf.gz` | Output VCF in target build coordinates with variant annotations and predicted Stability Scores. |
+| `*_StableLift-${target_build}.vcf.gz.tbi` | Output VCF tabix index. |
+| `*_StableLift-${target_build}_filtered.vcf.gz` | Filtered output VCF with predicted "Unstable" variants removed. |
+| `*_StableLift-${target_build}_filtered.vcf.gz.tbi` | Filtered output VCF tabix index. |
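
To make the renamed output patterns concrete, here is a small Python sketch that expands the four patterns for a given prefix and target build. The function name `stablelift_output_names` is hypothetical, and the sketch assumes the `*` in each pattern stands for a single prefix string; the pipeline may compose that prefix from more than one field.

```python
def stablelift_output_names(prefix: str, target_build: str) -> dict[str, str]:
    """Expand the documented output filename patterns.

    `prefix` stands in for the `*` wildcard in the table (an assumption);
    `target_build` fills the `${target_build}` placeholder.
    """
    base = f"{prefix}_StableLift-{target_build}"
    return {
        "vcf": f"{base}.vcf.gz",
        "vcf_index": f"{base}.vcf.gz.tbi",
        "filtered_vcf": f"{base}_filtered.vcf.gz",
        "filtered_vcf_index": f"{base}_filtered.vcf.gz.tbi",
    }
```

For example, `stablelift_output_names("sample", "GRCh38")["filtered_vcf"]` evaluates to `sample_StableLift-GRCh38_filtered.vcf.gz`. Note that the filtered suffix changed from `-filtered` to `_filtered` in this commit, so downstream scripts that glob on the old names would need updating.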

---

## Testing and Validation

### Test Dataset

-10 whole genomes from [The Cancer Genome Atlas (TCGA-SARC)](https://portal.gdc.cancer.gov/projects/TCGA-SARC) were used to test pipeline outputs and validate model performance. All data was processed using [standardized Nextflow pipelines](https://github.com/uclahs-cds/metapipeline-DNA). Somatic VCFs from GRCh37 and GRCh38 alignments are available for the four supported sSNV callers as [release attachments](https://github.com/uclahs-cds/pipeline-StableLift/releases).
+10 whole genomes from [The Cancer Genome Atlas (TCGA-SARC)](https://portal.gdc.cancer.gov/projects/TCGA-SARC) were used to test pipeline outputs and validate model performance. All data was processed using [standardized Nextflow pipelines](https://github.com/uclahs-cds/metapipeline-DNA). Somatic VCFs from GRCh37 and GRCh38 alignments are available for the four supported sSNV callers and DELLY2 sSV as [release attachments](https://github.com/uclahs-cds/pipeline-StableLift/releases).

| Donor ID | Normal Sample ID | Tumour Sample ID |
|----------------|---------------------------|---------------------------|