Sfitz input vcfs #274

Merged on May 30, 2024 (34 commits)
Changes from 4 commits
fbfb984
vcf input yaml
sorelfitzgibbon Feb 14, 2024
4eb7988
overlooked nftest.yml actual paths including tool versions
sorelfitzgibbon Feb 14, 2024
5967e75
working but need to add pipeval for input vcfs and tweak log structure
sorelfitzgibbon Feb 17, 2024
47bba4d
add test vcf yaml
sorelfitzgibbon Feb 17, 2024
f17d4f6
update VCF template yaml
sorelfitzgibbon Mar 11, 2024
92752b5
Merge branch 'main' of github.com:uclahs-cds/pipeline-call-sSNV into …
sorelfitzgibbon Mar 11, 2024
00a2d82
update changelog
sorelfitzgibbon Mar 21, 2024
13542e1
merge from origin main
sorelfitzgibbon May 13, 2024
17159d7
Autofix Nextflow configuration regression tests
sorelfitzgibbon May 14, 2024
a4f96cc
avoid unused process warnings
sorelfitzgibbon May 14, 2024
91865cf
add input vcf to nftest
sorelfitzgibbon May 14, 2024
defe784
update changelog
sorelfitzgibbon May 14, 2024
dc80c74
Merge branch 'sfitz-input-vcfs' with --no-ff
sorelfitzgibbon May 14, 2024
0d22d0f
avoid unused process warnings, all F config
sorelfitzgibbon May 14, 2024
7808bfd
Autofix Nextflow configuration regression tests
sorelfitzgibbon May 14, 2024
cebbe15
merge in from main
sorelfitzgibbon May 15, 2024
1ab2631
fix intersection script resources
sorelfitzgibbon May 15, 2024
59d0137
define input_type before used in F configs
sorelfitzgibbon May 15, 2024
a2b9425
keep configtests from main
sorelfitzgibbon May 15, 2024
40c5fd8
Autofix Nextflow configuration regression tests
sorelfitzgibbon May 15, 2024
31e287a
change process name
sorelfitzgibbon May 29, 2024
fba06b3
remove unused file
sorelfitzgibbon May 29, 2024
a74120d
fix input template
sorelfitzgibbon May 29, 2024
0511693
turn off intermediate output
sorelfitzgibbon May 29, 2024
65cbf52
consolidate include statements
sorelfitzgibbon May 29, 2024
a024846
add input checks
sorelfitzgibbon May 29, 2024
4486115
update configtests
sorelfitzgibbon May 29, 2024
6e6af05
Merge branch 'sfitz-input-vcfs' of github.com:uclahs-cds/pipeline-cal…
sorelfitzgibbon May 29, 2024
a1a92d6
Mask version numbers in tests
nwiltsie May 29, 2024
5480446
Autofix Nextflow configuration regression tests
sorelfitzgibbon May 29, 2024
56a29fb
Merge remote-tracking branch 'origin' into sfitz-input-vcfs
sorelfitzgibbon May 29, 2024
11d4c29
remove unused functions
sorelfitzgibbon May 30, 2024
bc570f0
move channel creation into workflow
sorelfitzgibbon May 30, 2024
f61eb0f
fix changelog
sorelfitzgibbon May 30, 2024
28 changes: 23 additions & 5 deletions config/custom_schema_types.config
@@ -2,9 +2,13 @@
 * This custom schema namespace implements a custom type for checking input BAMs as a list of Maps.
 */
 custom_schema_types {
-    allowed_sample_types = [
+    allowed_input_types = [
         'normal',
-        'tumor'
+        'tumor',
+        'muse',
+        'mutect2',
+        'somaticsniper',
+        'strelka2'
     ]
     allowed_resource_types = [
         'memory',
@@ -14,7 +18,7 @@ custom_schema_types {
     /**
      * Check that input types are in allowed list
      */
-    check_sample_type_keys = { List given, String name, List choices=custom_schema_types.allowed_sample_types ->
+    check_input_type_keys = { List given, String name, List choices=custom_schema_types.allowed_input_types ->
         for (elem in given) {
             if (!(elem in choices)) {
                 throw new Exception("Invalid paramter ${name}. Valid types: ${choices}.")
@@ -80,7 +84,7 @@ custom_schema_types {
         // Check parameters keys
         custom_schema_types.check_if_namespace(options[name], name)
         def given_keys = options[name].keySet() as ArrayList
-        custom_schema_types.check_sample_type_keys(given_keys, name)
+        custom_schema_types.check_input_type_keys(given_keys, name)

         options[name].each { entry ->
             def entry_as_map = [:]
@@ -98,7 +102,7 @@ custom_schema_types {
         if (given_keys.size() <= 0) {
             return
         }
-        custom_schema_types.check_sample_type_keys(given_keys, name, custom_schema_types.allowed_resource_types)
+        custom_schema_types.check_input_type_keys(given_keys, name, custom_schema_types.allowed_resource_types)

         options[name].each { entry ->
             def entry_as_map = [:]
@@ -120,6 +124,19 @@ custom_schema_types {
         }
     }

+    /**
+     * Check if proper VCF entry list
+     */
+    check_vcf_list = { Map options, String name, Map properties ->
+        custom_schema_types.check_if_list(options[name], name)
+        for (item in options[name]) {
+            custom_schema_types.check_if_namespace(item, name)
+            properties.elements.each { key, val ->
+                schema.validate_parameter(item, key, val)
+            }
+        }
+    }
+
     /**
      * Check list of resource updates
      */
@@ -134,6 +151,7 @@ custom_schema_types {
     types = [
         'InputNamespace': custom_schema_types.check_input_namespace,
         'BAMEntryList': custom_schema_types.check_bam_list,
+        'VCFEntryList': custom_schema_types.check_vcf_list,
[Review comment, Contributor (conversation marked as resolved by yashpatel6): VCFEntryList is also not being used in the schema I think]
         'ResourceUpdateNamespace': custom_schema_types.check_resource_update_namespace,
         'ResourceUpdateList': custom_schema_types.check_resource_update_list
     ]
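
To make the review comment on `VCFEntryList` concrete: the VCF template added in this PR lists each algorithm's input as `- VCF: /path/...`, which is the list-of-maps shape that `check_vcf_list` walks, while the schema hunk shown here declares those keys as plain `Path` entries. The following is a minimal standalone sketch of that walk, not the pipeline's code. `validateParameter` is a hypothetical stand-in for `schema.validate_parameter`, and the `properties` map is an assumed schema fragment.

```groovy
// Hypothetical stand-in for schema.validate_parameter (assumption, not the real API):
// it only checks that a required key is present and holds a string path.
def validateParameter = { Map item, String key, Map spec ->
    if (spec.required && !item.containsKey(key)) {
        throw new IllegalArgumentException("Missing required key '${key}'")
    }
    if (item.containsKey(key) && !(item[key] instanceof CharSequence)) {
        throw new IllegalArgumentException("Key '${key}' must be a string path")
    }
}

// Same shape as check_vcf_list in the diff: the value must be a list,
// each entry must be a map, and each schema element is validated per entry.
def checkVcfList = { Map options, String name, Map properties ->
    if (!(options[name] instanceof List)) {
        throw new IllegalArgumentException("${name} must be a list")
    }
    options[name].each { item ->
        if (!(item instanceof Map)) {
            throw new IllegalArgumentException("Each entry of ${name} must be a map")
        }
        properties.elements.each { key, spec -> validateParameter(item, key, spec) }
    }
}

// Input shaped like the VCF template; `properties` is an assumed schema fragment.
def templateInput = [muse: [[VCF: '/path/to/muse.vcf.gz']]]
def properties = [elements: [VCF: [type: 'Path', required: true]]]
checkVcfList(templateInput, 'muse', properties)   // passes without throwing
```

Under these assumptions, a flat `muse: /path/to/muse.vcf.gz` entry would be rejected by the list check, so the schema type and the template format need to agree on one of the two shapes.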
1 change: 1 addition & 0 deletions config/default.config
@@ -5,6 +5,7 @@ params {
     ucla_cds = true

     cache_intermediate_pipeline_steps = false
+    keep_input_prefix = false
[Review conversation marked as resolved by yashpatel6]

     //max_number_of_parallel_jobs = 1
     max_cpus = SysHelper.getAvailCpus()
71 changes: 46 additions & 25 deletions config/methods.config
@@ -53,6 +53,18 @@ methods {
         }
     }

+    get_vcfs_to_process = {
+        params.tumor_id = methods.sanitize_string(params.input_tumor_id)
+        params.normal_id = methods.sanitize_string(params.input_normal_id)
[Review conversation marked as resolved by yashpatel6]
+        params.samples_to_process = [] as Set
+        params.input.each { k, v ->
+            params.samples_to_process.add([
+                'path': v,
+                'algorithm': k]
+            )
+        }
+    }
+
     set_sample_params = {
         params.single_NT_paired = false
         def tumor_ids = params.samples_to_process.findAll { it['sample_type'] == 'tumor' }['id']
@@ -71,17 +83,6 @@ methods {
         }
     }

-    set_intersect_regions_params = {
-        if (params.containsKey("intersect_regions") && params.intersect_regions) {
-            params.intersect_regions_index = "${params.intersect_regions}.tbi"
-            params.use_intersect_regions = true
-        } else {
-            params.intersect_regions = "${params.work_dir}/NO_FILE.bed"
-            params.intersect_regions_index = "${params.work_dir}/NO_FILE.bed.tbi"
-            params.use_intersect_regions = false
-        }
-    }
-
[Review conversation marked as resolved by sorelfitzgibbon]
     set_mutect2_params = {
         if (params.containsKey("germline_resource_gnomad_vcf") && params.germline_resource_gnomad_vcf) {
             params.germline = true
@@ -92,13 +93,33 @@ methods {
         params.germline_resource_gnomad_vcf_index = "${params.germline_resource_gnomad_vcf}.tbi"
     }

+    check_valid_algorithms = {
+        valid_algorithms = params.single_NT_paired ? ['somaticsniper', 'strelka2', 'mutect2', 'muse'] : ['mutect2']
+        for (algo in params.algorithm) {
+            if (!(algo in valid_algorithms)) {
+                throw new Exception("ERROR: params.algorithm ${params.algorithm} contains an invalid value. Valid algorithms for given inputs: ${valid_algorithms}")
+            }
+        }
+    }
+
[Review comment on lines +97 to +105, Contributor Author: moved from below]
     set_output_directory = {
         def tz = TimeZone.getTimeZone("UTC")
         def date = new Date().format("yyyyMMdd'T'HHmmss'Z'", tz)
         params.output_dir_base = "${params.output_dir}/${manifest.name}-${manifest.version}/${params.sample_id}"
         params.log_output_dir = "${params.output_dir_base}/log-${manifest.name}-${manifest.version}-${date}"
     }

+    set_intersect_regions_params = {
+        if (params.containsKey("intersect_regions") && params.intersect_regions) {
+            params.intersect_regions_index = "${params.intersect_regions}.tbi"
+            params.use_intersect_regions = true
+        } else {
+            params.intersect_regions = "${params.work_dir}/NO_FILE.bed"
+            params.intersect_regions_index = "${params.work_dir}/NO_FILE.bed.tbi"
+            params.use_intersect_regions = false
+        }
+    }
+
     set_pipeline_log = {
         trace.enabled = true
         trace.file = "${params.log_output_dir}/nextflow-log/trace.txt"
@@ -110,15 +131,6 @@ methods {
         report.file = "${params.log_output_dir}/nextflow-log/report.html"
     }

-    check_valid_algorithms = {
-        valid_algorithms = params.single_NT_paired ? ['somaticsniper', 'strelka2', 'mutect2', 'muse'] : ['mutect2']
-        for (algo in params.algorithm) {
-            if (!(algo in valid_algorithms)) {
-                throw new Exception("ERROR: params.algorithm ${params.algorithm} contains an invalid value. Valid algorithms for given inputs: ${valid_algorithms}")
-            }
-        }
-    }
-
     setup = {
         schema.load_custom_types("${projectDir}/config/custom_schema_types.config")
         schema.validate()
@@ -127,13 +139,22 @@ methods {
         methods.modify_base_allocations()
         retry.setup_retry()
         methods.set_env()
-        methods.get_ids_from_bams()
-        methods.set_sample_params()
-        methods.set_intersect_regions_params()
-        methods.set_mutect2_params()
+        if (params.input.containsKey('tumor')) {
+            params.input_type = 'bam'
+            methods.get_ids_from_bams()
+            methods.set_sample_params()
+            methods.set_mutect2_params()
+            methods.check_valid_algorithms()
+        } else if (params.input.containsKey('muse') || params.input.containsKey('somaticsniper') || params.input.containsKey('strelka2') || params.input.containsKey('mutect2')) {
+            params.input_type = 'vcf'
+            methods.get_vcfs_to_process()
+            params.sample_id = params.tumor_id
+        } else {
+            throw new Exception("ERROR: No input yamls detected.")
+        }
[Review comment, Contributor: I would recommend separating this out into a couple of different functions: one to determine the input type and then another to handle the settings based on the determined type. Having it here like this right under setup makes it a bit messy and harder to track/separate logic.]
         methods.set_output_directory()
+        methods.set_intersect_regions_params()
         methods.set_pipeline_log()
-        methods.check_valid_algorithms()
         methods.setup_docker_cpus()
     }
 }
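
One possible reading of the review suggestion on `setup` above, as a minimal standalone sketch rather than the change that was eventually merged: one closure decides the input type from the keys under `input`, and a second maps that type to the setup steps it needs. The closure names `determineInputType` and `stepsFor` are hypothetical. The step names are taken from the diff, and the sketch returns them as strings instead of calling the real `methods.*` closures so that it runs outside Nextflow.

```groovy
// Algorithm keys that mark VCF input, as listed in the diff.
def VCF_KEYS = ['muse', 'mutect2', 'somaticsniper', 'strelka2']

// Step 1: decide the input type from the keys present under `input`.
def determineInputType = { Map input ->
    if (input.containsKey('tumor')) {
        return 'bam'
    }
    if (VCF_KEYS.any { input.containsKey(it) }) {
        return 'vcf'
    }
    throw new IllegalArgumentException('No BAM or VCF inputs detected.')
}

// Step 2: list the setup steps for a given input type (names from the diff).
// In the diff, the VCF branch additionally sets params.sample_id = params.tumor_id.
def stepsFor = { String inputType ->
    switch (inputType) {
        case 'bam':
            return ['get_ids_from_bams', 'set_sample_params',
                    'set_mutect2_params', 'check_valid_algorithms']
        case 'vcf':
            return ['get_vcfs_to_process']
        default:
            throw new IllegalArgumentException("Unknown input type: ${inputType}")
    }
}

// Usage with illustrative inputs.
assert determineInputType([tumor: [], normal: []]) == 'bam'
assert determineInputType([muse: '/path/to/muse.vcf.gz']) == 'vcf'
assert stepsFor('vcf') == ['get_vcfs_to_process']
```

With this split, `setup` would only need to call the detection step once and then run whatever the second step returns, which keeps the branching out of the main call sequence.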
24 changes: 22 additions & 2 deletions config/schema.yaml
@@ -112,11 +112,11 @@ base_resource_update:
 input:
   type: 'InputNamespace'
   required: true
-  help: 'Input samples'
+  help: 'Input to process'
   elements:
     tumor:
       type: 'BAMEntryList'
-      required: true
+      required: false
       help: 'Tumor id/path input'
       elements:
         BAM:
@@ -139,3 +139,23 @@ input:
           mode: 'r'
           required: true
           help: 'Absolute path to normal sample BAM files'
+    muse:
+      type: 'Path'
+      mode: 'r'
+      required: false
+      help: 'Absolute path to muse VCF file'
+    mutect2:
+      type: 'Path'
+      mode: 'r'
+      required: false
+      help: 'Absolute path to mutect2 VCF file'
+    somaticsniper:
+      type: 'Path'
+      mode: 'r'
+      required: false
+      help: 'Absolute path to somaticsniper VCF file'
+    strelka2:
+      type: 'Path'
+      mode: 'r'
+      required: false
+      help: 'Absolute path to strelka2 VCF file'
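
As a rough illustration of how the flat `Path` entries declared in this schema hunk would flow through `get_vcfs_to_process` from `config/methods.config`, the sketch below applies the same collection pattern to a plain map. The variable names `vcfInput` and `samplesToProcess` and the example paths are placeholders, and the sketch assumes each algorithm key maps directly to a single path.

```groovy
// Illustrative input: each algorithm key maps directly to one VCF path (assumption).
def vcfInput = [
    muse   : '/path/to/muse.vcf.gz',
    mutect2: '/path/to/mutect2.vcf.gz'
]

// Same collection pattern as get_vcfs_to_process: one entry per algorithm,
// recording the path and the algorithm that produced it.
def samplesToProcess = [] as Set
vcfInput.each { algorithm, path ->
    samplesToProcess.add([path: path, algorithm: algorithm])
}

assert samplesToProcess.size() == 2
assert samplesToProcess.contains([path: '/path/to/muse.vcf.gz', algorithm: 'muse'])
```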
13 changes: 13 additions & 0 deletions input/call-sSNV-template-VCF.yaml
@@ -0,0 +1,13 @@
+---
+patient_id: 'patient_id'
+tumor_id: 'tumor_id' # must match the tumor_id in the VCF files
+normal_id: 'normal_id' # must match the normal_id in the VCF files
[Review conversation marked as resolved by yashpatel6]
+input:
+  muse:
+    - VCF: /path/to/muse.vcf.gz
+  mutect2:
+    - VCF: /path/to/mutect2.vcf.gz
+  somaticsniper:
+    - VCF: /path/to/somaticsniper.vcf.gz
+  strelka2:
+    - VCF: /path/to/strelka2.vcf.gz