Config System YAML #9
Comments
I was thinking about this some more. I tried #10 as a way of using the GitHub code review tools to help discussion, but figured I'd just post here. I really like having the One True Config split by workflows. I made some mostly organizational changes to what you have above:
global:
  title: My Very Cool Project
  Author: Bob
  assembly: dm6
  sampleinfo: sample_metadata.csv
workflows.qc:
  rules.trim:
    adapters: adapters.fa
    extra: "-q 20"
workflows.align:
  rules:
    align:
      aligner: 'bowtie==2.0.2'
      index: /data/...
      cluster:
        threads: 16
        mem: 60g
        walltime: 8:00:00
      aln_suffix: '.bt2.bam'
      log_suffix: '.bt2.log'
      extra: "-p {threads} -k 8"
workflow.rnaseq:
  factors:
    - sex
    - tissue
    - time
  models:
    full_model: ~ sex + tissue + time
    reduced_1: ~ sex + tissue
  rules:
    featurecounts:
      annotation: /data/gene.gtf
      extra: "-s 1"
    featurecounts_intergenic:
      annotation: /data/intergenic.gtf

config lookups

Specifying so much in the config will let us write some pretty generic workflows where input, output and params are basically just a ton of config dict lookups.

rule align:
    input:
        index=config['workflows.align']['rules']['align']['index']
    threads: config['workflows.align']['rules']['align']['cluster']['threads']
    ...

Some options to think about: if we wrap the config in an object with dotted access, then it becomes slightly more readable:

rule align:
    input:
        index=config.workflows_align.rules.align.index
    threads: config.workflows_align.rules.align.cluster.threads
    ...

Or syntax like the conda_build Metadata object,

rule align:
    input:
        index=config.get('workflows.align/rules/align/index')
    threads: config.get('workflows.align/rules/align/cluster/threads')
    ...

cluster config

I really like having the cluster config specified here alongside the rule. It could work if we provide a wrapper for calling snakemake that passes through most arguments, but extracts the cluster config info from the config file and builds a tempfile cluster_config.yaml that is passed to snakemake. The threads configured here can be injected into the rules at the end of the workflow by modifying
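The extraction step of such a wrapper could look roughly like the sketch below. It is only an illustration under assumptions: the function names and `EXAMPLE_CONFIG` are hypothetical, the config is assumed to be already loaded into a plain dict, only sections that have a nested `rules` mapping are scanned, and the tempfile is written as JSON rather than YAML purely to stay within the standard library.

```python
import json
import tempfile

# Hypothetical, trimmed-down version of the config above, already parsed to a dict.
EXAMPLE_CONFIG = {
    'global': {'assembly': 'dm6'},
    'workflows.align': {
        'rules': {
            'align': {
                'index': '/data/...',
                'cluster': {'threads': 16, 'mem': '60g', 'walltime': '8:00:00'},
            },
        },
    },
}


def extract_cluster_config(config):
    """Collect each rule's 'cluster' block into a {rule_name: settings} dict."""
    cluster = {}
    for section in config.values():
        if not isinstance(section, dict):
            continue
        rules = section.get('rules', {})
        if not isinstance(rules, dict):
            continue
        for rule_name, rule_cfg in rules.items():
            if isinstance(rule_cfg, dict) and 'cluster' in rule_cfg:
                # Copy so later edits do not touch the main config.
                cluster[rule_name] = dict(rule_cfg['cluster'])
    return cluster


def write_cluster_config(config):
    """Write the extracted cluster settings to a temp file and return its path."""
    with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as fh:
        json.dump(extract_cluster_config(config), fh, indent=2)
        return fh.name
```

The wrapper would then hand the returned path to snakemake along with the pass-through arguments; that invocation is left out here because it depends on how the wrapper is called.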
I like the re-organization, the nesting cleans things up a bit. I think the dot notation lookups seem the cleanest. Since the wrapper system is pulling the complexity out of the rules, I am thinking that the "workflow" should contain all of its own rules and try to make all of the settings some sort of lookup from the global config. I also like having the cluster config side-by-side. I will look at #10 and make individual comments there. Did not know you could do the line by line comments with PRs.
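For reference, dotted access can be layered over the config without replacing the dict itself. The sketch below is one hypothetical way to do it; `DottedConfig` and `EXAMPLE_CONFIG` are made-up names, and it assumes keys are valid Python identifiers (so `workflows_align` rather than `workflows.align`).

```python
class DottedConfig:
    """Read-only attribute access over a nested dict (hypothetical helper)."""

    def __init__(self, data):
        self._data = data

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails; private names are
        # rejected so copying or pickling cannot recurse through _data.
        if name.startswith('_'):
            raise AttributeError(name)
        try:
            value = self._data[name]
        except KeyError:
            raise AttributeError(f"no config key {name!r}") from None
        # Wrap nested dicts so lookups can be chained; return leaves as-is.
        return DottedConfig(value) if isinstance(value, dict) else value


# Hypothetical config dict using identifier-safe section names.
EXAMPLE_CONFIG = {
    'workflows_align': {
        'rules': {'align': {'index': '/data/...', 'cluster': {'threads': 16}}},
    },
}

cfg = DottedConfig(EXAMPLE_CONFIG)
threads = cfg.workflows_align.rules.align.cluster.threads
```

Because the wrapper only holds a reference, the underlying dict keeps its full interface for anything else that reads it.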
Seems like an elegant option for the dot notation from http://stackoverflow.com/a/7534478. Given the function:

def cfg(val):
    current_data = config
    for chunk in val.split('.'):
        current_data = current_data.get(chunk, {})
    return current_data

the lookup becomes:

rule align:
    input:
        index=cfg('workflows_align.rules.align.index')
    threads: cfg('workflows_align.rules.align.cluster.threads')
    ...

The reason I like this is that the global config dict remains unchanged as a dict. The other answers in that stackoverflow question have other options, but I worry about converting the global config dict to something else in case snakemake is using it for other things we don't know about that assume the full dict interface.
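One thing to watch with `.get(chunk, {})` is that a mistyped key quietly returns an empty dict instead of failing. A stricter variant might look like the sketch below; `cfg_strict` and `EXAMPLE_CONFIG` are hypothetical names, and the config is passed in explicitly so the example runs on its own.

```python
def cfg_strict(config, path, sep='.'):
    """Walk a nested dict along a separator-delimited path, failing loudly."""
    current = config
    for chunk in path.split(sep):
        # Stop at the first missing key, or when a leaf is reached too early.
        if not isinstance(current, dict) or chunk not in current:
            raise KeyError(f"{path!r}: no key {chunk!r}")
        current = current[chunk]
    return current


# Hypothetical config dict using identifier-safe section names.
EXAMPLE_CONFIG = {
    'workflows_align': {
        'rules': {'align': {'index': '/data/...', 'cluster': {'threads': 16}}},
    },
}

index = cfg_strict(EXAMPLE_CONFIG, 'workflows_align.rules.align.index')
```

Splitting on `.` also means section names that themselves contain a dot, such as `workflows.align`, cannot be addressed this way; passing a different `sep` (for example `/`) is one way around that.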
Also we really should have config validation once things settle down into a format. For example, we could keep a validation schema file that includes default values, then have code that builds an example config from that schema and validates the generated config. The user edits that config, which is then validated again before use. Luckily I have existing code for exactly this! I'll port it over.
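To make the idea concrete, here is a minimal sketch of a schema with defaults that can both generate an example config and validate an edited one. It is not the existing code mentioned above; `Field`, `SCHEMA`, `build_example` and `validate` are hypothetical names, and the schema covers only a few keys for illustration.

```python
from collections import namedtuple

# A leaf in the schema: the expected Python type plus a default value.
Field = namedtuple('Field', ['type', 'default'])

# Hypothetical schema covering a small part of the config discussed above.
SCHEMA = {
    'global': {
        'assembly': Field(str, 'dm6'),
        'sampleinfo': Field(str, 'sample_metadata.csv'),
    },
    'workflows_align': {
        'rules': {'align': {'cluster': {
            'threads': Field(int, 16),
            'mem': Field(str, '60g'),
        }}},
    },
}


def build_example(schema):
    """Generate an example config by filling every leaf with its default."""
    return {
        key: spec.default if isinstance(spec, Field) else build_example(spec)
        for key, spec in schema.items()
    }


def validate(config, schema, path=''):
    """Return a list of error strings; an empty list means the config is valid."""
    errors = []
    for key, spec in schema.items():
        where = f"{path}/{key}" if path else key
        if key not in config:
            errors.append(f"{where}: missing")
        elif isinstance(spec, Field):
            if not isinstance(config[key], spec.type):
                errors.append(f"{where}: expected {spec.type.__name__}")
        elif not isinstance(config[key], dict):
            errors.append(f"{where}: expected a mapping")
        else:
            errors.extend(validate(config[key], spec, where))
    return errors
```

In this sketch keys that are not in the schema are ignored, which keeps free-form settings like `extra` out of the way; a stricter version could report them instead.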
Starting to think about the config system YAML.