Many groups that perform bioinformatics analysis work independently, so interoperability is sometimes not a concern. It still matters, though, that a result can be reproduced by another group and that the same process can be repeated for other studies. CWL aims to separate the description of a workflow from its execution, in an attempt to standardize how workflows are shared and used.
Modifying the GenAP pipeline itself to output CWL would require awkward control flow [i.e. if pipeline.cwl -> create_cwl_steps()] to create CWL and input objects instead of shell scripts. Instead, we create something that works in parallel to the pipeline, mimics most of its functionality, and writes out CWL.
How the pipeline currently works:
- given a readset file, a config file, and a number of steps, it creates a bash script that can be run on the cluster
- creates shell commands by using string formatting (see the sketch after this list)
- problems with this: the generated bash scripts are hard to read and modify if necessary, and the system is not portable
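For illustration, the current approach is roughly of the following form (a simplified sketch in Python, not the actual GenAP code; the command and parameter names are placeholders):

    # simplified sketch of building a shell command via string formatting;
    # the real GenAP pipeline code is more involved
    def bwa_mem_command(reference, fastq1, fastq2, output_bam, threads=4):
        return (
            "bwa mem -t {threads} {reference} {fastq1} {fastq2} | "
            "samtools sort -o {output_bam} -"
        ).format(
            threads=threads,
            reference=reference,
            fastq1=fastq1,
            fastq2=fastq2,
            output_bam=output_bam,
        )

Commands like this are concatenated into one large bash script, which is what makes the result hard to inspect or change after the fact.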
How CWLifying the pipeline will work:
- currently, given a readset and a config file, it can create a directory containing all the associated CWL files as well as the workflow and inputs files
- Some Disadvantages/nuances:
- need to implement each tool individually
- scattering/gathering steps may be a bit clunky
- the CWL tool files are separate from the actual workflow-creation code
- need to test each CWL file independently first
- Some Advantages:
- whether or not to add values can be decided with simple control flow (see the sketch after this list)
- Can reuse CWL files
- can share workflows outside of group relatively easily
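As an example of the control-flow advantage, optional values can be included with a plain if statement (a hypothetical sketch; the method and attribute names are assumptions):

    # hypothetical sketch: optional inputs are only added when present
    inputs = {"reads1": readset.fastq1}
    if readset.fastq2:  # paired-end readset
        inputs["reads2"] = readset.fastq2
    workflow.add_values("trimmomatic", inputs)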
*** Installation
Use Python 2.7.
Clone the repo:
$ git clone https://github.com/screx/cwlify-mugqic.git
Create a virtual environment:
$ virtualenv venv
Install the required python packages:
$ pip install -r requirements.txt
UML Diagram of the proof of concept
workflow.py
- Main script for creating the workflow and associated files
- aggregates all the data required to create the workflow and validates that the workflow is functional.
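A rough sketch of the kind of interface this could expose (only add_to_run_folder(), add_values(), and add_step() are named elsewhere in these notes; everything else here is an assumption):

    # hypothetical sketch of the workflow-building interface
    class Workflow(object):
        def __init__(self, name, run_folder):
            self.name = name
            self.run_folder = run_folder
            self.inputs = {}
            self.steps = []

        def add_to_run_folder(self, toolname, cwl_path):
            """Copy a tool's CWL description into the run directory."""

        def add_values(self, toolname, values):
            """Register workflow-level inputs gathered from the readset/config."""

        def add_step(self, toolname, inputs, outputs):
            """Wire a tool's parameters to workflow inputs or other steps' outputs."""

        def write(self):
            """Write workflow.cwl and inputs.yml, then validate the workflow."""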
pipeline.py
- the DNASeq class could probably be changed into a concrete implementation of a Pipeline class
- each method is a step in the pipeline
- uses the workflow class (workflow.py) to actually implement the DNASeq pipeline and create the workflow
- methods are laid out as follows (see the sketch after this list):
- specify the tool name and the path to the actual CWL file we are trying to add, and add it to the run directory using the add_to_run_folder() method
- gather all the values that are needed from the readset/config file
- add the outputs of the tool as an array of strings (from the names of the outputs in the tool description)
- they are then added as inputs to the workflow using the add_values() method
- after all the inputs have been added, create a dict where the key is the parameter name (found in the CWL file) and the value is the name of the input as added to the CWL; if the value comes from another step it is of the form {toolname}/{parameter_name}, otherwise it defaults to {toolname}_{parameter_name}
- sometimes extra control flow is needed to specify how arguments are taken
- add any requirements to the system as needed
- add the step to the workflow using the add_step() method
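A minimal sketch of what one such step method might look like, assuming the add_to_run_folder(), add_values(), and add_step() methods described above live on a workflow object (the tool and parameter names here are placeholders, not the real DNASeq steps):

    # hypothetical sketch of a single pipeline step method
    def bwa_mem(self):
        tool = "bwa_mem"
        # copy the tool's CWL description into the run directory
        self.workflow.add_to_run_folder(tool, "tools/bwa_mem.cwl")

        # gather the values needed from the readset/config file and add
        # them as workflow inputs
        values = {
            "reference": self.config.get("bwa_mem", "genome_fasta"),
            "reads1": self.readset.fastq1,
        }
        self.workflow.add_values(tool, values)

        # outputs are listed by the names used in the tool description
        outputs = ["sorted_bam"]

        # map each tool parameter to its source: {toolname}/{parameter_name}
        # when it comes from a previous step, {toolname}_{parameter_name}
        # when it is a workflow input
        step_inputs = {
            "reference": "bwa_mem_reference",
            "reads1": "trimmomatic/trimmed_reads1",
        }
        self.workflow.add_step(tool, inputs=step_inputs, outputs=outputs)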
cwlify.py
- For running the pipeline from the command line
- '-c' to specify the config file
- '-readset' to specify the readset file
- name of the workflow goes at the end.
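A minimal argparse sketch of how such a command-line interface could be wired up (the DNASeq constructor and create_workflow() call are assumptions about the other modules, not their actual API):

    import argparse

    from pipeline import DNASeq  # assumed import; see pipeline.py above

    def main():
        parser = argparse.ArgumentParser(
            description="Create a CWL workflow from a readset and config file")
        parser.add_argument("-c", dest="config", required=True,
                            help="path to the config file")
        parser.add_argument("-readset", dest="readset", required=True,
                            help="path to the readset file")
        parser.add_argument("name", help="name of the workflow")
        args = parser.parse_args()

        # hypothetical wiring: build the pipeline and emit the workflow
        pipeline = DNASeq(args.config, args.readset, args.name)
        pipeline.create_workflow()

    if __name__ == "__main__":
        main()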
readset.py
- parses and stores data for the readset (taken from the GenAP pipelines, but only for Illumina readsets)
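A minimal sketch of readset parsing, assuming a tab-separated readset file with a header line as in the GenAP pipelines (the column names used here are assumptions):

    import csv

    class IlluminaReadset(object):
        # hypothetical container for one readset row
        def __init__(self, row):
            self.name = row.get("Readset")
            self.sample = row.get("Sample")
            self.fastq1 = row.get("FASTQ1")
            self.fastq2 = row.get("FASTQ2")

    def parse_readset_file(path):
        with open(path) as handle:
            reader = csv.DictReader(handle, delimiter="\t")
            return [IlluminaReadset(row) for row in reader]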
config.py
- allows access to the data stored in the config.ini file (taken from the GenAP pipelines)
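A minimal sketch of config access, assuming a standard .ini file read with the Python 2.7 ConfigParser module (the real config.py taken from GenAP wraps more behaviour):

    import ConfigParser  # "configparser" on Python 3

    class Config(object):
        # hypothetical thin wrapper around a parsed .ini file
        def __init__(self, path):
            self._parser = ConfigParser.SafeConfigParser()
            self._parser.read(path)

        def get(self, section, option):
            # values in [DEFAULT] act as fallbacks for every section
            return self._parser.get(section, option)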
The tool generates an inputs.yml and a workflow.cwl file, and copies the relevant CWL tool descriptions into a new directory.
The workflow can then be run by passing it to a CWL runner (cwltool, toil, etc.).
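For example, with cwltool, running the generated workflow from inside the new directory looks like:
$ cwltool workflow.cwl inputs.yml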