Use gxformat2 to convert .ga to .cwl? #33

Open
simleo opened this issue Nov 10, 2020 · 2 comments

simleo commented Nov 10, 2020

Came up at the 2020 Elixir biohackathon.

I experimented with this in https://github.com/ResearchObject/ro-crate-py/tree/gxformat2_cwl_conv. Here are the changes. I converted test/test-data/test_galaxy_wf.ga, and the CWL produced by gxformat2 is very different from the one obtained with galaxy2cwl. I'm not even sure the latter is a valid CWL workflow. Did I use the gxformat2 API incorrectly? If not, maybe this needs to be checked by a CWL expert.

simleo commented Nov 11, 2020

The file generated with gxformat2 does not validate. To reproduce, build the Docker container from https://github.com/ResearchObject/ro-crate-py/tree/9c2c74506226f4508985e86df7b1fa72f657f8b2 and run the following:

docker build --no-cache -t ro-crate-py .
docker run --rm -it --name ro-crate-py ro-crate-py bash
# pip install pytest cwltool
# pytest test/
# cwltool --validate --enable-dev /tmp/pytest-of-root/pytest-current/test_galaxy_wf_cratecurrent/ro_crate_out/test_galaxy_wf.cwl 
INFO /usr/local/bin/cwltool 3.0.20201109103151
INFO Resolved '/tmp/pytest-of-root/pytest-current/test_galaxy_wf_cratecurrent/ro_crate_out/test_galaxy_wf.cwl' to 'file:///tmp/pytest-of-root/pytest-current/test_galaxy_wf_cratecurrent/ro_crate_out/test_galaxy_wf.cwl'
ERROR Tool definition failed validation:
No cwlVersion found. Use the following syntax in your CWL document to declare the version: cwlVersion: <version>.
Note: if this is a CWL draft-2 (pre v1.0) document then it will need to be upgraded first.
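
The check that fails here could also be done cheaply in the test suite before calling cwltool. The following is only a minimal sketch on an already-parsed document; `missing_cwl_version` and the two sample dicts are hypothetical and are not part of ro-crate-py, gxformat2 or cwltool.

```python
def missing_cwl_version(doc):
    """Return True if the top-level document lacks a cwlVersion field."""
    return not isinstance(doc, dict) or "cwlVersion" not in doc


# Hypothetical documents, not taken from the test data
without_version = {"class": "Workflow", "inputs": {}, "outputs": {}, "steps": {}}
with_version = dict(without_version, cwlVersion="v1.2")

print(missing_cwl_version(without_version))  # True
print(missing_cwl_version(with_version))     # False
```

This only catches the specific error shown above; it is not a substitute for `cwltool --validate`.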

simleo commented Nov 11, 2020

The code was missing the from_dict step; thanks @ieguinoa for adding it.
However, the CWL file generated in the tests still does not validate (cwltool 3.0.20201109103151). The errors look like this:

../tmp/pytest-of-root/pytest-current/test_galaxy_wf_cratecurrent/ro_crate_out/test_galaxy_wf.cwl:82:7:
Workflow step output 'realigned' does not correspond to
../tmp/pytest-of-root/pytest-current/test_galaxy_wf_cratecurrent/ro_crate_out/test_galaxy_wf.cwl:87:7:
tool output (expected '')
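
To list every step affected by this kind of error at once, something like the sketch below could be run on the parsed workflow. It assumes the map form of `steps` and `outputs`, and an embedded `run` process; `undeclared_step_outputs` is a hypothetical helper and the sample step is made up to match the error message, with an empty `outputs` map under `run`.

```python
def undeclared_step_outputs(workflow):
    """Return (step_id, output_id) pairs that a step lists under `out`
    but that are absent from the `outputs` of its embedded `run` process."""
    problems = []
    for step_id, step in workflow.get("steps", {}).items():
        run = step.get("run")
        if not isinstance(run, dict):
            continue  # external reference, cannot be checked here
        declared = set(run.get("outputs") or {})
        for out in step.get("out", []):
            # `out` entries may be plain strings or {"id": ...} mappings
            out_id = out["id"] if isinstance(out, dict) else out
            if out_id not in declared:
                problems.append((step_id, out_id))
    return problems


# Hypothetical step whose run block declares no outputs
workflow = {
    "class": "Workflow",
    "steps": {
        "10": {
            "in": {"reads": {"source": "8/outFile"}},
            "out": ["realigned"],
            "run": {"class": "Operation", "inputs": {}, "outputs": {}},
        }
    },
}
print(undeclared_step_outputs(workflow))  # [('10', 'realigned')]
```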

Here is the generated CWL:

class: Workflow
cwlVersion: v1.2
inputs:
  'GenBank file ':
    id: 'GenBank file '
    type: File
  Paired Collection (fastqsanger):
    id: Paired Collection (fastqsanger)
    type: File[]
outputs:
  _anonymous_output_1:
    outputSource: 'GenBank file '
    type: File
  _anonymous_output_2:
    outputSource: Paired Collection (fastqsanger)
    type: File
  _anonymous_output_3:
    outputSource: 2/snpeff_output
    type: File
  _anonymous_output_4:
    outputSource: 2/output_fasta
    type: File
  _anonymous_output_5:
    outputSource: 3/output_paired_coll
    type: File
  _anonymous_output_6:
    outputSource: 3/report_html
    type: File
  _anonymous_output_7:
    outputSource: 4/bam_output
    type: File
  FASTP_report:
    outputSource: 5/html_report
    type: File
  _anonymous_output_8:
    outputSource: 6/output1
    type: File
  _anonymous_output_9:
    outputSource: '7'
    type: File
  _anonymous_output_10:
    outputSource: 8/metrics_file
    type: File
  _anonymous_output_11:
    outputSource: 8/outFile
    type: File
  mapping_report:
    outputSource: 9/html_report
    type: File
  _anonymous_output_12:
    outputSource: 10/realigned
    type: File
  DeDup_Report:
    outputSource: 11/html_report
    type: File
  _anonymous_output_13:
    outputSource: 12/variants
    type: File
  _anonymous_output_14:
    outputSource: 13/statsFile
    type: File
  _anonymous_output_15:
    outputSource: 13/snpeff_output
    type: File
  _anonymous_output_16:
    outputSource: '14'
    type: File
  SnpEff vcf.gz:
    outputSource: 15/output1
    type: File
  _anonymous_output_17:
    outputSource: '16'
    type: File
steps:
  '10':
    in:
      reads:
        source: 8/outFile
      reference_source|ref:
        source: 2/output_fasta
    out:
    - realigned
    run:
      class: Operation
      doc: ''
      inputs: {}
      outputs: {}
  '11':
    in:
      results_0|software_cond|output_0|input:
        source: 8/metrics_file
    out:
    - plots
    - stats
    - html_report
    run:
      class: Operation
      doc: ''
      inputs: {}
      outputs: {}
  '12':
    in:
      reads:
        source: 10/realigned
      reference_source|ref:
        source: 2/output_fasta
    out:
    - variants
    run:
      class: Operation
      doc: ''
      inputs: {}
      outputs: {}
  '13':
    in:
      input:
        source: 12/variants
      snpDb|snpeff_db:
        source: 2/snpeff_output
    out:
    - snpeff_output
    - statsFile
    run:
      class: Operation
      doc: ''
      inputs: {}
      outputs: {}
  '14':
    in:
      input:
        source: 13/snpeff_output
    out: []
    run:
      class: Operation
      doc: ''
      inputs: {}
      outputs: {}
  '15':
    in:
      input1:
        source: 13/snpeff_output
    out:
    - output1
    run:
      class: Operation
      doc: ''
      inputs: {}
      outputs: {}
  '16':
    in:
      input_list:
        source: '14'
    out: []
    run:
      class: Operation
      doc: ''
      inputs: {}
      outputs: {}
  '2':
    in:
      input_type|input_gbk:
        source: 'GenBank file '
    out:
    - output_fasta
    - snpeff_output
    run:
      class: Operation
      doc: ''
      inputs: {}
      outputs: {}
  '3':
    in:
      single_paired|paired_input:
        source: Paired Collection (fastqsanger)
    out:
    - report_json
    - report_html
    - output_paired_coll
    run:
      class: Operation
      doc: ''
      inputs: {}
      outputs: {}
  '4':
    in:
      fastq_input|fastq_input1:
        source: 3/output_paired_coll
      reference_source|ref_file:
        source: 2/output_fasta
    out:
    - bam_output
    run:
      class: Operation
      doc: ''
      inputs: {}
      outputs: {}
  '5':
    in:
      results_0|software_cond|input:
        source: 3/report_json
    out:
    - plots
    - stats
    - html_report
    run:
      class: Operation
      doc: ''
      inputs: {}
      outputs: {}
  '6':
    in:
      input1:
        source: 4/bam_output
    out:
    - output1
    run:
      class: Operation
      doc: ''
      inputs: {}
      outputs: {}
  '7':
    in:
      input:
        source: 6/output1
    out: []
    run:
      class: Operation
      doc: ''
      inputs: {}
      outputs: {}
  '8':
    in:
      inputFile:
        source: 6/output1
    out:
    - outFile
    - metrics_file
    run:
      class: Operation
      doc: ''
      inputs: {}
      outputs: {}
  '9':
    in:
      results_0|software_cond|output_0|type|input:
        source: '7'
    out:
    - plots
    - stats
    - html_report
    run:
      class: Operation
      doc: ''
      inputs: {}
      outputs: {}
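
The workflow above also contains `outputSource` values such as '7', '14' and '16' that name a step rather than a `step/output` pair, while those steps have an empty `out` list. As far as I understand the CWL spec, an `outputSource` has to point at a workflow input or a step output, so these would probably be rejected too. A rough sketch for finding them, assuming the map form used above and a single string per `outputSource` (`unresolved_output_sources` is a hypothetical helper, and the sample is a trimmed-down excerpt):

```python
def unresolved_output_sources(workflow):
    """Map workflow output id -> outputSource for sources that are neither
    a workflow input nor a `step/output` pair declared in some step's `out`."""
    valid = set(workflow.get("inputs", {}))
    for step_id, step in workflow.get("steps", {}).items():
        for out in step.get("out", []):
            valid.add(f"{step_id}/{out}")
    problems = {}
    for out_id, out in workflow.get("outputs", {}).items():
        source = out.get("outputSource")
        if source is not None and source not in valid:
            problems[out_id] = source
    return problems


# Trimmed-down excerpt of the generated workflow
workflow = {
    "inputs": {"GenBank file ": {"type": "File"}},
    "outputs": {
        "_anonymous_output_1": {"outputSource": "GenBank file ", "type": "File"},
        "_anonymous_output_8": {"outputSource": "6/output1", "type": "File"},
        "_anonymous_output_9": {"outputSource": "7", "type": "File"},
    },
    "steps": {
        "6": {"in": {}, "out": ["output1"]},
        "7": {"in": {}, "out": []},
    },
}
print(unresolved_output_sources(workflow))  # {'_anonymous_output_9': '7'}
```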

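Note that the inputs and outputs fields of each step's run block are empty, which is presumably why cwltool cannot match step outputs such as realigned to any tool output. For comparison, the following is the CWL we are currently generating with galaxy2cwl: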

class: Workflow
cwlVersion: v1.2.0-dev2
doc: 'Abstract CWL Automatically generated from the Galaxy workflow file: COVID-19:
  PE Variation'
inputs:
  'GenBank file ':
    format: data
    type: File
  Paired Collection (fastqsanger):
    format: data
    type: File
outputs: {}
steps:
  10_Realign reads:
    in:
      reads: 8_MarkDuplicates/outFile
      reference_source|ref: 2_SnpEff build/output_fasta
    out:
    - realigned
    run:
      class: Operation
      id: toolshed_g2_bx_psu_edu_repos_iuc_lofreq_viterbi_lofreq_viterbi_2_1_3_1+galaxy1
      inputs:
        reads:
          format: Any
          type: File
        reference_source|ref:
          format: Any
          type: File
      outputs:
        realigned:
          doc: bam
          type: File
  11_MultiQC:
    in:
      results_0|software_cond|output_0|input: 8_MarkDuplicates/metrics_file
    out:
    - stats
    - plots
    - html_report
    run:
      class: Operation
      id: toolshed_g2_bx_psu_edu_repos_iuc_multiqc_multiqc_1_7_1
      inputs:
        results_0|software_cond|output_0|input:
          format: Any
          type: File
      outputs:
        html_report:
          doc: html
          type: File
        plots:
          doc: input
          type: File
        stats:
          doc: input
          type: File
  12_Call variants:
    in:
      reads: 10_Realign reads/realigned
      reference_source|ref: 2_SnpEff build/output_fasta
    out:
    - variants
    run:
      class: Operation
      id: toolshed_g2_bx_psu_edu_repos_iuc_lofreq_call_lofreq_call_2_1_3_1+galaxy0
      inputs:
        reads:
          format: Any
          type: File
        reference_source|ref:
          format: Any
          type: File
      outputs:
        variants:
          doc: vcf
          type: File
  13_SnpEff eff:
    in:
      input: 12_Call variants/variants
      snpDb|snpeff_db: 2_SnpEff build/snpeff_output
    out:
    - snpeff_output
    - statsFile
    run:
      class: Operation
      id: toolshed_g2_bx_psu_edu_repos_iuc_snpeff_snpEff_4_3+T_galaxy1
      inputs:
        input:
          format: Any
          type: File
        snpDb|snpeff_db:
          format: Any
          type: File
      outputs:
        snpeff_output:
          doc: vcf
          type: File
        statsFile:
          doc: html
          type: File
  14_SnpSift Extract Fields:
    in:
      input: 13_SnpEff eff/snpeff_output
    out:
    - output
    run:
      class: Operation
      id: toolshed_g2_bx_psu_edu_repos_iuc_snpsift_snpSift_extractFields_4_3+t_galaxy0
      inputs:
        input:
          format: Any
          type: File
      outputs:
        output:
          doc: tabular
          type: File
  15_Convert VCF to VCF_BGZIP:
    in:
      input1: 13_SnpEff eff/snpeff_output
    out:
    - output1
    run:
      class: Operation
      id: CONVERTER_vcf_to_vcf_bgzip_0
      inputs:
        input1:
          format: Any
          type: File
      outputs:
        output1:
          doc: vcf_bgzip
          type: File
  16_Collapse Collection:
    in:
      input_list: 14_SnpSift Extract Fields/output
    out:
    - output
    run:
      class: Operation
      id: toolshed_g2_bx_psu_edu_repos_nml_collapse_collections_collapse_dataset_4_1
      inputs:
        input_list:
          format: Any
          type: File
      outputs:
        output:
          doc: input
          type: File
  2_SnpEff build:
    in:
      input_type|input_gbk: 'GenBank file '
    out:
    - snpeff_output
    - output_fasta
    run:
      class: Operation
      id: toolshed_g2_bx_psu_edu_repos_iuc_snpeff_snpEff_build_gb_4_3+T_galaxy4
      inputs:
        input_type|input_gbk:
          format: Any
          type: File
      outputs:
        output_fasta:
          doc: fasta
          type: File
        snpeff_output:
          doc: snpeffdb
          type: File
  3_fastp:
    in:
      single_paired|paired_input: Paired Collection (fastqsanger)
    out:
    - output_paired_coll
    - report_html
    - report_json
    run:
      class: Operation
      id: toolshed_g2_bx_psu_edu_repos_iuc_fastp_fastp_0_19_5+galaxy1
      inputs:
        single_paired|paired_input:
          format: Any
          type: File
      outputs:
        output_paired_coll:
          doc: input
          type: File
        report_html:
          doc: html
          type: File
        report_json:
          doc: json
          type: File
  4_Map with BWA-MEM:
    in:
      fastq_input|fastq_input1: 3_fastp/output_paired_coll
      reference_source|ref_file: 2_SnpEff build/output_fasta
    out:
    - bam_output
    run:
      class: Operation
      id: toolshed_g2_bx_psu_edu_repos_devteam_bwa_bwa_mem_0_7_17_1
      inputs:
        fastq_input|fastq_input1:
          format: Any
          type: File
        reference_source|ref_file:
          format: Any
          type: File
      outputs:
        bam_output:
          doc: bam
          type: File
  5_MultiQC:
    in:
      results_0|software_cond|input: 3_fastp/report_json
    out:
    - stats
    - plots
    - html_report
    run:
      class: Operation
      id: toolshed_g2_bx_psu_edu_repos_iuc_multiqc_multiqc_1_7_1
      inputs:
        results_0|software_cond|input:
          format: Any
          type: File
      outputs:
        html_report:
          doc: html
          type: File
        plots:
          doc: input
          type: File
        stats:
          doc: input
          type: File
  6_Filter SAM or BAM, output SAM or BAM:
    in:
      input1: 4_Map with BWA-MEM/bam_output
    out:
    - output1
    run:
      class: Operation
      id: toolshed_g2_bx_psu_edu_repos_devteam_samtool_filter2_samtool_filter2_1_8+galaxy1
      inputs:
        input1:
          format: Any
          type: File
      outputs:
        output1:
          doc: sam
          type: File
  7_Samtools stats:
    in:
      input: 6_Filter SAM or BAM, output SAM or BAM/output1
    out:
    - output
    run:
      class: Operation
      id: toolshed_g2_bx_psu_edu_repos_devteam_samtools_stats_samtools_stats_2_0_2+galaxy2
      inputs:
        input:
          format: Any
          type: File
      outputs:
        output:
          doc: tabular
          type: File
  8_MarkDuplicates:
    in:
      inputFile: 6_Filter SAM or BAM, output SAM or BAM/output1
    out:
    - metrics_file
    - outFile
    run:
      class: Operation
      id: toolshed_g2_bx_psu_edu_repos_devteam_picard_picard_MarkDuplicates_2_18_2_2
      inputs:
        inputFile:
          format: Any
          type: File
      outputs:
        metrics_file:
          doc: txt
          type: File
        outFile:
          doc: bam
          type: File
  9_MultiQC:
    in:
      results_0|software_cond|output_0|type|input: 7_Samtools stats/output
    out:
    - stats
    - plots
    - html_report
    run:
      class: Operation
      id: toolshed_g2_bx_psu_edu_repos_iuc_multiqc_multiqc_1_7_1
      inputs:
        results_0|software_cond|output_0|type|input:
          format: Any
          type: File
      outputs:
        html_report:
          doc: html
          type: File
        plots:
          doc: input
          type: File
        stats:
          doc: input
          type: File

I've opened a draft PR from the branch to make it easier to track changes.
