Pharokka wrapper #5130

Merged: 14 commits, Mar 1, 2023
11 changes: 11 additions & 0 deletions tools/pharokka/.shed.yml
@@ -0,0 +1,11 @@
owner: iuc
name: pharokka
description: rapid standardised annotation tool for bacteriophage genomes and metagenomes
long_description: |
pharokka is a rapid standardised annotation tool for bacteriophage genomes and metagenomes.
If you are looking for rapid standardised annotation of bacterial genomes, please use prokka,
which inspired the creation of pharokka, or bakta. Repository-Maintainer: Paul Zierep
Review comment (Member):

You can get credit by adding tags to the tool; see https://docs.galaxyproject.org/en/latest/dev/schema.html

Also add yourself as maintainer via the codeowner file in this repo.
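
The comment does not say which tags are meant. One plausible reading is the `<creator>` element of the Galaxy tool schema, which credits people or organizations inside the tool XML. A minimal sketch under that assumption, using the maintainer name given in `.shed.yml`:

```xml
<!-- Assumption: the "tags" in the review comment refer to <creator>.
     This element would be a child of <tool> in pharokka.xml. -->
<creator>
    <person givenName='Paul' familyName='Zierep' />
</creator>
```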

remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/main/tools/pharokka
categories:
- Genome annotation
homepage_url: https://github.com/gbouras13/pharokka
188 changes: 188 additions & 0 deletions tools/pharokka/pharokka.xml
@@ -0,0 +1,188 @@
<tool id='pharokka' name='bacteriophage annotation' version='1.2.0' python_template_version='3.5' profile='21.05'>
<description>
rapid standardised annotation tool for bacteriophage genomes and metagenomes
</description>
<requirements>
<requirement type='package' version='1.2.0'>
pharokka
</requirement>
<requirement type='package' version='3.0'>
zip
</requirement>
</requirements>
<version_command>
pharokka.py --version
</version_command>
<command detect_errors='exit_code'>
<![CDATA[
## get DB based on data table or history
#if str( $reference_source.reference_source_selector ) == 'history':
echo 'use history' &&
mkdir pharokka_db &&
unzip '$reference_source.db_histroy' -d pharokka_db &&
#else:
echo 'use cache' &&
mkdir pharokka_db &&
unzip '$reference_source.db_cached.fields.path' -d pharokka_db &&
#end if

## run tool
#if str( $terminase.terminase_selector ) == 'no_terminase':
pharokka.py -i $fasta -o pharokka_output -d pharokka_db -t \${GALAXY_SLOTS:-8} $gene_predictor $meta -e $evalue &&
Review comment (Member):

All paths and text parameters need to be single-quoted.
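
A minimal sketch of what the quoting request could look like for the first branch of the command, assuming only the input dataset path needs quoting here. `$gene_predictor` expands to two tokens (for example `-g phanotate`), so it has to stay unquoted, and `$meta` may be empty. The same change would apply to the terminase branch.

```xml
<command detect_errors='exit_code'><![CDATA[
## sketch only: the dataset path is single-quoted, as the unzip calls already do
pharokka.py
    -i '$fasta'
    -o pharokka_output
    -d pharokka_db
    -t \${GALAXY_SLOTS:-8}
    $gene_predictor
    $meta
    -e $evalue
]]></command>
```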

#else:
pharokka.py -i $fasta -o pharokka_output -d pharokka_db -t \${GALAXY_SLOTS:-8} $gene_predictor $meta -e $evalue --terminase --terminase_strand $terminase.terminase_strand --terminase_start $terminase.terminase_start &&
#end if

## create output
zip -r out.zip pharokka_output
]]>
</command>
<inputs>
<!-- the genome -->
<param type='data' name='fasta' format='fasta' help='Please upload a genome file of a bacteriophage in fasta format.' label='Bacteriophage genome'/>
<!-- the DB -->
<conditional name='reference_source'>
<param name='reference_source_selector' type='select' label='Load DB from'>
<option value='cached'>
Local cache
</option>
<option value='history'>
History
</option>
</param>
<when value='cached'>
<param name='db_cached' type='select' label='Using built-in pharokka DB' help='Using built-in pharokka DB'>
<options from_data_table='pharokka_db'>
</options>
<validator type='no_options' message='A built-in pharokka DB is not available for the build associated with the selected input file' />
</param>
</when>
<when value='history'>
<param name='db_histroy' type='data' format='zip' label='Use the following pharokka DB' help='You can upload a pharokka DB as zip to the history and use it as DB' />
Review comment (Member):

Where do users get such a DB?
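
One way to address this question is to say in the parameter's help text where the archive comes from. The sketch below assumes the database is created with pharokka's `install_databases.py` script and then zipped by the user; that script name should be checked against the pharokka documentation for the wrapped version. Because the wrapper unzips into `pharokka_db` and passes that directory to `-d`, the archive presumably needs the database files at its top level.

```xml
<!-- Sketch: same parameter as in the diff, with a more explicit help text.
     The install_databases.py reference is an assumption to verify. -->
<param name='db_histroy' type='data' format='zip'
       label='Use the following pharokka DB'
       help='Zip archive of a pharokka database directory, for example one downloaded with the install_databases.py script shipped with pharokka and then zipped, with the database files at the top level of the archive.' />
```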

</when>
</conditional>
<!-- additional arguments -->
<param name='gene_predictor' type='select' label='User specified gene predictor'>
<option value='-g phanotate'>
Phanotate
</option>
<option value='-g prodigal'>
Prodigal
</option>
</param>
<param name='meta' type='boolean' checked='false' truevalue='--meta' falsevalue='' label='meta mode for metavirome input samples' />
<param name='evalue' type='float' value='1E-05' label='E-value threshold for mmseqs2 PHROGs database search. Defaults to 1E-05.' />
<!-- optional arguments -->
<conditional name='terminase'>
<param name='terminase_selector' type='select' label='Run terminase large subunit re-orientation mode (single-genome input only; requires --terminase_strand and --terminase_start)'>
<option value='no_terminase'>
Do not run 'terminase large subunit' re-orientation mode.
</option>
<option value='run_terminase'>
Run 'terminase large subunit' re-orientation mode.
</option>
</param>
<when value='no_terminase'>
</when>
<when value='run_terminase'>
<param name='terminase_strand' type='select' label='Strand of terminase large subunit.'>
<option value='pos'>
Positive
</option>
<option value='neg'>
Negative
</option>
</param>
<param name='terminase_start' type='integer' value='1' label='Start coordinate of the terminase large subunit.' />
</when>
</conditional>
</inputs>
<outputs>
<data name='archive_output' format='zip' from_work_dir='out.zip' label='${tool.name} on ${on_string}: zip of the complete output' />
Review comment (Member):

Can we make this output file optional, so that the user has to select an option to output it? I don't think it's that useful by default.
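
A common way to do this in Galaxy wrappers is a boolean input combined with a `<filter>` on the output. The sketch below is one possible shape, not the change that was actually made in the PR; the parameter name `output_zip` and its label are hypothetical.

```xml
<!-- Sketch: in <inputs>, a hypothetical switch that is off by default -->
<param name='output_zip' type='boolean' checked='false' label='Also output a zip archive of the complete pharokka output' />

<!-- Sketch: in <outputs>, the archive is only created when the switch is on -->
<data name='archive_output' format='zip' from_work_dir='out.zip' label='${tool.name} on ${on_string}: zip of the complete output'>
    <filter>output_zip</filter>
</data>
```

With this layout the `zip -r out.zip pharokka_output` step in the command would also need to be wrapped in a matching `#if $output_zip` block, and the tests that assert on `archive_output` would have to set the parameter to true.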

<data name='pharokka_gbk' format='genbank' from_work_dir='pharokka_output/pharokka.gbk' label='${tool.name} on ${on_string}: Genbank' />
<data name='pharokka_gff' format='gff' from_work_dir='pharokka_output/pharokka.gff' label='${tool.name} on ${on_string}: GFF' />
</outputs>
<tests>
<!-- test input from history -->
<test>
<param name='reference_source_selector' value='history' />
<param name='db_histroy' value='subset_pharokka_db.zip' />
<param name='fasta' value='SAOMS1.fasta' />
<!-- check file size and text since output is non-deterministic -->
<output name='pharokka_gbk'>
<assert_contents>
<has_size value='353875' delta='10' />
<has_text text='VERSION MW460250_1' />
</assert_contents>
</output>
<output name='pharokka_gff'>
<assert_contents>
<has_size value='191497' delta='10' />
<has_text text='##sequence-region MW460250_1 1 140135' />
</assert_contents>
</output>
<!-- check created zip -->
<output name='archive_output'>
<assert_contents>
<has_archive_member path='.*\/pharokka\.gff' />
<has_archive_member path='.*\/pharokka\.gbk' />
<has_archive_member path='.*\/pharokka.*\.log' />
</assert_contents>
</output>
</test>
<!-- test input from DB -->
<test>
<param name='reference_source_selector' value='cached' />
<param name='db_cached' value='pharokka_db' />
<param name='fasta' value='SAOMS1.fasta' />
<!-- check file size and text since output is non-deterministic -->
<output name='pharokka_gbk'>
<assert_contents>
<has_size value='353875' delta='10' />
<has_text text='VERSION MW460250_1' />
</assert_contents>
</output>
<output name='pharokka_gff'>
<assert_contents>
<has_size value='191497' delta='10' />
<has_text text='##sequence-region MW460250_1 1 140135' />
</assert_contents>
</output>
<!-- check created zip -->
<output name='archive_output'>
<assert_contents>
<has_archive_member path='.*\/pharokka\.gff' />
<has_archive_member path='.*\/pharokka\.gbk' />
<has_archive_member path='.*\/pharokka.*\.log' />
</assert_contents>
</output>
</test>
</tests>
<help>
<![CDATA[
pharokka is a rapid standardised annotation tool for bacteriophage genomes and metagenomes.

If you are looking for rapid standardised annotation of bacterial genomes, please use prokka, which inspired the creation of pharokka, or bakta.
]]>
</help>
<citations>
<citation type='bibtex'>
Review comment (Member):

Please use type='doi' here.
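
A sketch of the requested form, using the DOI that already appears in the BibTeX entry; it would replace the whole `bibtex` citation:

```xml
<citations>
    <citation type='doi'>10.1093/bioinformatics/btac776</citation>
</citations>
```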

@article{bouras_pharokka_2023,
title = {Pharokka: a fast scalable bacteriophage annotation tool},
volume = {39},
issn = {1367-4811},
shorttitle = {Pharokka},
url = {https://doi.org/10.1093/bioinformatics/btac776},
doi = {10.1093/bioinformatics/btac776},
abstract = {In recent years, there has been an increasing interest in bacteriophages, which has led to growing numbers of bacteriophage genomic sequences becoming available. Consequently, there is a need for a rapid and consistent genomic annotation tool dedicated for bacteriophages. Existing tools either are not designed specifically for bacteriophages or are web- and email-based and require significant manual curation, which makes their integration into bioinformatic pipelines challenging. Pharokka was created to provide a tool that annotates bacteriophage genomes easily, rapidly and consistently with standards compliant outputs. Moreover, Pharokka requires only two lines of code to install and use and takes under 5 min to run for an average 50-kb bacteriophage genome. Pharokka is implemented in Python and is available as a bioconda package using ‘conda install -c bioconda pharokka’. The source code is available on GitHub (https://github.com/gbouras13/pharokka). Pharokka has been tested on Linux-64 and MacOSX machines and on Windows using a Linux Virtual Machine.},
number = {1},
urldate = {2023-02-14},
journal = {Bioinformatics},
author = {Bouras, George and Nepal, Roshan and Houtak, Ghais and Psaltis, Alkis James and Wormald, Peter-John and Vreugde, Sarah},
month = jan,
year = {2023},
pages = {btac776},
}
</citation>
</citations>
</tool>