Add seqtk mergepe and improve docs (#380)
* add seqtk mergepe wrapper

* improve contributing docs and add params section to wrappers

* add URL field to wrapper docs

* add compress clarification

Co-authored-by: David Laehnemann <[email protected]>

* add link to meta

Co-authored-by: David Laehnemann <[email protected]>

* link to new meta docs

Co-authored-by: David Laehnemann <[email protected]>

* update contributing docs with suggestions

* pin minor versions

* add more doc updates from suggestions

* Mention @mention possibility.

Co-authored-by: David Laehnemann <[email protected]>

Co-authored-by: Jan Forster <[email protected]>
Co-authored-by: David Laehnemann <[email protected]>
Co-authored-by: Johannes Köster <[email protected]>
4 people authored Jun 29, 2021
1 parent 48be356 commit 0aa394a
Showing 12 changed files with 246 additions and 63 deletions.
6 changes: 6 additions & 0 deletions bio/seqtk/mergepe/environment.yaml
@@ -0,0 +1,6 @@
channels:
  - bioconda
  - conda-forge
dependencies:
  - seqtk =1.3
  - pigz =2.3
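The environment pins ``seqtk =1.3`` and ``pigz =2.3`` with a single ``=``. As we understand conda's match-spec rules, a single ``=`` is a fuzzy match, so ``=1.3`` accepts 1.3 and 1.3.x but not 1.30 or 1.4. The function below is a hypothetical, simplified illustration of that rule (it compares dot-separated components only and ignores build strings and conda's full version ordering); it is not conda's implementation.

```python
def matches_fuzzy_pin(version: str, pin: str) -> bool:
    """Return True if `version` satisfies a conda-style fuzzy pin such as "=1.3".

    Simplified sketch: a pin "1.3" behaves like "1.3.*", i.e. every component of
    the pin must equal the corresponding leading component of the version.
    """
    version_parts = version.split(".")
    pin_parts = pin.split(".")
    if len(version_parts) < len(pin_parts):
        return False
    return version_parts[: len(pin_parts)] == pin_parts


for candidate in ["1.3", "1.3.1", "1.30", "1.4"]:
    print(candidate, matches_fuzzy_pin(candidate, "1.3"))
```

Comparing components instead of raw string prefixes is what keeps ``1.30`` from matching a ``1.3`` pin.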
19 changes: 19 additions & 0 deletions bio/seqtk/mergepe/meta.yaml
@@ -0,0 +1,19 @@
name: seqtk mergepe
description: Interleave two paired-end FASTA/Q files
url: https://github.com/lh3/seqtk
authors:
  - Michael Hall
input:
  - paired fastq files - can be compressed in gzip format (``*.gz``).
output:
  - >
    a single, interleaved FASTA/Q file. By default, the output will be compressed;
    use the param ``compress_lvl`` to change this.
params:
  compress_lvl: >
    Regulate the speed of compression using the specified digit,
    where 1 indicates the fastest compression method (less compression)
    and 9 indicates the slowest compression method (best compression).
    0 is no compression. 11 gives a few percent better compression at a severe cost
    in execution time, using the zopfli algorithm. The default is 6.
notes: Multiple threads can be used during compression of the output file with ``pigz``.
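``seqtk mergepe`` interleaves the two mates (record 1 of R1, record 1 of R2, record 2 of R1, and so on), and the wrapper pipes the result through ``pigz``. A minimal pure-Python illustration of that pipeline is sketched below, assuming unwrapped four-line FASTQ records; it is not seqtk's implementation. Python's ``gzip`` module only accepts levels 0 to 9, so pigz's level 11 (zopfli) has no counterpart here, and raising an error on unequal record counts is a choice of this sketch.

```python
import gzip
import io
from itertools import zip_longest


def read_fastq_records(handle):
    """Yield FASTQ records as 4-tuples of lines (header, sequence, '+', qualities).

    Assumes unwrapped FASTQ, i.e. exactly four lines per record.
    """
    while True:
        record = tuple(handle.readline() for _ in range(4))
        if not record[0]:  # readline() returns "" at end of file
            return
        yield record


def interleave(r1_handle, r2_handle, out_handle):
    """Write R1 and R2 records alternately: R1[0], R2[0], R1[1], R2[1], ..."""
    pairs = zip_longest(read_fastq_records(r1_handle), read_fastq_records(r2_handle))
    for rec1, rec2 in pairs:
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of records")
        out_handle.writelines(rec1)
        out_handle.writelines(rec2)


def interleave_gz(r1_text, r2_text, compress_lvl=6):
    """Interleave two FASTQ strings and gzip the result, similar to mergepe | pigz."""
    buffer = io.StringIO()
    interleave(io.StringIO(r1_text), io.StringIO(r2_text), buffer)
    # Python's gzip accepts levels 0-9; level 0 stores the data uncompressed
    # but still inside a gzip container.
    return gzip.compress(buffer.getvalue().encode(), compresslevel=compress_lvl)


r1 = "@read1/1\nACGT\n+\nIIII\n@read2/1\nGGCC\n+\nIIII\n"
r2 = "@read1/2\nTTAA\n+\nIIII\n@read2/2\nCCGG\n+\nIIII\n"
merged = gzip.decompress(interleave_gz(r1, r2)).decode()
print(merged.splitlines()[::4])
# ['@read1/1', '@read1/2', '@read2/1', '@read2/2']
```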
13 changes: 13 additions & 0 deletions bio/seqtk/mergepe/test/Snakefile
@@ -0,0 +1,13 @@
rule seqtk_mergepe:
    input:
        r1="{sample}.1.fastq.gz",
        r2="{sample}.2.fastq.gz",
    output:
        merged="{sample}.merged.fastq.gz",
    params:
        compress_lvl=9,
    log:
        "logs/seqtk_mergepe/{sample}.log",
    threads: 2
    wrapper:
        "master/bio/seqtk/mergepe"
Binary file added bio/seqtk/mergepe/test/a.1.fastq.gz
Binary file not shown.
Binary file added bio/seqtk/mergepe/test/a.2.fastq.gz
Binary file not shown.
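In the test, ``test.py`` asks Snakemake for ``a.merged.fastq.gz``. Matching that target against the output pattern ``{sample}.merged.fastq.gz`` sets ``sample=a``, which selects the two test files added above, ``a.1.fastq.gz`` and ``a.2.fastq.gz``. The sketch below imitates that resolution in a simplified way: each ``{name}`` becomes a regex group matching ``.+`` (Snakemake's default wildcard pattern), while wildcard constraints, repeated wildcards and the rest of Snakemake's scheduling logic are left out. ``wildcard_regex`` and ``resolve`` are hypothetical names.

```python
import re


def wildcard_regex(pattern):
    """Compile a Snakemake-style pattern such as "{sample}.merged.fastq.gz" into a regex.

    Simplified: every {name} becomes a named group matching ".+" and all other
    text is matched literally; wildcard constraints are not supported.
    """
    regex = ""
    last = 0
    for match in re.finditer(r"\{(\w+)\}", pattern):
        regex += re.escape(pattern[last:match.start()])
        regex += f"(?P<{match.group(1)}>.+)"
        last = match.end()
    regex += re.escape(pattern[last:])
    return re.compile(regex)


def resolve(target, output_pattern, input_patterns):
    """Infer wildcard values from `target` and fill them into the input patterns."""
    match = wildcard_regex(output_pattern).fullmatch(target)
    if match is None:
        raise ValueError(f"{target!r} does not match {output_pattern!r}")
    return {name: pattern.format(**match.groupdict()) for name, pattern in input_patterns.items()}


inputs = {"r1": "{sample}.1.fastq.gz", "r2": "{sample}.2.fastq.gz"}
print(resolve("a.merged.fastq.gz", "{sample}.merged.fastq.gz", inputs))
# {'r1': 'a.1.fastq.gz', 'r2': 'a.2.fastq.gz'}
```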
16 changes: 16 additions & 0 deletions bio/seqtk/mergepe/wrapper.py
@@ -0,0 +1,16 @@
"""Snakemake wrapper for interleaving reads from paired FASTA/Q files using seqtk."""

__author__ = "Michael Hall"
__copyright__ = "Copyright 2021, Michael Hall"
__email__ = "[email protected]"
__license__ = "MIT"

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True, append=False)
compress_lvl = int(snakemake.params.get("compress_lvl", 6))

shell(
    "(seqtk mergepe {snakemake.input} "
    "| pigz -{compress_lvl} -c -p {snakemake.threads}) > {snakemake.output} {log}"
)
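The wrapper formats a single shell pipeline: ``seqtk mergepe`` writes interleaved records to stdout, ``pigz`` compresses them at ``compress_lvl`` using ``snakemake.threads`` threads, and stderr goes to the log. The following hypothetical helper (``build_mergepe_cmd`` is not part of the wrapper) assembles an equivalent string outside Snakemake so the effect of each parameter can be inspected. Two choices are this sketch's own, not the wrapper's: paths are shell-quoted with ``shlex.quote``, and ``compress_lvl`` is checked against the levels described in ``meta.yaml`` (0 to 9 and 11).

```python
import shlex

# Levels described for compress_lvl in meta.yaml: 0 (none), 1-9, 11 (zopfli).
VALID_LEVELS = set(range(10)) | {11}


def build_mergepe_cmd(inputs, output, threads, compress_lvl=6, log=None):
    """Assemble a command string equivalent to the one wrapper.py runs.

    Hypothetical helper for illustration only: the wrapper itself formats the
    string inline through Snakemake's shell() and log_fmt_shell().
    """
    compress_lvl = int(compress_lvl)  # same int() coercion as the wrapper
    if compress_lvl not in VALID_LEVELS:
        raise ValueError(f"compress_lvl must be 0-9 or 11, got {compress_lvl}")
    reads = " ".join(shlex.quote(path) for path in inputs)
    cmd = (
        f"(seqtk mergepe {reads} "
        f"| pigz -{compress_lvl} -c -p {int(threads)}) > {shlex.quote(output)}"
    )
    if log is not None:
        # Roughly what log_fmt_shell(stdout=False, stderr=True, append=False)
        # contributes: redirect stderr only, overwriting the log file.
        cmd += f" 2> {shlex.quote(log)}"
    return cmd


print(
    build_mergepe_cmd(
        ["a.1.fastq.gz", "a.2.fastq.gz"],
        "a.merged.fastq.gz",
        threads=2,
        compress_lvl=9,
        log="logs/seqtk_mergepe/a.log",
    )
)
# (seqtk mergepe a.1.fastq.gz a.2.fastq.gz | pigz -9 -c -p 2) > a.merged.fastq.gz 2> logs/seqtk_mergepe/a.log
```

Wrapping both commands in a subshell means the single ``2>`` redirection collects stderr from ``seqtk`` and ``pigz`` alike.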
1 change: 0 additions & 1 deletion bio/seqtk/subsample/pe/test/logs/seqtk_subsample/a.log

This file was deleted.

1 change: 0 additions & 1 deletion bio/seqtk/subsample/se/test/logs/seqtk_subsample/a.log

This file was deleted.

21 changes: 21 additions & 0 deletions docs/_templates/wrapper.rst
@@ -5,6 +5,7 @@

{{ description }}

**URL**: {{ url }}

Example
-------
@@ -16,6 +17,7 @@ This wrapper can be used in the following way:
{{ snakefile }}

Note that input, output and log file paths can be chosen freely.

When running with

.. code-block:: bash
@@ -53,6 +55,25 @@ Input/Output
{% endfor %}
{% endif %}

{% if params %}

Params
------

{# Parse the params section of .yaml #}
{% for key in params %}
{% if key is mapping %}
{% for k, value in key.items() %}
* ``{{ k }}``: {{ value }}
{% endfor %}
{% else %}
* ``{{ key }}``: {{ params[key] }}
{% endif %}

{% endfor %}

{% endif %}

{% if notes %}

Notes
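The new Params block in ``docs/_templates/wrapper.rst`` accepts two shapes for ``params``: a mapping of name to description, as in the seqtk mergepe ``meta.yaml``, or a sequence whose items are mappings. The following Python function is a hypothetical stand-in, not part of the docs build, that mirrors the template's branching so the expected bullets can be checked without running Sphinx; it reproduces the bullet content, not the exact blank lines the template emits.

```python
from collections.abc import Mapping


def render_params(params):
    """Return the RST bullets that the template's Params loop would produce."""
    bullets = []
    # Iterating a mapping yields its keys; iterating a sequence yields its items.
    for key in params:
        if isinstance(key, Mapping):  # counterpart of Jinja's "is mapping" test
            for k, value in key.items():
                bullets.append(f"* ``{k}``: {value}")
        else:
            bullets.append(f"* ``{key}``: {params[key]}")
    return "\n".join(bullets)


as_mapping = {"compress_lvl": "Compression level, 0-9 or 11. The default is 6."}
as_sequence = [{"compress_lvl": "Compression level, 0-9 or 11. The default is 6."}]
print(render_params(as_mapping))
# * ``compress_lvl``: Compression level, 0-9 or 11. The default is 6.
print(render_params(as_mapping) == render_params(as_sequence))  # True
```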
143 changes: 143 additions & 0 deletions docs/contributing.rst
@@ -0,0 +1,143 @@
.. _contributing:

Contributing
============

We invite anybody to contribute to the Snakemake Wrapper Repository.
If you want to contribute, we suggest the following procedure:

#. Fork the repository: https://github.com/snakemake/snakemake-wrappers
#. Clone your fork locally.
#. Locally, create a new branch: ``git checkout -b my-new-snakemake-wrapper``
#. Commit your contributions to that branch and push them to your fork: ``git push -u origin my-new-snakemake-wrapper``
#. Create a pull request.

The pull request will be reviewed and included as quickly as possible.
If your pull request does not get a review quickly, you can `@mention <https://github.blog/2011-03-23-mention-somebody-they-re-notified/>`_ previous contributors to a particular wrapper (see ``git blame``) or regular contributors who you think might be able to review it.
Contributions should follow the coding style of the existing wrappers, i.e.:

* provide a ``meta.yaml`` that describes the wrapper (see the :ref:`meta.yaml documentation below <meta>`)
* provide an ``environment.yaml`` which lists all required software packages and follows
  `the respective best practices <https://stackoverflow.com/a/64594513/2352071>`_. The
  packages should be available for installation via the
  `default anaconda channels <https://anaconda.org/anaconda>`_ or via the
  `conda`_ channels
  `bioconda <https://bioconda.github.io/recipes.html>`_ or
  `conda-forge <https://conda-forge.org/feedstocks/>`_.
  Other sustainable community-maintained channels are possible as well.
* add a ``wrapper.py`` or ``wrapper.R`` file that can deal with arbitrary ``input:`` and ``output:`` paths.
* provide a minimal test case in a subfolder called ``test``, with an example
  ``Snakefile`` that shows how to use the wrapper (rule names should be descriptive and written in `snake_case <https://en.wikipedia.org/wiki/Snake_case>`_), some minimal testing data
  (also check existing wrappers for suitable data), and add an invocation of the
  test in ``test.py``.
* ensure consistent `formatting`_ of Python files and `linting`_ of Snakefiles.

.. _meta:

``meta.yaml`` file
-------------------

The following fields are available for the wrapper's ``meta.yaml`` file. All fields
except those marked optional should be provided.

* **name**: The name of the wrapper.
* **description**: A description of what the wrapper does.
* **url**: URL of the wrapped tool's webpage.
* **authors**: A `sequence`_ of names of the people who have contributed to the wrapper.
* **input**: A `mapping`_ or `sequence`_ of required inputs for the wrapper.
* **output**: A `mapping`_ or `sequence`_ of output(s) from the wrapper.
* **params** (optional): A `mapping`_ of parameters that can be used in the wrapper's ``params`` directive. If no parameters are used for the wrapper, this field can be omitted.
* **notes** (optional): Anything of note that does not fit into the scope of the other fields.

Example
^^^^^^^

.. code-block:: yaml

   name: seqtk mergepe
   description: Interleave two paired-end FASTA/Q files
   url: https://github.com/lh3/seqtk
   authors:
     - Michael Hall
   input:
     - paired fastq files - can be compressed.
   output:
     - >
       a single, interleaved FASTA/Q file. By default, the output will be compressed;
       use the param ``compress_lvl`` to change this.
   params:
     compress_lvl: >
       Regulate the speed of compression using the specified digit,
       where 1 indicates the fastest compression method (less compression)
       and 9 indicates the slowest compression method (best compression).
       0 is no compression. 11 gives a few percent better compression at a severe cost
       in execution time, using the zopfli algorithm. The default is 6.
   notes: Multiple threads can be used during compression of the output file with ``pigz``.

.. _sequence: https://yaml.org/spec/1.2/spec.html#id2759963
.. _mapping: https://yaml.org/spec/1.2/spec.html#id2759963

.. _formatting:

Formatting
----------

Please ensure Python files such as ``test.py`` and ``wrapper.py`` are formatted with
|black|_. Additionally, please format your test ``Snakefile`` with |snakefmt|_.

.. |black| replace:: ``black``
.. _black: https://github.com/psf/black
.. |snakefmt| replace:: ``snakefmt``
.. _snakefmt: https://github.com/snakemake/snakefmt

.. _linting:

Linting
-------

Please `lint`_ your test ``Snakefile`` with::

snakemake -s <path/to/wrapper/test/Snakefile> --lint

.. _lint: https://snakemake.readthedocs.io/en/stable/snakefiles/writing_snakefiles.html#best-practices

Testing locally
---------------

If you want to debug your contribution locally (before creating a pull request), you
can install all dependencies with |mamba|_ (or |conda|_). `Install miniconda with the
channels as described for bioconda <https://bioconda.github.io/#using-bioconda>`_ and
set up an environment with the necessary dependencies and activate it::

mamba create -n test-snakemake-wrappers snakemake pytest conda snakefmt black
conda activate test-snakemake-wrappers

Afterwards, from the main directory of the repo, you can run the test(s) for your
contribution by `specifying an expression <https://docs.pytest.org/en/stable/usage.html#specifying-tests-selecting-tests>`_
that matches the name(s) of your test(s) via the ``-k`` option of ``pytest``::

pytest test.py -v -k your_test


If you also want to test the docs generation locally, create another environment
and activate it::

mamba create -n test-snakemake-wrapper-docs sphinx sphinx_rtd_theme pyyaml sphinx-copybutton
conda activate test-snakemake-wrapper-docs

Then, enter the respective directory and build the docs::

cd docs
make html

If it runs through, you can open the main page at ``docs/_build/html/index.html``
in a web browser. If you want to start fresh, you can clean up the build
with ``make clean``.


.. |mamba| replace:: ``mamba``
.. _mamba: https://github.com/mamba-org/mamba
.. |conda| replace:: ``conda``
.. _conda: https://conda.io
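The ``meta.yaml`` section of the new contributing guide lists six required fields (name, description, url, authors, input, output) and two optional ones (params, notes), together with the expected YAML shapes. A minimal sketch of a check against those rules follows. ``check_meta`` is hypothetical and the repository does not necessarily run such a check; it takes an already-parsed dictionary (for example the result of PyYAML's ``yaml.safe_load``) so that it needs only the standard library. The example dictionary abbreviates the seqtk mergepe ``meta.yaml``.

```python
REQUIRED = ("name", "description", "url", "authors", "input", "output")


def check_meta(meta):
    """Return human-readable problems with a parsed meta.yaml; empty means OK."""
    problems = [f"missing required field: {field}" for field in REQUIRED if field not in meta]
    if "authors" in meta and not isinstance(meta["authors"], list):
        problems.append("authors should be a sequence")
    for field in ("input", "output"):
        if field in meta and not isinstance(meta[field], (list, dict)):
            problems.append(f"{field} should be a mapping or sequence")
    # params is optional, but when present the guide asks for a mapping.
    if "params" in meta and not isinstance(meta["params"], dict):
        problems.append("params should be a mapping")
    return problems


seqtk_mergepe_meta = {
    "name": "seqtk mergepe",
    "description": "Interleave two paired-end FASTA/Q files",
    "url": "https://github.com/lh3/seqtk",
    "authors": ["Michael Hall"],
    "input": ["paired fastq files - can be compressed in gzip format (``*.gz``)."],
    "output": ["a single, interleaved FASTA/Q file."],
    "params": {"compress_lvl": "Compression level; the default is 6."},
    "notes": "Multiple threads can be used during compression of the output file with ``pigz``.",
}
print(check_meta(seqtk_mergepe_meta))  # []
print(check_meta({"name": "x", "authors": "Michael Hall"}))
```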
71 changes: 11 additions & 60 deletions docs/index.rst
@@ -55,69 +55,12 @@ For the above example, the explicit GitHub URL to specify would need to be the `
"https://github.com/snakemake/snakemake-wrappers/raw/0.2.0/bio/samtools/sort"
Contribute
----------
Contributing
------------

We invite anybody to contribute to the Snakemake Wrapper Repository.
If you want to contribute we suggest the following procedure:
If you want to contribute, refer to the :ref:`contributing guide <contributing>`.

#. Fork the repository: https://github.com/snakemake/snakemake-wrappers
#. Clone your fork locally.
#. Locally, create a new branch: ``git checkout -b my-new-snakemake-wrapper``
#. Commit your contributions to that branch and push them to your fork: ``git push -u origin my-new-snakemake-wrapper``
#. Create a pull request.

The pull request will be reviewed and included as fast as possible.
Contributions should follow the coding style of the already present examples, i.e.:

* provide a ``meta.yaml`` with name, description and author(s) of the wrapper
* provide an ``environment.yaml`` which lists all required software packages (the
packages should be available for installation via the
`default anaconda channels <https://anaconda.org/anaconda>`_ or via the
`conda <https://conda.io/docs/>`_ channels
`bioconda <https://bioconda.github.io/recipes.html>`_ or
`conda-forge <https://conda-forge.org/feedstocks/>`_.
Other sustainable community maintained channels are possible as well.)
* provide a minimal test case in a subfolder called ``test``, with an example
``Snakefile`` that shows how to use the wrapper, some minimal testing data
(also check existing wrappers for suitable data) and add an invocation of the
test in ``test.py``
* follow the python `style guide <http://legacy.python.org/dev/peps/pep-0008>`_,
using 4 spaces for indentation.

Testing locally
^^^^^^^^^^^^^^^

If you want to debug your contribution locally, before creating a pull request,
we recommend adding your test case to the start of the list in ``test.py``, so
that it runs first. Then, `install miniconda with the channels as described for
bioconda <https://bioconda.github.io/#using-bioconda>`_ and set up an
environment with the necessary dependencies and activate it::

conda create -n test-snakemake-wrappers snakemake pytest conda
conda activate test-snakemake-wrappers

Afterwards, from the main directory of the repo, you can run the tests with::

pytest test.py -v

If you use a keyboard interrupt after your test has failed, you will get all
the relevant stdout and stderr messages printed.

If you also want to test the docs generation locally, create another environment
and activate it::

conda create -n test-snakemake-wrapper-docs sphinx sphinx_rtd_theme pyyaml sphinx-copybutton
conda activate test-snakemake-wrapper-docs

Then, enter the respective directory and build the docs::

cd docs
make html

If it runs through, you can open the main page at ``docs/_build/html/index.html``
in a web browser. If you want to start fresh, you can clean up the build
with ``make clean``.

.. toctree::
:maxdepth: 4
@@ -127,3 +70,11 @@ with ``make clean``.

wrappers
meta-wrappers


.. toctree::
:caption: Development
:maxdepth: 2
:hidden:

contributing
18 changes: 17 additions & 1 deletion test.py
@@ -130,14 +130,15 @@ def run(wrapper, cmd, check_log=None):
os.chdir(origdir)



@skip_if_not_modified
def test_rbt_csvreport():
    run(
        "bio/rbt/csvreport",
        ["snakemake", "--cores", "1", "qc_data", "--use-conda", "-F"],
    )


@skip_if_not_modified
def test_liftoff():
    run(
@@ -573,6 +574,21 @@ def test_shovill():
    )


@skip_if_not_modified
def test_seqtk_mergepe():
    run(
        "bio/seqtk/mergepe",
        [
            "snakemake",
            "--cores",
            "1",
            "--use-conda",
            "-F",
            "a.merged.fastq.gz",
        ],
    )


@skip_if_not_modified
def test_seqtk_subsample_se():
    run(
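The new ``test_seqtk_mergepe`` asks Snakemake to build ``a.merged.fastq.gz``; as far as the diff shows, it does not inspect the file's contents. If a reviewer also wanted to confirm that the output really is interleaved, one possible check is that consecutive records carry the same read name. The sketch below assumes unwrapped four-line FASTQ records with optional ``/1`` and ``/2`` mate suffixes; ``read_names`` and ``is_interleaved`` are hypothetical helpers, not part of ``test.py``.

```python
import gzip
import os
import tempfile


def read_names(path):
    """Return the read names of a gzipped, unwrapped four-line FASTQ file."""
    with gzip.open(path, "rt") as handle:
        lines = handle.read().splitlines()
    if len(lines) % 4:
        raise ValueError("truncated FASTQ record")
    names = []
    for header in lines[::4]:
        name = header[1:].split()[0]  # drop "@" and any comment after whitespace
        if name.endswith(("/1", "/2")):  # drop a trailing mate suffix
            name = name[:-2]
        names.append(name)
    return names


def is_interleaved(path):
    """True if records come in pairs (R1, R2) that share a read name."""
    names = read_names(path)
    return len(names) % 2 == 0 and names[0::2] == names[1::2]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.merged.fastq.gz")
    with gzip.open(path, "wt") as out:
        out.write("@r1/1\nACGT\n+\nIIII\n@r1/2\nTTAA\n+\nIIII\n")
    print(is_interleaved(path))  # True
```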
