Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update medaka to version 2.0.0 #1053

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -203,7 +203,7 @@ To learn more about the docker pull rate limits and the open source software pro
| [Mash](https://hub.docker.com/r/staphb/mash/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/mash)](https://hub.docker.com/r/staphb/mash) | <ul><li>2.1</li><li>2.2</li><li>2.3</li></ul> | https://github.com/marbl/Mash |
| [mashtree](https://hub.docker.com/r/staphb/mashtree) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/mashtree)](https://hub.docker.com/r/staphb/mashtree) | <ul><li>[0.52.0](./mashtree/0.52.0/)</li><li>[0.57.0](./mashtree/0.57.0/)</li><li>[1.0.4](./mashtree/1.0.4/)</li><li>[1.2.0](./mashtree/1.2.0/)</li><li>[1.4.6](./mashtree/1.4.6/)</li></ul> | https://github.com/lskatz/mashtree |
| [MaSuRCA](https://hub.docker.com/r/staphb/masurca) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/masurca)](https://hub.docker.com/r/staphb/masurca) | <ul><li>4.0.8</li><li>4.0.9</li><li>4.1.0</li></ul> | https://github.com/alekseyzimin/masurca |
| [medaka](https://hub.docker.com/r/staphb/medaka) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/medaka)](https://hub.docker.com/r/staphb/medaka) | <ul><li>0.8.1</li><li>1.0.1</li><li>1.2.0</li></ul> | https://github.com/nanoporetech/medaka |
| [medaka](https://hub.docker.com/r/staphb/medaka) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/medaka)](https://hub.docker.com/r/staphb/medaka) | <ul><li>[0.8.1](./medaka/0.8.1/)</li><li>[1.0.1](./medaka/1.0.1/)</li><li>[1.2.0](./medaka/1.2.0/)</li><li>[2.0.0](./medaka/2.0.0/)</li></ul> | https://github.com/nanoporetech/medaka |
| [metaphlan](https://hub.docker.com/r/staphb/metaphlan) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/metaphlan)](https://hub.docker.com/r/staphb/metaphlan) | <ul><li>[3.0.3-no-db (no database)](./metaphlan/3.0.3-no-db/)</li><li> [3.0.3 (~3GB db)](./metaphlan/3.0.3/)</li><li>[4.1.0](./metaphlan/4.1.0/) (no database)</li><li>[4.1.1](./metaphlan/4.1.1/) (no database)</li></ul> | https://github.com/biobakery/MetaPhlAn |
| [MIDAS](https://hub.docker.com/r/staphb/midas) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/midas)](https://hub.docker.com/r/staphb/midas) | <ul><li>1.3.2 (no database)</li> | https://github.com/snayfach/MIDAS |
| [minimap2](https://hub.docker.com/r/staphb/minimap2) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/minimap2)](https://hub.docker.com/r/staphb/minimap2) | <ul><li>2.17</li><li>2.18</li><li>2.21</li><li>2.22</li><li>2.23</li><li>2.24</li><li>2.25</li><li>[2.26](./minimap2/2.26)</li><li>[2.27](./minimap2/2.27/)</li>li>[2.28](./minimap2/2.28/)</li></ul> | https://github.com/lh3/minimap2 |
Expand Down
156 changes: 156 additions & 0 deletions medaka/2.0.0/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,156 @@
ARG MEDAKA_VER="2.0.0"
ARG SAMTOOLS_VER="1.21"
ARG BCFTOOLS_VER=${SAMTOOLS_VER}
ARG HTSLIB_VER=${SAMTOOLS_VER}
ARG MINIMAP2_VER="2.28"

FROM ubuntu:jammy as builder

ARG SAMTOOLS_VER
ARG BCFTOOLS_VER
ARG HTSLIB_VER
ARG MINIMAP2_VER

# install dependencies required for compiling samtools
ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install --no-install-recommends -y \
libncurses5-dev \
libbz2-dev \
liblzma-dev \
libcurl4-gnutls-dev \
zlib1g-dev \
libssl-dev \
gcc \
wget \
make \
perl \
bzip2 \
gnuplot \
ca-certificates \
curl \
perl \
bzip2 \
autoconf \
automake \
make \
gcc \
libdeflate-dev \
zlib1g-dev \
libbz2-dev \
liblzma-dev \
libcurl4-gnutls-dev \
libssl-dev \
libperl-dev \
libgsl0-dev \
libdeflate-dev \
procps && \
rm -rf /var/lib/apt/lists/* && apt-get autoclean

# download, compile, and install bcftools
RUN wget -q https://github.com/samtools/bcftools/releases/download/${BCFTOOLS_VER}/bcftools-${BCFTOOLS_VER}.tar.bz2 && \
tar -xjf bcftools-${BCFTOOLS_VER}.tar.bz2 && \
rm -v bcftools-${BCFTOOLS_VER}.tar.bz2 && \
cd bcftools-${BCFTOOLS_VER} && \
./configure --enable-libgsl --enable-perl-filters &&\
make && \
make install

# download, compile, and install samtools
RUN wget -q https://github.com/samtools/samtools/releases/download/${SAMTOOLS_VER}/samtools-${SAMTOOLS_VER}.tar.bz2 && \
tar -xjf samtools-${SAMTOOLS_VER}.tar.bz2 && \
cd samtools-${SAMTOOLS_VER} && \
./configure && \
make && \
make install

RUN wget -q https://github.com/samtools/htslib/releases/download/${HTSLIB_VER}/htslib-${HTSLIB_VER}.tar.bz2 && \
tar -vxjf htslib-${HTSLIB_VER}.tar.bz2 && \
rm -v htslib-${HTSLIB_VER}.tar.bz2 && \
cd htslib-${HTSLIB_VER} && \
./configure && \
make && \
make install

RUN curl -L https://github.com/lh3/minimap2/releases/download/v${MINIMAP2_VER}/minimap2-${MINIMAP2_VER}_x64-linux.tar.bz2 | tar -jxvf - --no-same-owner && \
ls && \
mv minimap2-${MINIMAP2_VER}_x64-linux/minimap2 /usr/local/bin

### start of app stage ###
FROM ubuntu:jammy as app

ARG MEDAKA_VER

LABEL base.image="ubuntu:jammy"
LABEL dockerfile.version=1
LABEL container.version="1"
LABEL software="Medaka"
LABEL software.version="${MEDAKA_VER}"
LABEL description="Consensus sequence correction provided by ONT Research"
LABEL website="https://github.com/nanoporetech/medaka"
LABEL license="https://github.com/nanoporetech/medaka/blob/master/LICENSE.md"
LABEL maintainer="Curtis Kapsak"
LABEL maintainer.email="[email protected]"
LABEL maintainer1="Erin Young"
LABEL maintainer1.email="[email protected]"

ARG DEBIAN_FRONTEND=noninteractive

# install dependencies required for running samtools
RUN apt-get update && apt-get install --no-install-recommends -y \
perl \
zlib1g \
libncurses5 \
bzip2 \
libdeflate-dev \
liblzma-dev \
libcurl4-gnutls-dev \
python3 \
python3-pip \
python-is-python3 \
python3-dev \
build-essential \
libssl-dev \
libffi-dev \
libxml2-dev \
libxslt1-dev \
zlib1g-dev \
libgsl-dev \
&& apt-get autoclean && rm -rf /var/lib/apt/lists/*

# copy in samtools, htslib, and minimap2 executables from builder stage
COPY --from=builder /usr/local/bin/* /usr/local/bin/

# install medaka via pip
RUN pip install medaka-cpu --extra-index-url https://download.pytorch.org/whl/cpu -vv --no-cache-dir && \
pip install pyabpoa -vv --no-cache-dir

ENV LC_ALL=C

# final working directory is /data
WORKDIR /data

# default command is to pull up help options
CMD medaka --help

### start of test stage ###
FROM app as test

RUN apt-get update && apt-get install --no-install-recommends -y wget

# making sure dependencies are in path
RUN samtools --help && minimap2 --help && tabix --help && bgzip --help

RUN medaka --help && \
medaka --version

# set working directory so that all test inputs & outputs are kept in /test
WORKDIR /test

# using on real data (CRPA isolate)
RUN wget -q https://www.ebi.ac.uk/ena/browser/api/fasta/GCA_021601745.3 -O GCA_021601745.3.fasta && \
wget -q ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR234/068/SRR23473168/SRR23473168_1.fastq.gz && \
medaka_consensus -i SRR23473168_1.fastq.gz -d GCA_021601745.3.fasta -o testing -t 4

# listing available models
RUN medaka tools list\_models
34 changes: 34 additions & 0 deletions medaka/2.0.0/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
# medaka container

Main tool : [medaka](https://github.com/nanoporetech/medaka)

Additional tools:
- bcftools version 1.21
- samtools version 1.21
- htslib version 1.21
- minimap2 version 2.28

Full documentation: [https://github.com/nanoporetech/medaka](https://github.com/nanoporetech/medaka)

> medaka is a tool to create consensus sequences and variant calls from nanopore sequencing data. This task is performed using neural networks applied a pileup of individual sequencing reads against a draft assembly. It provides state-of-the-art results outperforming sequence-graph based methods and signal-based methods, whilst also being faster.

## Example Usage

```bash

# listing models
medaka tools list\_models

# polishing
medaka_consensus -i sample.fastq.gz -d sample.fasta -o medaka/sample -t 4

```

## Medaka models

Medaka updates frequently. These are the medaka models in this image:
```
Available: r103_fast_g507, r103_fast_snp_g507, r103_fast_variant_g507, r103_hac_g507, r103_hac_snp_g507, r103_hac_variant_g507, r103_sup_g507, r103_sup_snp_g507, r103_sup_variant_g507, r1041_e82_260bps_fast_g632, r1041_e82_260bps_fast_variant_g632, r1041_e82_260bps_hac_g632, r1041_e82_260bps_hac_v4.0.0, r1041_e82_260bps_hac_v4.1.0, r1041_e82_260bps_hac_variant_g632, r1041_e82_260bps_hac_variant_v4.1.0, r1041_e82_260bps_joint_apk_ulk_v5.0.0, r1041_e82_260bps_sup_g632, r1041_e82_260bps_sup_v4.0.0, r1041_e82_260bps_sup_v4.1.0, r1041_e82_260bps_sup_variant_g632, r1041_e82_260bps_sup_variant_v4.1.0, r1041_e82_400bps_bacterial_methylation, r1041_e82_400bps_fast_g615, r1041_e82_400bps_fast_g632, r1041_e82_400bps_fast_variant_g615, r1041_e82_400bps_fast_variant_g632, r1041_e82_400bps_hac_g615, r1041_e82_400bps_hac_g632, r1041_e82_400bps_hac_v4.0.0, r1041_e82_400bps_hac_v4.1.0, r1041_e82_400bps_hac_v4.2.0, r1041_e82_400bps_hac_v4.3.0, r1041_e82_400bps_hac_v5.0.0, r1041_e82_400bps_hac_variant_g615, r1041_e82_400bps_hac_variant_g632, r1041_e82_400bps_hac_variant_v4.1.0, r1041_e82_400bps_hac_variant_v4.2.0, r1041_e82_400bps_hac_variant_v4.3.0, r1041_e82_400bps_hac_variant_v5.0.0, r1041_e82_400bps_sup_g615, r1041_e82_400bps_sup_v4.0.0, r1041_e82_400bps_sup_v4.1.0, r1041_e82_400bps_sup_v4.2.0, r1041_e82_400bps_sup_v4.3.0, r1041_e82_400bps_sup_v5.0.0, r1041_e82_400bps_sup_variant_g615, r1041_e82_400bps_sup_variant_v4.1.0, r1041_e82_400bps_sup_variant_v4.2.0, r1041_e82_400bps_sup_variant_v4.3.0, r1041_e82_400bps_sup_variant_v5.0.0, r104_e81_fast_g5015, r104_e81_fast_variant_g5015, r104_e81_hac_g5015, r104_e81_hac_variant_g5015, r104_e81_sup_g5015, r104_e81_sup_g610, r104_e81_sup_variant_g610, r941_e81_fast_g514, r941_e81_fast_variant_g514, r941_e81_hac_g514, r941_e81_hac_variant_g514, r941_e81_sup_g514, r941_e81_sup_variant_g514, r941_min_fast_g507, r941_min_fast_snp_g507, r941_min_fast_variant_g507, r941_min_hac_g507, r941_min_hac_snp_g507, r941_min_hac_variant_g507, r941_min_sup_g507, r941_min_sup_snp_g507, r941_min_sup_variant_g507, r941_prom_fast_g507, r941_prom_fast_snp_g507, r941_prom_fast_variant_g507, r941_prom_hac_g507, r941_prom_hac_snp_g507, r941_prom_hac_variant_g507, r941_prom_sup_g507, r941_prom_sup_snp_g507, r941_prom_sup_variant_g507, r941_sup_plant_g610, r941_sup_plant_variant_g610
Default consensus: r1041_e82_400bps_sup_v5.0.0
Default variant: r1041_e82_400bps_sup_variant_v5.0.0
```