Merge pull request #1040 from taylorpaisie/tkp-rdp
RDP classifier
erinyoung committed Sep 10, 2024
2 parents 7caff63 + ae45b01 commit ea55b66
Showing 7 changed files with 177 additions and 0 deletions.
1 change: 1 addition & 0 deletions Program_Licenses.md
@@ -145,6 +145,7 @@ The licenses of the open-source software that is contained in these Docker image
| raven | MIT | https://github.com/lbcb-sci/raven/blob/master/LICENSE |
| RAxML | GNU GPLv3 | https://github.com/stamatak/standard-RAxML/blob/master/gpl-3.0.txt |
| RAxML-NG | GNU AGPLv3| https://github.com/amkozlov/raxml-ng/blob/master/LICENSE.txt |
| RDP | GNU GPLv2 | https://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html |
| ResFinder | Apache 2.0 | https://bitbucket.org/genomicepidemiology/resfinder/src/master/ |
| Roary | GNU GPLv3 | https://github.com/sanger-pathogens/Roary/blob/master/GPL-LICENSE |
| SalmID| MIT | https://github.com/hcdenbakker/SalmID/blob/master/LICENSE |
2 changes: 2 additions & 0 deletions README.md
@@ -251,6 +251,7 @@ To learn more about the docker pull rate limits and the open source software pro
| [raven](https://hub.docker.com/r/staphb/raven/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/raven)](https://hub.docker.com/r/staphb/raven) | <ul><li>1.5.1</li><li>1.8.1</li><li>[1.8.3](./raven/1.8.3)</li></ul> | https://github.com/lbcb-sci/raven |
| [RAxML](https://hub.docker.com/r/staphb/raxml/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/raxml)](https://hub.docker.com/r/staphb/raxml) | <ul><li>8.2.12</li><li>[8.2.13](./raxml/8.2.13/)</li></ul> | https://github.com/stamatak/standard-RAxML |
| [RAxML-NG](https://hub.docker.com/r/staphb/raxml-ng/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/raxml-ng)](https://hub.docker.com/r/staphb/raxml-ng) | <ul><li>[1.2.2](./raxml-ng/1.2.2/)</li></ul> | https://github.com/amkozlov/raxml-ng |
| [rdp](https://hub.docker.com/r/staphb/rdp) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/rdp)](https://hub.docker.com/r/staphb/rdp) | <ul><li>[2.14](./rdp/2.14/)</li></ul> | https://sourceforge.net/projects/rdp-classifier/files/rdp-classifier/rdp_classifier_2.14.zip/download |
| [ResFinder](https://hub.docker.com/r/staphb/resfinder/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/resfinder)](https://hub.docker.com/r/staphb/resfinder) | <ul><li>[4.1.11](./resfinder/4.1.11/)</li><li>[4.5.0](./resfinder/4.5.0/)</li></ul> | https://bitbucket.org/genomicepidemiology/resfinder/src/master/ |
| [Roary](https://hub.docker.com/r/staphb/roary/) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/roary)](https://hub.docker.com/r/staphb/roary) | <ul><li>3.12.0</li><li>3.13.0</li></ul> | https://github.com/sanger-pathogens/Roary |
| [SalmID](https://hub.docker.com/r/staphb/salmid) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/salmid)](https://hub.docker.com/r/staphb/salmid) | <ul><li>0.1.23</li></ul> | https://github.com/hcdenbakker/SalmID |
@@ -367,3 +368,4 @@ Each Dockerfile lists the author(s)/maintainer(s) as a metadata `LABEL`, but the
* [@nawrockie](https://github.com/nawrockie)
* [@stephenturner](https://github.com/stephenturner)
* [@soejun](https://github.com/soejun)
* [@taylorpaisie](https://github.com/taylorpaisie)
59 changes: 59 additions & 0 deletions rdp/2.14/Dockerfile
@@ -0,0 +1,59 @@
# set global variables
ARG RDP_VER="2.14"

# build Dockerfile
FROM ubuntu:jammy AS app
ARG RDP_VER

LABEL base.image="ubuntu:jammy"
LABEL dockerfile.version="1"
LABEL software="RDP Classifier"
LABEL software.version=${RDP_VER}
LABEL description="The RDP Classifier is a naive Bayesian classifier which was developed to provide rapid taxonomic placement based on rRNA sequence data."
LABEL website="https://github.com/rdpstaff/classifier"
LABEL documentation="https://sourceforge.net/projects/rdp-classifier/"
LABEL license.url="https://github.com/rdpstaff/classifier/blob/master/LICENSE"
LABEL maintainer="Taylor K. Paisie"
LABEL maintainer.email='[email protected]'

ENV DEBIAN_FRONTEND=noninteractive

# Install dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    openjdk-11-jre \
    wget \
    unzip && \
    apt-get autoclean && rm -rf /var/lib/apt/lists/*

# Install rdp_classifier and a `classifier` wrapper that forwards all arguments (quoted) to the jar
RUN wget -q https://sourceforge.net/projects/rdp-classifier/files/rdp-classifier/rdp_classifier_${RDP_VER}.zip && \
    unzip rdp_classifier_${RDP_VER}.zip && \
    mv /rdp_classifier_${RDP_VER} /rdp_classifier && \
    chmod +x /rdp_classifier/dist/classifier.jar && \
    printf '#!/bin/bash\nexec java -jar /rdp_classifier/dist/classifier.jar "$@"\n' > /rdp_classifier/dist/classifier && \
    rm rdp_classifier_${RDP_VER}.zip && \
    chmod +x /rdp_classifier/dist/classifier

ENV PATH="${PATH}:/rdp_classifier/dist" LC_ALL=C

CMD ["classifier"]

RUN mkdir data/
WORKDIR /data

# Running RDP on test controls
FROM app AS test

WORKDIR /test

# running help to ensure executable is in path
RUN classifier

# testing on real files
RUN apt-get update && apt-get install -y \
    python3 \
    wget

RUN mkdir ../tests/
COPY tests/ ../tests/
RUN python3 -m unittest discover -v -s ../tests
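The `classifier` wrapper built in the `app` stage exists to hand every command-line argument through to `classifier.jar`, and how faithfully it does that depends on shell quoting. The short Python sketch below demonstrates the difference. It assumes `bash` is available and substitutes `printf` for the `java -jar` call so each forwarded argument is printed on its own line; the function name and the space-containing output file name are made up for illustration.

```python
import subprocess


def forwarded_args(forward_expr: str, *args: str) -> list[str]:
    """Run a one-line bash 'wrapper' and return the arguments it forwards.

    printf stands in for `java -jar classifier.jar ...` (an illustrative
    assumption) and prints each forwarded argument on its own line.
    """
    script = f"printf '%s\\n' {forward_expr}"
    # "_" fills $0 for `bash -c`, so the remaining args become $1, $2, ...
    result = subprocess.run(
        ["bash", "-c", script, "_", *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Calling `forwarded_args('"$@"', "classify", "-o", "taxa 16S.txt", "16S_test.fa")` keeps `taxa 16S.txt` as a single argument, while the bare `$@` form splits it into `taxa` and `16S.txt`, which is why the wrapper should end in `"$@"`.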
53 changes: 53 additions & 0 deletions rdp/2.14/README.md
@@ -0,0 +1,53 @@
# RDP Classifier

Main tool: [RDP Classifier](https://sourceforge.net/projects/rdp-classifier/)

Code repository: https://github.com/rdpstaff/classifier

Basic information on how to use this tool:
- executable: `classifier`, with the following subcommands:
```
classify - classify one or multiple samples
crossvalidate - cross validate accuracy testing
libcompare - compare two samples
loot - leave one (sequence or taxon) out accuracy testing
merge-detail - merge classification detail result files to create a taxon assignment counts file
merge-count - merge multiple taxon assignment count files to into one count file
random-sample - random select a subset or subregion of sequences
rm-dupseq - remove identical or any sequence contained by another sequence
rm-partialseq - remove partial sequences
taxa-sim - calculate and plot the similarities within taxa
train - retrain classifier
```

- help: `classifier` (run with no arguments)
- version: NA
- description: |
> The RDP Classifier is a naive Bayesian classifier which was developed to provide rapid taxonomic placement based on rRNA sequence data.

Full documentation: https://sourceforge.net/projects/rdp-classifier/


## Example analysis
Get test data:
```
# Download test data
wget -nv https://raw.githubusercontent.com/taylorpaisie/docker_containers/main/rdp/2.14/16S_rRNA_gene.Burkholderia_pseudomallei.2002721184.AY305776.1.fasta -O 16S_test.fa
wget -nv https://raw.githubusercontent.com/taylorpaisie/docker_containers/main/rdp/2.14/18S_rRNA_gene.Homo_sapiens.T2T-CHM13v2.0.Chromosome13.fasta -O 18S_test.fa
```

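Use RDP Classifier to get taxonomic assignments for bacterial and archaeal 16S rRNA sequences. The second command runs the same classifier on the human 18S test sequence, which the image's tests use as a second control: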
```
classifier classify -o taxa_16S_test.txt 16S_test.fa
classifier classify -o taxa_18S_test.txt 18S_test.fa
```
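Both `classifier classify` calls above follow one pattern (`-o taxa_<name>.txt <name>.fa`), so a batch of FASTA files can be handled by generating the commands. The helper below is hypothetical and not part of the image: it reuses only the `classify -o` usage shown above, names each output after the input's file stem (written to the current directory), and executes nothing unless `run=True`, in which case `classifier` must be on `PATH` as it is in this container.

```python
import subprocess
from pathlib import Path


def classify_commands(fastas: list[str]) -> list[list[str]]:
    """Build one `classifier classify` command per input FASTA.

    Outputs follow the taxa_<stem>.txt naming used in the example,
    e.g. 16S_test.fa -> taxa_16S_test.txt, in the current directory.
    """
    commands = []
    for fasta in fastas:
        out = f"taxa_{Path(fasta).stem}.txt"
        commands.append(["classifier", "classify", "-o", out, fasta])
    return commands


def classify_all(fastas: list[str], run: bool = False) -> list[list[str]]:
    """Return the commands and, if run=True, execute them in order."""
    commands = classify_commands(fastas)
    if run:
        for cmd in commands:
            # check=True raises CalledProcessError if the classifier exits non-zero
            subprocess.run(cmd, check=True)
    return commands
```

Inside the container, `classify_all(["16S_test.fa", "18S_test.fa"], run=True)` would issue the same two commands as the example above.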

## Output
```
head -2 taxa_16S_test.txt
AY305776.1 Root rootrank 1.0 Bacteria domain 1.0 Pseudomonadota phylum 1.0 Betaproteobacteria class 1.0 Burkholderiales order 1.0 Burkholderiaceae family 1.0 Burkholderia genus 1.0
```
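Each output line pairs a sequence ID with repeated *taxon, rank, confidence* triples running from `Root` down to genus, as in the `AY305776.1` line above. A minimal parser is sketched below. It assumes the fields are tab-separated and that one optional extra field (for example an empty column) may sit between the ID and the first triple; both are assumptions about the file layout rather than something this README documents, and the names are illustrative.

```python
from typing import NamedTuple


class Assignment(NamedTuple):
    taxon: str
    rank: str
    confidence: float


def parse_rdp_line(line: str) -> tuple[str, list[Assignment]]:
    """Split one RDP Classifier output line into (sequence ID, assignments).

    Assumes tab-separated fields: ID, an optional extra field, then
    taxon/rank/confidence triples.
    """
    fields = line.rstrip("\n").split("\t")
    seq_id, rest = fields[0], fields[1:]
    if len(rest) % 3 == 1:
        rest = rest[1:]  # drop the optional extra field after the ID
    if len(rest) % 3 != 0:
        raise ValueError(f"unexpected field count in line for {seq_id}")
    assignments = [
        Assignment(rest[i], rest[i + 1], float(rest[i + 2]))
        for i in range(0, len(rest), 3)
    ]
    return seq_id, assignments
```

Selecting the genus call is then `next(a for a in calls if a.rank == "genus")` on the returned list.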


17 changes: 17 additions & 0 deletions rdp/2.14/tests/scripts/run_controls.sh
@@ -0,0 +1,17 @@
#!/bin/bash
set -euo pipefail
# Download test data
wget -nv https://raw.githubusercontent.com/taylorpaisie/docker_containers/main/rdp/2.14/16S_rRNA_gene.Burkholderia_pseudomallei.2002721184.AY305776.1.fasta -O 16S_test.fa
wget -nv https://raw.githubusercontent.com/taylorpaisie/docker_containers/main/rdp/2.14/18S_rRNA_gene.Homo_sapiens.T2T-CHM13v2.0.Chromosome13.fasta -O 18S_test.fa

# Get taxonomic assignments for the test sequences
classifier classify -o taxa_16S_test.txt 16S_test.fa
classifier classify -o taxa_18S_test.txt 18S_test.fa

# Record SHA-256 checksums of the downloaded input FASTA files (the classifier output is not checksummed)
sha256sum 16S_test.fa > 16S_checksum.txt
sha256sum 18S_test.fa > 18S_checksum.txt
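The script records checksums with `sha256sum`, and `test_controls.py` reads back the first field of each line. The same digest can be computed in Python with `hashlib`; the sketch below mirrors the `<hex digest>  <file name>` line that GNU `sha256sum` writes in its default text mode for ordinary file names. The function names are hypothetical.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sha256sum_line(path: str) -> str:
    """Format a checksum line like GNU sha256sum (digest, two spaces, name)."""
    return f"{sha256_of(path)}  {path}\n"
```

Reading in chunks keeps memory use flat, which matters little for these small test files but would for larger inputs.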




31 changes: 31 additions & 0 deletions rdp/2.14/tests/test_controls.py
@@ -0,0 +1,31 @@
import unittest
import subprocess
from subprocess import PIPE


class TestControls(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        command = "bash /tests/scripts/run_controls.sh"
        # check=True fails the suite immediately if the control script exits non-zero
        subprocess.run(command, shell=True, stdout=PIPE, check=True)

    def test_rdp16S(self):
        with open("16S_checksum.txt") as f:
            rdp_checksum = f.readlines()[0].split(" ")[0]
        self.assertEqual(
            rdp_checksum,
            "a38342a9ba63946ffb4324c7858f5cc43b873673cb08080437f7500dda351f65",
        )

    def test_rdp18S(self):
        with open("18S_checksum.txt") as f:
            rdp_checksum = f.readlines()[0].split(" ")[0]
        self.assertEqual(
            rdp_checksum,
            "44bf9c60750ff3b804b3e3a56969dab982307a16faee63f0928b2f54e70b02f7",
        )


if __name__ == "__main__":
    unittest.main()
14 changes: 14 additions & 0 deletions rdp/2.14/tests/test_versions.py
@@ -0,0 +1,14 @@
import unittest
import subprocess
import sys
import re


class TestVersion(unittest.TestCase):
    def test_python(self):
        version = f"{sys.version_info.major}.{sys.version_info.minor}"
        self.assertEqual(version, "3.10")  # Update this with the expected Python version


if __name__ == "__main__":
    unittest.main()
