adding pango_aliasor version 0.3.0 (#1011)
* adding pango_aliasor version 0.3.0

* added credits

* adding pango aliasor

---------

Co-authored-by: Kutluhan Incekara <[email protected]>
erinyoung and Kincekara authored Aug 9, 2024
1 parent 9116cdc commit 37f0166
Showing 5 changed files with 186 additions and 0 deletions.
1 change: 1 addition & 0 deletions Program_Licenses.md
@@ -113,6 +113,7 @@ The licenses of the open-source software that is contained in these Docker image
| ONTime | MIT | https://github.com/mbhall88/ontime/blob/main/LICENSE |
| OrthoFinder | GNU GPLv3 | https://github.com/davidemms/OrthoFinder/blob/master/License.md |
| Panaroo | MIT | https://github.com/gtonkinhill/panaroo/blob/master/LICENSE |
| pango_aliasor | MIT | https://github.com/corneliusroemer/pango_aliasor/blob/main/LICENSE |
| Pangolin | GNU GPLv3 | https://github.com/cov-lineages/pangolin/blob/master/LICENSE.txt |
| panqc | MIT | https://github.com/maxgmarin/panqc/blob/main/LICENSE |
| Parsnp | Battelle National Biodefense Institute (BNBI) | https://github.com/marbl/parsnp?tab=License-1-ov-file#readme |
1 change: 1 addition & 0 deletions README.md
@@ -216,6 +216,7 @@ To learn more about the docker pull rate limits and the open source software pro
| [ONTime](https://hub.docker.com/r/staphb/ontime) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/ontime)](https://hub.docker.com/r/staphb/ontime) | <ul><li>[0.2.3](ontime/0.2.3/)</li><li>[0.3.1](ontime/0.3.1/)</li></ul> | https://github.com/mbhall88/ontime |
| [OrthoFinder](https://hub.docker.com/r/staphb/orthofinder) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/orthofinder)](https://hub.docker.com/r/staphb/orthofinder) | <ul><li>2.17</li></ul> | https://github.com/davidemms/OrthoFinder |
| [Panaroo](https://hub.docker.com/r/staphb/panaroo) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/panaroo)](https://hub.docker.com/r/staphb/panaroo) | <ul><li>[1.2.10](panaroo/1.2.10/)</li><li>[1.3.4](panaroo/1.3.4/)</li><li>[1.5.0](./panaroo/1.5.0/)</li></ul>| (https://hub.docker.com/r/staphb/panaroo) |
| [pango_aliasor](https://hub.docker.com/r/staphb/pango_aliasor) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/pango_aliasor)](https://hub.docker.com/r/staphb/pango_aliasor) | <ul><li>[0.3.0](./pango_aliasor/0.3.0/)</li></ul>| https://github.com/corneliusroemer/pango_aliasor |
| [Pangolin](https://hub.docker.com/r/staphb/pangolin) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/pangolin)](https://hub.docker.com/r/staphb/pangolin) | <details><summary> Click to see Pangolin v4.2 and older versions! </summary> **Pangolin version & pangoLEARN data release date** <ul><li>1.1.14</li><li>2.0.4 & 2020-07-20</li><li>2.0.5 & 2020-07-20</li><li>2.1.1 & 2020-12-17</li><li>2.1.3 & 2020-12-17</li><li>2.1.6 & 2021-01-06</li><li>2.1.7 & 2021-01-11</li><li>2.1.7 & 2021-01-20</li><li>2.1.8 & 2021-01-22</li><li>2.1.10 & 2021-02-01</li><li>2.1.11 & 2021-02-01</li><li>2.1.11 & 2021-02-05</li><li>2.2.1 & 2021-02-06</li><li>2.2.2 & 2021-02-06</li><li>2.2.2 & 2021-02-11</li><li>2.2.2 & 2021-02-12</li><li>2.3.0 & 2021-02-12</li><li>2.3.0 & 2021-02-18</li><li>2.3.0 & 2021-02-21</li><li>2.3.2 & 2021-02-21</li><li>2.3.3 & 2021-03-16</li><li>2.3.4 & 2021-03-16</li><li>2.3.5 & 2021-03-16</li><li>2.3.6 & 2021-03-16</li><li>2.3.6 & 2021-03-29</li><li>2.3.8 & 2021-04-01</li><li>2.3.8 & 2021-04-14</li><li>2.3.8 & 2021-04-21</li><li>2.3.8 & 2021-04-23</li><li>2.4 & 2021-04-28</li><li>2.4.1 & 2021-04-28</li><li>2.4.2 & 2021-04-28</li><li>2.4.2 & 2021-05-10</li><li>2.4.2 & 2021-05-11</li><li>2.4.2 & 2021-05-19</li><li>3.0.5 & 2021-06-05</li><li>3.1.3 & 2021-06-15</li><li>3.1.5 & 2021-06-15</li><li>3.1.5 & 2021-07-07-2</li><li>3.1.7 & 2021-07-09</li><li>3.1.8 & 2021-07-28</li><li>3.1.10 & 2021-07-28</li><li>3.1.11 & 2021-08-09</li><li>3.1.11 & 2021-08-24</li><li>3.1.11 & 2021-09-17</li><li>3.1.14 & 2021-09-28</li><li>3.1.14 & 2021-10-13</li><li>3.1.16 & 2021-10-18</li><li>3.1.16 & 2021-11-04</li><li>3.1.16 & 2021-11-09</li><li>3.1.16 & 2021-11-18</li><li>3.1.16 & 2021-11-25</li><li>3.1.17 & 2021-11-25</li><li>3.1.17 & 2021-12-06</li><li>3.1.17 & 2022-01-05</li><li>3.1.18 & 2022-01-20</li><li>3.1.19 & 2022-01-20</li><li>3.1.20 & 2022-02-02</li><li>3.1.20 & 2022-02-28</li></ul> **Pangolin version & pangolin-data version** <ul><li>4.0 & 1.2.133</li><li>4.0.1 & 
1.2.133</li><li>4.0.2 & 1.2.133</li><li>4.0.3 & 1.2.133</li><li>4.0.4 & 1.2.133</li><li>4.0.5 & 1.3</li><li>4.0.6 & 1.6</li><li>4.0.6 & 1.8</li><li>4.0.6 & 1.9</li><li>4.1.1 & 1.11</li><li>4.1.2 & 1.12</li><li>4.1.2 & 1.13</li><li>4.1.2 & 1.14</li><li>4.1.3 & 1.15.1</li><li>4.1.3 & 1.16</li><li>4.1.3 & 1.17</li><li>4.2 & 1.18</li><li>4.2 & 1.18.1</li><li>4.2 & 1.18.1.1</li><li>4.2 & 1.19</li></ul> </details> **Pangolin version & pangolin-data version** <ul><li>[4.3 & 1.20](pangolin/4.3-pdata-1.20/)</li><li>[4.3 & 1.21](pangolin/4.3-pdata-1.21/)</li><li>[4.3.1 & 1.22](pangolin/4.3.1-pdata-1.22/)</li><li>[4.3.1 & 1.23](pangolin/4.3.1-pdata-1.23/)</li><li>[4.3.1 & 1.23.1](pangolin/4.3.1-pdata-1.23.1/)</li><li>[4.3.1 & 1.23.1 with XDG_CACHE_HOME=/tmp](pangolin/4.3.1-pdata-1.23.1-1/)</li><li>[4.3.1 & 1.24](pangolin/4.3.1-pdata-1.24/)</li><li>[4.3.1 & 1.25.1](pangolin/4.3.1-pdata-1.25.1/)</li><li>[4.3.1 & 1.26](pangolin/4.3.1-pdata-1.26/)</li><li>[4.3.1 & 1.27](pangolin/4.3.1-pdata-1.27/)</li><li>[4.3.1 & 1.28](pangolin/4.3.1-pdata-1.28/)</li><li>[4.3.1 & 1.28.1](pangolin/4.3.1-pdata-1.28.1/)</li><li>[4.3.1 & 1.29](pangolin/4.3.1-pdata-1.29/)</li></ul> | https://github.com/cov-lineages/pangolin<br/>https://github.com/cov-lineages/pangoLEARN<br/>https://github.com/cov-lineages/pango-designation<br/>https://github.com/cov-lineages/scorpio<br/>https://github.com/cov-lineages/constellations<br/>https://github.com/cov-lineages/lineages (archived)<br/>https://github.com/hCoV-2019/pangolin (archived) |
| [panqc](https://hub.docker.com/r/staphb/panqc) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/panqc)](https://hub.docker.com/r/staphb/panqc) | <ul><li>[0.4.0](./panqc/0.4.0/)</li></ul> | https://github.com/maxgmarin/panqc/releases/tag/0.4.0 |
| [parallel-perl](https://hub.docker.com/r/staphb/parallel-perl) <br/> [![docker pulls](https://badgen.net/docker/pulls/staphb/parallel-perl)](https://hub.docker.com/r/staphb/parallel-perl) | <ul><li>20200722</li></ul> | https://www.gnu.org/software/parallel |
61 changes: 61 additions & 0 deletions pango_aliasor/0.3.0/Dockerfile
@@ -0,0 +1,61 @@
FROM ubuntu:jammy as app

ARG PANGO_ALIASOR_VER="0.3.0"

LABEL base.image="ubuntu:jammy"
LABEL dockerfile.version="1"
LABEL software="Pango Aliasor"
LABEL software.version="${PANGO_ALIASOR_VER}"
LABEL description="Links sublineages to parent pangolin lineages"
LABEL website="https://github.com/corneliusroemer/pango_aliasor"
LABEL license="https://github.com/corneliusroemer/pango_aliasor/blob/main/LICENSE"
LABEL maintainer="Erin Young"
LABEL maintainer.email="[email protected]"

RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 \
    python3-pip \
    python-is-python3 \
    wget \
    procps && \
    apt-get autoclean && rm -rf /var/lib/apt/lists/*

RUN wget -q https://github.com/corneliusroemer/pango_aliasor/archive/refs/tags/v${PANGO_ALIASOR_VER}.tar.gz && \
    pip install v${PANGO_ALIASOR_VER}.tar.gz && \
    rm v${PANGO_ALIASOR_VER}.tar.gz && \
    pip install --no-cache pandas && \
    mkdir /data

ENV PATH="$PATH" LC_ALL=C

COPY aliasor.py /usr/bin/.

WORKDIR /key

RUN wget -q https://raw.githubusercontent.com/cov-lineages/pango-designation/master/pango_designation/alias_key.json

WORKDIR /data

CMD [ "aliasor.py", "--help" ]

FROM staphb/pangolin:4.3.1-pdata-1.28 as pangolin

RUN apt-get update && apt-get install -y --no-install-recommends zstd

RUN wget -q https://github.com/corneliusroemer/pango-sequences/raw/main/data/pango-consensus-sequences_genome-nuc.fasta.zst && \
    zstd -d pango-consensus-sequences_genome-nuc.fasta.zst && \
    pangolin pango-consensus-sequences_genome-nuc.fasta

FROM app as test

WORKDIR /test

RUN aliasor.py --help

COPY --from=pangolin /data/lineage_report.csv .

RUN aliasor.py --input lineage_report.csv --output aliased_lineage_report_github.tsv && \
    aliasor.py --input lineage_report.csv --output aliased_lineage_report.tsv --alias-key /key/alias_key.json && \
    wc -l aliased_lineage_report_github.tsv aliased_lineage_report.tsv && \
    head aliased_lineage_report_github.tsv aliased_lineage_report.tsv

78 changes: 78 additions & 0 deletions pango_aliasor/0.3.0/README.md
@@ -0,0 +1,78 @@

# pango_aliasor container

Main tool: [pango_aliasor](https://github.com/corneliusroemer/pango_aliasor)

Code repository: https://github.com/corneliusroemer/pango_aliasor

Basic information on how to use this tool:
- executable: NA
- help: NA
- version: NA
- description: pango_aliasor is a python library for determining parent pangolin lineages

Additional information:
- Although not an official use by any means, `aliasor.py` is included in this image. This python script was written by [@erinyoung](https://github.com/erinyoung) for some quick use cases of finding parent lineages from pangolin results. Usage is below.
- An alias key is included at `/key/alias_key.json` in containers run from this image. When this file is supplied (for example with `--alias-key`), pango_aliasor does not download the latest key from GitHub, which is useful on some cloud infrastructures.
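
To make "parent lineage" concrete, here is a minimal, illustrative sketch of what unaliasing does. It is not pango_aliasor's implementation: the `toy_uncompress` helper and the two-entry `TOY_ALIAS_KEY` are hypothetical names invented for this example, and a real key such as `/key/alias_key.json` holds many more entries, including recombinants that map to lists of parents.

```python
# Toy alias key (assumption: a tiny subset shaped like alias_key.json).
# Root lineages map to an empty string; aliases map to their full parent name.
TOY_ALIAS_KEY = {"B": "", "BA": "B.1.1.529"}


def toy_uncompress(lineage, alias_key=TOY_ALIAS_KEY):
    """Expand the leading alias of a lineage name (illustrative only)."""
    prefix, _, suffix = lineage.partition(".")
    parent = alias_key.get(prefix)
    # Unknown prefixes, roots ("") and recombinants (lists) are returned unchanged.
    if not parent or not isinstance(parent, str):
        return lineage
    return f"{parent}.{suffix}" if suffix else parent


print(toy_uncompress("BA.5"))   # B.1.1.529.5
print(toy_uncompress("B.1.1"))  # B.1.1
```

For real data, prefer the library's own `Aliasor` class, which handles the full key.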

Full documentation: [https://github.com/corneliusroemer/pango_aliasor](https://github.com/corneliusroemer/pango_aliasor)

## Example Usage

```python
import pandas as pd
from pango_aliasor.aliasor import Aliasor
import argparse


def add_unaliased_column(tsv_file_path, pango_column='pango_lineage', unaliased_column='pango_lineage_unaliased'):
    aliasor = Aliasor()
    def uncompress_lineage(lineage):
        if not lineage or pd.isna(lineage):
            return "?"
        return aliasor.uncompress(lineage)

    df = pd.read_csv(tsv_file_path, sep='\t')
    df[unaliased_column] = df[pango_column].apply(uncompress_lineage)
    return df


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Add unaliased Pango lineage column to a TSV file.')
    parser.add_argument('--input-tsv', required=True, help='Path to the input TSV file.')
    parser.add_argument('--pango-column', default='pango_lineage', help='Name of the Pango lineage column in the input file.')
    parser.add_argument('--unaliased-column', default='pango_lineage_unaliased', help='Name of the column to use for the unaliased Pango lineage column in output.')
    args = parser.parse_args()
    df = add_unaliased_column(args.input_tsv, args.pango_column, args.unaliased_column)
    print(df.to_csv(sep='\t', index=False))
```

## Example Usage of aliasor.py

The help message:
```bash
usage: aliasor.py [-h] --input INPUT [--output OUTPUT] [--pango-column PANGO_COLUMN] [--unaliased-column UNALIASED_COLUMN] [--alias-key ALIAS_KEY]

Add unaliased Pango lineage column to a TSV file.

options:
  -h, --help            show this help message and exit
  --input INPUT         Path to the input file (should end in tsv or csv for best results).
  --output OUTPUT       Name of tab-delimited output file
  --pango-column PANGO_COLUMN
                        Name of the Pango lineage column in the input file.
  --unaliased-column UNALIASED_COLUMN
                        Name of the column to use for the unaliased Pango lineage column in output.
  --alias-key ALIAS_KEY
                        Alias Key as json file. If none provided, will download the latest version from github.
```
Examples of using aliasor.py with the lineage_report.csv file generated by pangolin:
```bash
# downloading the latest alias key from github
aliasor.py --input lineage_report.csv --output unaliased_lineage_report.tsv

# using included alias key
aliasor.py --input lineage_report.csv --output unaliased_lineage_report.tsv --alias-key /key/alias_key.json
```
The unaliased column will be the last column in the output file.
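
A quick way to check that layout, sketched with only the standard library. The one-row table below is made-up sample data standing in for a real output file, and `unaliased_lineage` is assumed as the column name because it is the script's default.

```python
import csv
import io

# Made-up stand-in for a file written by aliasor.py (tab-delimited).
sample_output = (
    "taxon\tlineage\tunaliased_lineage\n"
    "sample1\tBA.5\tB.1.1.529.5\n"
)

reader = csv.DictReader(io.StringIO(sample_output), delimiter="\t")
rows = list(reader)

# The appended column should be the last header field.
last_column = reader.fieldnames[-1]
print(last_column)
for row in rows:
    print(row["lineage"], "->", row[last_column])
```

To inspect a real file, replace `io.StringIO(sample_output)` with `open("unaliased_lineage_report.tsv", newline="")`.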
45 changes: 45 additions & 0 deletions pango_aliasor/0.3.0/aliasor.py
@@ -0,0 +1,45 @@
#!/usr/bin/env python3

#####
# Mostly stolen from https://github.com/corneliusroemer/pango_aliasor?tab=readme-ov-file#convenience-script
# and https://github.com/UPHL-BioNGS/Wastewater-genomic-analysis/blob/pooja-dev/utils/freyja_custom_lin_processing.py
#####

import pandas as pd
from pango_aliasor.aliasor import Aliasor
import argparse

def add_unaliased_column(tsv_file_path, pango_column='lineage', unaliased_column='unaliased_lineage', alias_key = ''):
    if alias_key:
        aliasor = Aliasor(alias_key)
    else:
        aliasor = Aliasor()

    def uncompress_lineage(lineage):
        if not lineage or pd.isna(lineage):
            return "?"
        return aliasor.uncompress(lineage)

    df = pd.DataFrame()

    if tsv_file_path.endswith('.tsv'):
        df = pd.read_csv(tsv_file_path, sep='\t')
    elif tsv_file_path.endswith('.csv'):
        df = pd.read_csv(tsv_file_path, sep=',')
    else:
        df = pd.read_csv(tsv_file_path, sep='\t')

    df[unaliased_column] = df[pango_column].apply(uncompress_lineage)
    return df

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Add unaliased Pango lineage column to a TSV file.')
    parser.add_argument('--input', required=True, help='Path to the input file (should end in tsv or csv for best results).')
    parser.add_argument('--output', default='unaliased_lineage_report.tsv', help='Name of tab-delimited output file' )
    parser.add_argument('--pango-column', default='lineage', help='Name of the Pango lineage column in the input file.')
    parser.add_argument('--unaliased-column', default='unaliased_lineage', help='Name of the column to use for the unaliased Pango lineage column in output.')
    parser.add_argument('--alias-key', default='', help="Alias Key as json file. If none provided, will download the latest version from github.")
    args = parser.parse_args()

    df = add_unaliased_column(args.input, args.pango_column, args.unaliased_column, args.alias_key)
    df.to_csv(args.output, sep='\t', index=False)
