Commit

Merge branch 'develop' into 'master'
Release v1.0.0

See merge request tron/addannot!222
Pablo Riesgo Ferreiro committed Sep 22, 2022
2 parents 4940583 + dc08d58 commit 5ea760c
Showing 116 changed files with 5,943 additions and 3,375 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -19,3 +19,6 @@ py36
/netMHCIIpan-3.2.Linux.tar.gz
/netMHCIIpan-4.0.Linux.tar.gz
/netMHCpan-4.1b.Linux.tar.gz
neofox.log
*.swp
./test_*
1 change: 0 additions & 1 deletion .gitlab-ci.yml
@@ -68,4 +68,3 @@ publish_package:
- TWINE_PASSWORD=${CI_JOB_TOKEN} TWINE_USERNAME=gitlab-ci-token python -m twine upload --repository-url https://gitlab.rlp.net/api/v4/projects/${CI_PROJECT_ID}/packages/pypi dist/*
only:
- develop
- master
4 changes: 2 additions & 2 deletions Dockerfile
@@ -50,7 +50,7 @@ RUN tar -xvf netMHCIIpan-4.0.Linux.tar.gz
RUN sed -i 's/\/net\/sund-nas.win.dtu.dk\/storage\/services\/www\/packages\/netMHCIIpan\/4.0\/netMHCIIpan-4.0/\/app\/netMHCIIpan-4.0/g' /app/netMHCIIpan-4.0/netMHCIIpan
RUN sed -i 's/ \/tmp\//\/app\/netMHCIIpan-4.0\/tmp/g' /app/netMHCIIpan-4.0/netMHCIIpan
RUN mkdir /app/netMHCIIpan-4.0/tmp
RUN wget http://www.cbs.dtu.dk/services/NetMHCIIpan-4.0/data.tar.gz -O /app/netMHCIIpan-4.0/data.tar.gz
RUN wget https://services.healthtech.dtu.dk/services/NetMHCIIpan-4.0/data.tar.gz -O /app/netMHCIIpan-4.0/data.tar.gz
RUN tar -xvf /app/netMHCIIpan-4.0/data.tar.gz -C /app/netMHCIIpan-4.0
ENV NEOFOX_NETMHC2PAN /app/netMHCIIpan-4.0/netMHCIIpan
RUN apt-get install tcsh
@@ -71,7 +71,7 @@ ENV NEOFOX_MIXMHC2PRED /app/MixMHC2pred-1.2/MixMHC2pred_unix
# install prime
RUN wget https://github.com/GfellerLab/PRIME/archive/master.tar.gz
RUN tar -xvf master.tar.gz
RUN sed -i 's/\/app\/PRIME/\/app\/PRIME-master/g' /app/PRIME-master/PRIME
RUN sed -i 's/PATH_TO_PRIME/\/app\/PRIME-master/g' /app/PRIME-master/PRIME
ENV NEOFOX_PRIME /app/PRIME-master/PRIME

# configure references
17 changes: 14 additions & 3 deletions README.md
@@ -2,6 +2,7 @@

[![DOI](https://zenodo.org/badge/294667387.svg)](https://zenodo.org/badge/latestdoi/294667387)
[![PyPI version](https://badge.fury.io/py/neofox.svg)](https://badge.fury.io/py/neofox)
[![Anaconda-Server Badge](https://anaconda.org/bioconda/neofox/badges/version.svg)](https://anaconda.org/bioconda/neofox)
[![Documentation Status](https://readthedocs.org/projects/neofox/badge/?version=latest)](https://neofox.readthedocs.io/en/latest/?badge=latest)


@@ -51,7 +52,7 @@ NeoFox covers the following neoantigen features and prediction algorithms:

NeoFox depends on the following tools:

- Python >=3.6, <=3.8
- Python >=3.7, <=3.8
- R 3.6.0
- BLAST 2.10.1
- netMHCpan 4.1
@@ -60,13 +61,23 @@ NeoFox depends on the following tools:
- MixMHC2pred 1.2
- PRIME 1.0

Install from PyPI:
```
pip install neofox
```

Or install from bioconda:
```
conda install bioconda::neofox
```


## 3 Usage from the command line

NeoFox can be used from the command line as shown below or programmatically (see [https://neofox.readthedocs.io](https://neofox.readthedocs.io/) for more information).

````commandline
neofox --candidate-file/--json-file neoantigens_candidates.tab/neoantigens_candidates.json --patient-data/--patient-data-json patient_data.txt/patient_data.json --output-folder /path/to/out --output-prefix out_prefix [--patient-id] [--with-table] [--with-json] [--num_cpus] [--affinity-threshold]
neofox --candidate-file/--json-file neoantigens_candidates.tab/neoantigens_candidates.json --patient-data/--patient-data-json patient_data.txt/patient_data.json --output-folder /path/to/out --output-prefix out_prefix [--patient-id] [--with-table] [--with-json] [--num-cpus] [--affinity-threshold]
````
- `--candidate-file`: tab-separated values table with neoantigen candidates represented by long mutated peptide sequences as described [here](#41-neoantigen-candidates-in-tabular-format)
- `--json-file`: JSON file neoantigens in NeoFox model format as described [here](#42-neoantigen-candidates-in-json-format)
@@ -76,7 +87,7 @@ neofox --candidate-file/--json-file neoantigens_candidates.tab/neoantigens_candi
- `--output-prefix`: prefix for the output files (*optional*)
- `--with-table`: output file in tab-separated format (*default*)
- `--with-json`: output file in JSON format (*optional*)
- `--num_cpus`: number of CPUs to use (*optional*)
- `--num-cpus`: number of CPUs to use (*optional*)
- `--config`: a config file with the paths to dependencies as shown below (*optional*)
- `--organism`: the organism to which the data corresponds. Possible values: [human, mouse]. Default value: human
- `--affinity-threshold`: an affinity value (*optional*); neoantigen candidates with a best predicted affinity greater than or equal to this threshold will not be annotated with features that specifically model
Binary file modified docs/resources/column_description.xlsx
Binary file not shown.
3 changes: 2 additions & 1 deletion docs/source/01_overview.md
@@ -4,6 +4,7 @@ Welcome to the documentation of **NeoFox**!

[![DOI](https://zenodo.org/badge/294667387.svg)](https://zenodo.org/badge/latestdoi/294667387)
[![PyPI version](https://badge.fury.io/py/neofox.svg)](https://badge.fury.io/py/neofox)
[![Anaconda-Server Badge](https://anaconda.org/bioconda/neofox/badges/version.svg)](https://anaconda.org/bioconda/neofox)

## About NeoFox

@@ -17,7 +18,7 @@ candidate to be a true neoantigen are required.
Several neoantigen features that describe the ability of a neoantigen candidate to induce a T-cell response have been published
in the last years.

**NeoFox** (**NEO**antigen **F**eature toolb**OX**) is a python package that annotates a given set of neoantigen candidate sequences with relevant neoantigen features.
**NeoFox** (**NEO**antigen **F**eature toolb**OX**) is a python package that annotates a given set of neoantigen candidate sequences with relevant neoantigen features. The annotation of neoepitope candidates is supported from NeoFox version 1.0.0.
NeoFox supports annotation of neoantigen candidates derived from SNVs (single nucleotide variant) and alternative mutation classes such as INDELs or fusion genes. Furthermore, NeoFox supports both human and mouse derived neoantigen candidates.

NeoFox covers neoepitope prediction by MHC binding and ligand prediction, similarity/foreignness of a neoepitope candidate sequence, combinatorial features and machine learning approaches.
70 changes: 59 additions & 11 deletions docs/source/02_installation.md
@@ -22,7 +22,7 @@ the sites indicated below.

Store these in the root folder of the repository, next to the `Dockerfile`. Do not rename the installer files.

Build the docker image: `docker build --tag neofox-docker .`
Build the docker image: `docker build --platform linux/amd64 --tag neofox-docker .`

Run NeoFox: `docker run neofox-docker neofox --help`

@@ -33,30 +33,54 @@ See the usage guide [here](03_03_usage.md) for further details.

These installation instructions were tested on Ubuntu 18.04.

Python >=3.7, <=3.8 and R 3.6.0 should be preinstalled.
Python 3.7 or 3.8 should be preinstalled.

Set the environment variable pointing to `Rscript`.
The libz compression development library is required. This can be installed in Ubuntu as follows:
```
export NEOFOX_RSCRIPT=`which Rscript`
apt-get install libz-dev
```

### Install NeoFox

Install from PyPI:
```
pip install neofox
```

or install from bioconda:
```
conda install bioconda::neofox
```

### Install third-party dependencies


#### Install R

R 3.6.0 is required.

Optionally set the environment variable pointing to `Rscript`, otherwise neofox will look for it in the path.
```
export NEOFOX_RSCRIPT=`which Rscript`
```

**NOTE**: when installing from conda this dependency is already installed.

#### Install BLASTP

The version of BLASTP that was tested is 2.10.1, other versions may work but that is untested.
```
wget https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.10.1/ncbi-blast-2.10.1+-x64-linux.tar.gz
tar -xvf ncbi-blast-2.10.1+-x64-linux.tar.gz
export NEOFOX_BLASTP=`pwd`/ncbi-blast-2.10.1+/bin/blastp
```

Optionally set the environment variable pointing to `blastp`, otherwise neofox will look for it in the path.
```
export NEOFOX_BLASTP=/path/to/ncbi-blast-2.10.1+/bin/blastp
```

**NOTE**: when installing from conda this dependency is already installed.
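
The lookup order described for `Rscript` and `blastp` (environment variable first, then the system path) can be sketched as follows. This is only an illustrative sketch, not NeoFox's actual implementation; the helper name `resolve_tool` is hypothetical.

```python
import os
import shutil
from typing import Optional


def resolve_tool(env_var: str, executable: str) -> Optional[str]:
    """Return the tool path from env_var if it is set, else search PATH.

    Hypothetical helper for illustration; NeoFox may resolve tools differently.
    """
    configured = os.environ.get(env_var)
    if configured:
        # An explicitly configured path takes precedence over the PATH search
        return configured
    # shutil.which returns None when the executable is not found in PATH
    return shutil.which(executable)
```

For example, `resolve_tool("NEOFOX_BLASTP", "blastp")` returns the value of `NEOFOX_BLASTP` when it is exported and otherwise whatever `blastp` is found in the path, if any.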

#### Install NetMHCpan-4.1

NetMHCpan-4.1 can be downloaded by academic users from https://services.healthtech.dtu.dk/service.php?NetMHCpan-4.1
@@ -66,8 +90,11 @@ tar -xvf netMHCpan-4.1b.Linux.tar.gz
cd netMHCpan-4.1
wget https://services.healthtech.dtu.dk/services/NetMHCpan-4.1/data.tar.gz
tar -xvf data.tar.gz
cd ..
export NEOFOX_NETMHCPAN=`pwd`/netMHCpan-4.1/netMHCpan
```

Optionally set the environment variable pointing to `netMHCpan`, otherwise neofox will look for it in the path.
```
export NEOFOX_NETMHCPAN=/path/to/netMHCpan-4.1/netMHCpan
```

Configure NetMHCpan as explained in the file `netMHCpan-4.1/netMHCpan-4.1.readme`
@@ -83,12 +110,15 @@ cd netMHCIIpan-4.0
# download the data
wget http://www.cbs.dtu.dk/services/NetMHCIIpan-4.0/data.Linux.tar.gz
tar -xvf data.Linux.tar.gz
cd ..
export NEOFOX_NETMHC2PAN=`pwd`/netMHCIIpan-4.0/netMHCIIpan
# install tcsh shell interpreter if not available yet
sudo apt-get install tcsh
```

Optionally set the environment variable pointing to `netMHCIIpan`, otherwise neofox will look for it in the path.
```
export NEOFOX_NETMHC2PAN=/path/to/netMHCIIpan-4.0/netMHCIIpan
```

Configure NetMHCIIpan-4.0 as explained in the file `netMHCIIpan-4.0/netMHCIIpan-4.0.readme`


@@ -97,7 +127,12 @@ Configure NetMHCIIpan-4.0 as explained in the file `netMHCIIpan-4.0/netMHCIIpan-
```
wget https://github.com/GfellerLab/MixMHCpred/archive/v2.1.tar.gz
tar -xvf v2.1.tar.gz
export NEOFOX_MIXMHCPRED=`pwd`/MixMHCpred-2.1/MixMHCpred
```

Set the environment variable pointing to `MixMHCpred`, there will be no search in the path as the installation folder
is also needed to determine the supported alleles.
```
export NEOFOX_MIXMHCPRED=/path/to/MixMHCpred-2.1/MixMHCpred
```

Configure MixMHCpred-2.1 as explained in the file `MixMHCpred-2.1/README`
@@ -107,6 +142,11 @@ Configure MixMHCpred-2.1 as explained in the file `MixMHCpred-2.1/README`
```
wget https://github.com/GfellerLab/MixMHC2pred/archive/v1.2.tar.gz
tar -xvf v1.2.tar.gz
```

Set the environment variable pointing to `MixMHC2pred_unix`, there will be no search in the path as the installation
folder is also needed to determine the supported alleles.
```
export NEOFOX_MIXMHC2PRED=`pwd`/MixMHC2pred-1.2/MixMHC2pred_unix
```

@@ -115,14 +155,20 @@ export NEOFOX_MIXMHC2PRED=`pwd`/MixMHC2pred-1.2/MixMHC2pred_unix
```
wget https://github.com/GfellerLab/PRIME/archive/master.tar.gz
tar -xvf master.tar.gz
```

Set the environment variable pointing to `PRIME`, there will be no search in the path as the installation folder
is also needed to determine the supported alleles.
```
export NEOFOX_PRIME=`pwd`/PRIME-master/PRIME
```

Configure PRIME as explained in the file `PRIME-master/README`

### Configuration of the reference folder

To configure the reference folder, set the environment variable for `makeblastdb`, NetMHCpan, NetMHCIIpan and Rscript:
To configure the reference folder, set the environment variables for `makeblastdb`, NetMHCpan, NetMHCIIpan and Rscript,
or alternatively rely on these being fetched from the path:

```
export NEOFOX_MAKEBLASTDB=`pwd`/ncbi-blast-2.10.1+/bin/makeblastdb
@@ -143,6 +189,8 @@ Run the following to configure the NeoFox reference folder:
neofox-configure --reference-folder /your/neofox/folder [--install-r-dependencies]
```

**NOTE**: when installing from conda `--install-r-dependencies` is not needed.

The above command will install several resources and store in the annotations metadata their version, MD5 checksum and
download timestamp.

66 changes: 55 additions & 11 deletions docs/source/03_01_input_data.md
@@ -2,18 +2,23 @@

## General information

NeoFox requires two input files: a file with neoantigen candidates and a file with patient data.
NeoFox requires two input files: a candidate file with neoantigen or neoepitope candidates and a file with patient data.
The file with neoantigen candidates can be provided either in tabular format or in JSON format and this file may contain
additional user-specific input that will be kept during the annotation process. The patient file requires a tabular format.

## File with neoantigen candidates
Alternatively, NeoFox may annotate a set of neoepitope candidates for which it will require a file with neoepitope
candidates and optionally a file with patient data. Both files are required in tabular format.

#### Tabular file format
## Candidate file

### Tabular file format

#### Neoantigen candidates

This is a dummy example of a table with neoantigen candidates in tabular format:

| gene | mutation.wildTypeXmer | mutation.mutatedXmer | patientIdentifier | rnaExpression | rnaVariantAlleleFrequency | dnaVariantAlleleFrequency | external_annotation_1 | external_annotation_2 |
|-------|-----------------------------|-----------------------------|-------------------|---------------|---------------------------|---------------------------|-----------------------|-----------------------|
| gene | wildTypeXmer | mutatedXmer | patientIdentifier | rnaExpression | rnaVariantAlleleFrequency | dnaVariantAlleleFrequency | external_annotation_1 | external_annotation_2 |
|-------|-----------------------------|----------------------------|-------------------|---------------|---------------------------|---------------------------|-----------------------|-----------------------|
| BRCA2 | AAAAAAAAAAAAALAAAAAAAAAAAAA | AAAAAAAAAAAAAFAAAAAAAAAAAAA | Ptx | 7.942 | 0.85 | 0.34 | some_value | some_value |
| BRCA2 | AAAAAAAAAAAAAMAAAAAAAAAAAAA | AAAAAAAAAAAAARAAAAAAAAAAAAA | Ptx | 7.942 | 0.85 | 0.34 | some_value | some_value |
| BRCA2 | AAAAAAAAAAAAAGAAAAAAAAAAAAA | AAAAAAAAAAAAAKAAAAAAAAAAAAA | Ptx | 7.942 | 0.85 | 0.34 | some_value | some_value |
@@ -22,8 +27,8 @@ This is an dummy example of a table with neoantigen candidates in tabular format

where:
- `gene`: the HGNC gene symbol. (This field is not required for neoantigen candidates derived from other sources than SNVs)
- `mutation.mutatedXmer`: the neoantigen candidate sequence, i.e. the mutated amino acid sequence. In case of SNVs, the mutation should be located in the middle. We advise that the point mutation is flanked by 13 amino acid on both sites (IUPAC 1 respecting casing, eg: A) to cover both MHC I and MHC II neopeptides
- `mutation.wildTypeXmer`: the equivalent non-mutated amino acid sequence (IUPAC 1 respecting casing, eg: A). This field shall be empty, specially in the case of neoantigen candidates derived from other sources than SNVs.
- `mutatedXmer`: the neoantigen candidate sequence, i.e. the mutated amino acid sequence. In case of SNVs, the mutation should be located in the middle. We advise that the point mutation is flanked by 13 amino acids on both sides (IUPAC 1 respecting casing, e.g. A) to cover both MHC I and MHC II neopeptides
- `wildTypeXmer`: the equivalent non-mutated amino acid sequence (IUPAC 1 respecting casing, e.g. A). This field shall be empty, especially in the case of neoantigen candidates derived from other sources than SNVs.
- `patientIdentifier`: the patient identifier
- `rnaExpression`: RNA expression. (**optional**) (see *NOTE*) This value can be in any format chosen by the user (e.g. TPM, RPKM) but it is recommended to be consistent for data that should be compared.
- `rnaVariantAlleleFrequency`: the variant allele frequency (VAF) calculated from the RNA (**optional**)
@@ -35,21 +40,57 @@ where:
- If `dnaVariantAlleleFrequency` is given while `rnaVariantAlleleFrequency` is not given, the VAF in RNA will be estimated by the VAF in DNA.
This means that feature scores that rely on the VAF in RNA will be calculated with the VAF in DNA.
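
The VAF fallback rule in the note above can be sketched as a small function. This is a minimal illustration of the stated rule only, not NeoFox's code; the function name `effective_rna_vaf` is hypothetical.

```python
from typing import Optional


def effective_rna_vaf(rna_vaf: Optional[float], dna_vaf: Optional[float]) -> Optional[float]:
    """Pick the VAF used by features that rely on the VAF in RNA.

    Hypothetical helper: when the RNA VAF is missing, the DNA VAF is used
    as its estimate, as described in the note above.
    """
    if rna_vaf is not None:
        return rna_vaf
    # Falls back to the DNA VAF, which may itself be missing (None)
    return dna_vaf
```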

#### Neoepitope candidates

This is a dummy example of a table with neoepitope candidates in tabular format:

| gene | mutatedPeptide | wildTypePeptide | alleleMhcI | isoformMhcII | patientIdentifier | rnaExpression | rnaVariantAlleleFrequency | dnaVariantAlleleFrequency |
|-------|---------------------|-----------------------------|-------------|--------------|-------------------|---------------------------|---------------------------|---------------------------|
| BRCA2 | AAAALAAAAA | AAAAFAAAAA | HLA-A*01:01 | | Ptx | 7.942 | 0.85 | 0.34 |
| BRCA2 | AAAAAAAAAAAAAMAAAAAAAAAAAAA | AAAAAAAAAAAAARAAAAAAAAAAAAA | | DRB1*01:01 | Ptx | 7.942 | 0.85 | 0.34 |
| BRCA2 | AAAAGAAAAA | AAAAKAAAAA | | | Ptx | 7.942 | 0.85 | 0.34 |
| BRCA2 | AAAAAAAAAAAAACAAAAAAAAAAAAA | AAAAAAAAAAAAAEAAAAAAAAAAAAA | | | Ptx | 7.942 | 0.85 | 0.34 |
| BRCA2 | AAAAAAAAAAAAAKAAAAAAAAAAAAA | AAAAAAAAAAAAACAAAAAAAAAAAAA | | | Ptx | 7.942 | 0.85 | 0.34 |

where:
- `mutatedPeptide`: the neoepitope candidate sequence, i.e. the mutated amino acid sequence. MHC-I neoepitopes should have a length between 8 and 14 amino acids, MHC-II neoepitopes should have a length between 9 and 20000 amino acids.
- `wildTypePeptide`: the equivalent non-mutated amino acid sequence (IUPAC 1 respecting casing, eg: A). This field shall be empty, specially in the case of neoepitope candidates derived from other sources than SNVs.
- `alleleMhcI`: the MHC-I allele to which this neoepitope is linked (**optional**)
- `isoformMhcII`: the MHC-II isoform to which this neoepitope is linked (**optional**)
- `patientIdentifier`: the patient identifier (**only required if alleleMhcI and isoformMhcII are not provided**)
- `gene`: the HGNC gene symbol. (This field is optional)
- `rnaExpression`: RNA expression. (**optional**) (see *NOTE*) This value can be in any format chosen by the user (e.g. TPM, RPKM) but it is recommended to be consistent for data that should be compared.
- `rnaVariantAlleleFrequency`: the variant allele frequency (VAF) calculated from the RNA (**optional**)
- `dnaVariantAlleleFrequency`: the VAF calculated from the DNA. (**optional**)

**NOTE:**

- Neoepitopes with a value for `alleleMhcI` are considered MHC-I neoepitopes, likewise neoepitopes with a value for `isoformMhcII` are considered MHC-II neoepitopes. Both fields cannot be provided for the same neoepitope.
- If none of `alleleMhcI` and `isoformMhcII` are provided then the `patientIdentifier` is required and one neoepitope sharing the same sequence will be annotated for each MHC-I allele and MHC-II isoform according to the patient HLA type.
- If `rnaExpression` is not provided and the tumor type is given in the patient data, expression will be estimated by gene expression in the TCGA cohort indicated in the `tumorType` in the patient data (see below). Please note that this does not work for mouse data, for which expression imputation is currently not supported.
- If `dnaVariantAlleleFrequency` is given while `rnaVariantAlleleFrequency` is not given, the VAF in RNA will be estimated by the VAF in DNA.
This means that feature scores that rely on the VAF in RNA will be calculated with the VAF in DNA.
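
The rules for `alleleMhcI`, `isoformMhcII` and `patientIdentifier` described above can be sketched as a simple check on one table row. This is a minimal sketch assuming only the rules and length limits stated in this section; the function name `classify_neoepitope` and its return labels are hypothetical and are not part of the NeoFox API.

```python
from typing import Optional


def classify_neoepitope(
    mutated_peptide: str,
    allele_mhc_i: Optional[str] = None,
    isoform_mhc_ii: Optional[str] = None,
    patient_identifier: Optional[str] = None,
) -> str:
    """Classify a neoepitope candidate row following the documented rules."""
    if allele_mhc_i and isoform_mhc_ii:
        # Both fields cannot be provided for the same neoepitope
        raise ValueError("alleleMhcI and isoformMhcII are mutually exclusive")
    if allele_mhc_i:
        # MHC-I neoepitopes: between 8 and 14 amino acids
        if not 8 <= len(mutated_peptide) <= 14:
            raise ValueError("MHC-I neoepitope length must be between 8 and 14")
        return "MHC-I"
    if isoform_mhc_ii:
        # MHC-II neoepitopes: between 9 and 20000 amino acids
        if not 9 <= len(mutated_peptide) <= 20000:
            raise ValueError("MHC-II neoepitope length must be between 9 and 20000")
        return "MHC-II"
    if not patient_identifier:
        # Without allele or isoform, the patient HLA type is needed
        raise ValueError("patientIdentifier is required without allele or isoform")
    return "patient-HLA"
```

With the dummy table above, the first row would be classified as `MHC-I`, the second as `MHC-II`, and the remaining rows would rely on the HLA type of patient `Ptx`.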


### JSON file format

#### Neoantigen candidates

Besides tabular format, neoantigen candidates can be provided as a list of neoantigen models in JSON format as shown below. To simplify, only one full neoantigen model is shown. The terminology follows the descriptions for the [tabular file format](#tabular-file-format). For a more detailed description of the models, please refer to [here](05_models.md):

```json
[{
"patientIdentifier": "Ptx",
"gene": "BRCA2",
"mutation": {
"wildTypeXmer": "AAAAAAAAAAAAALAAAAAAAAAAAAA",
"mutatedXmer": "AAAAAAAAAAAAAFAAAAAAAAAAAAA"
}
"wildTypeXmer": "AAAAAAAAAAAAALAAAAAAAAAAAAA",
"mutatedXmer": "AAAAAAAAAAAAAFAAAAAAAAAAAAA"
}]
```
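
A candidate file in JSON format could be produced with the Python standard library, for instance as sketched below. This sketch assumes the flat field layout that matches the tabular column names `wildTypeXmer` and `mutatedXmer`; the function name `dump_candidates` is hypothetical, and the models documentation remains the authoritative description of the fields.

```python
import json
from typing import Dict, List

# Dummy candidate mirroring the first row of the tabular example
candidates: List[Dict[str, str]] = [
    {
        "patientIdentifier": "Ptx",
        "gene": "BRCA2",
        "wildTypeXmer": "AAAAAAAAAAAAALAAAAAAAAAAAAA",
        "mutatedXmer": "AAAAAAAAAAAAAFAAAAAAAAAAAAA",
    }
]


def dump_candidates(items: List[Dict[str, str]]) -> str:
    """Serialize a list of candidate dictionaries as a JSON array."""
    return json.dumps(items, indent=2)
```

The resulting string can be written to a file and passed to `--json-file`.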

#### Neoepitope candidates

Not supported at the moment.

## File with patient data

### Human
@@ -125,3 +166,6 @@ A given allele is represented by a last small case single letter (eg: d, k, p) w

These are examples of H-2 alleles: H2Kd, H2Dd, H2Lp



