Merge branch 'develop' into 'master'
Release INPuT v0.2.2

See merge request tron/addannot!29
franla23 committed Jul 8, 2020
2 parents 7ceb1d3 + 103625e commit ecab0b0
Showing 228 changed files with 5,381 additions and 214,699 deletions.
17 changes: 17 additions & 0 deletions .gitattributes
@@ -0,0 +1,17 @@
# Basic .gitattributes for a python repo.

# Source files
# ============
*.py text diff=python
*.py3 text diff=python
*.avdl text diff
*.gitignore diff
*.gitattributes diff
MANIFEST.in diff
README.md diff

# Binary files
# ============
**/*.xml binary
**/*.pdf binary
**/*.fasta binary
3 changes: 2 additions & 1 deletion MANIFEST.in
@@ -1,2 +1,3 @@
include input/neoag/neoag-master/*
include input/Tcell_predictor/Classifier.pickle
include input/Tcell_predictor/Classifier.pickle
include input/self_similarity/BLOSUM62-2.matrix.txt
105 changes: 63 additions & 42 deletions README.md
@@ -29,46 +29,22 @@ Annotation of mutated peptide sequences (mps) with published or novel potential
- Multiplexed Representation


## **Requirements**
## Input Requirements

**Specific Input:**
- icam_output.txt --> icam output file; either patient-specific or several patients combined
- allele.csv --> ";" separated file with mhc I and mhc II alleles (4 digits!) for all patients of a cohort. If a gene is homozygous, give the allele twice
e.g.
```
Pt1;mhc_I_selection;HLA-A*03:01;HLA-A*11:01;HLA-B*55:01;HLA-B*51:01;HLA-C*01:02;HLA-C*03:03;
Pt1;mhc_II_selection;HLA-DRB1*13:01;HLA-DRB1*11:01;HLA-DQA1*01:03;HLA-DQA1*05:05;HLA-DQB1*06:03;HLA-DQB1*03:01;HLA-DPA1*01:03;HLA-DPB1*02:01;HLA-DPB1*04:02;
Pt2;mhc_I_selection;HLA-A*02:01;HLA-A*26:01;HLA-B*27:05;HLA-B*57:01;HLA-C*01:85;HLA-C*06:02;
Pt2;mhc_II_selection;HLA-DRB1*01:01;HLA-DRB1*07:01;HLA-DQA1*01:01;HLA-DQA1*02:01;HLA-DQB1*05:01;HLA-DQB1*03:03;HLA-DPA1*01:03;HLA-DPB1*02:01;HLA-DPB1*04:02;
**Specific Input:**
- icam_output.txt --> icam output file
- patient identifier --> the patient identifier to whom all neoantigens in icam output belong
- patient data --> a table of tab separated values containing metadata on the patient
- required fields: identifier, mhcIAlleles, mhcIIAlleles
- optional fields: estimatedTumorContent, isRnaAvailable, tissue

```
- *OPTIONAL!!*: ";" separated file with tumor content (e.g. patient_overview file for each cohort)
**Example of patient data table**
```
Patient;est. Tumor content;number of mutations; number of SNVs;number of Indels;unique_peptides;number_of_expressed_ge
Pt10/;62.0;463;437;26;180;16200
Pt11/;;;;;;
Pt12/;66.0;120;104;16;38;15147
Pt13/;49.0;863;843;20;327;15707
Pt14/;55.0;2375;2336;39;909;16107
Pt15/;50.0;1227;1174;53;433;15029
Pt16/;24.0;948;940;8;368;15562
Pt17/;;;;;;
identifier mhcIAlleles mhcIIAlleles estimatedTumorContent isRnaAvailable tissue
Pt29 HLA-A*03:01,HLA-A*02:01,HLA-B*07:02 HLA-DRB1*11:04,HLA-DRB1*15:01 69 True skin
```
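
The patient data table described above can be read with the standard `csv` module. The following is a minimal sketch, not the package's own loader: `read_patient_table` is a hypothetical helper, and it assumes that the fields are separated by tabs and that the alleles inside one field are separated by commas, as the example row suggests.

```python
import csv
import io

# Required field names are taken from the list above.
REQUIRED_FIELDS = ("identifier", "mhcIAlleles", "mhcIIAlleles")


def read_patient_table(handle):
    """Hypothetical helper: map each patient identifier to its parsed row."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [f for f in REQUIRED_FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    patients = {}
    for row in reader:
        # Assumption: alleles within one field are comma separated.
        row["mhcIAlleles"] = row["mhcIAlleles"].split(",")
        row["mhcIIAlleles"] = row["mhcIIAlleles"].split(",")
        patients[row["identifier"]] = row
    return patients


# Rebuild the example row from above with explicit tab separators.
header = ["identifier", "mhcIAlleles", "mhcIIAlleles",
          "estimatedTumorContent", "isRnaAvailable", "tissue"]
example_row = ["Pt29", "HLA-A*03:01,HLA-A*02:01,HLA-B*07:02",
               "HLA-DRB1*11:04,HLA-DRB1*15:01", "69", "True", "skin"]
table = "\t".join(header) + "\n" + "\t".join(example_row) + "\n"

patients = read_patient_table(io.StringIO(table))
print(patients["Pt29"]["mhcIAlleles"])
```

Optional fields such as `estimatedTumorContent` are kept as strings here; converting them to numbers or booleans is left to the caller.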



**Required Columns of iCaM Table:**
- MHC_I_epitope_.best_prediction.
- MHC_I_epitope_.WT.
- MHC_II_epitope_.best_prediction.
- MHC_II_epitope_.WT.
- MHC_I_score_.best_prediction.
- MHC_I_score_.WT.
- MHC_II_score_.best_prediction.
- MHC_II_score_.WT.
- MHC_I_peptide_length_.best_prediction.
- MHC_I_allele_.best_prediction.
- MHC_II_allele_.best_prediction.
- transcript_expression
- VAF_in_RNA
- VAF_in_tumor
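
Before running the annotation it can help to check that an iCaM table contains these columns. The sketch below is an illustration only: `missing_icam_columns` is a hypothetical helper, the tab delimiter is an assumption about the iCaM file format, and the column list covers only the names visible in this hunk.

```python
import csv
import io

# Column names copied from the list above (as far as this hunk shows it).
REQUIRED_COLUMNS = [
    "MHC_I_epitope_.best_prediction.",
    "MHC_I_epitope_.WT.",
    "MHC_II_epitope_.best_prediction.",
    "MHC_II_epitope_.WT.",
    "MHC_I_score_.best_prediction.",
    "MHC_I_score_.WT.",
    "MHC_II_score_.best_prediction.",
    "MHC_II_score_.WT.",
    "MHC_I_peptide_length_.best_prediction.",
    "MHC_I_allele_.best_prediction.",
    "MHC_II_allele_.best_prediction.",
    "transcript_expression",
    "VAF_in_RNA",
    "VAF_in_tumor",
]


def missing_icam_columns(handle, delimiter="\t"):
    """Hypothetical helper: return required columns absent from the header row."""
    header = next(csv.reader(handle, delimiter=delimiter), [])
    return [column for column in REQUIRED_COLUMNS if column not in header]


# Example: a header that lacks the last required column.
incomplete_header = "\t".join(REQUIRED_COLUMNS[:-1]) + "\n"
print(missing_icam_columns(io.StringIO(incomplete_header)))
```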
@@ -102,16 +78,61 @@ Pt17/;;;;;;

## **Usage**

**Single iCaM File**
```
python predict_all_epitopes.py --icam_file testseq_head.txt --allele_file alleles.csv [--tissue skin --frameshift False --tumour_content file_with_tumor_content]> test07.txt
```
input --icam-file testseq_head.txt --patient-id Pt123 --patient-data patients.csv [--frameshift False]
```


## Developer guide

### Build the package

To build the package just run:
```
python setup.py bdist_wheel
```

This will create an installable wheel file under `dist/input-x.y.z.whl`.

### Install the package

Install the wheel file as follows:
```
pip install dist/input-x.y.z.whl
```

### Run integration tests

To run the integration tests make sure you have a file `.env` that contains the following variables with the right values:
```
export INPUT_REFERENCE_FOLDER=~/addannot_references
export INPUT_BLASTP=/code/ncbi-blast/2.8.1+/bin/blastp
export INPUT_MIXMHC2PRED=/code/net/MixMHC2pred/1.1/MixMHC2pred
export INPUT_MIXMHCPRED=/code/MixMHCpred/2.0.2/MixMHCpred
export INPUT_RSCRIPT=/code/R/3.6.0/bin/Rscript
export INPUT_NETMHC2PAN=/code/net/MHCIIpan/3.2/netMHCIIpan
export INPUT_NETMHCPAN=/code/net/MHCpan/4.0/netMHCpan
```

The folder `$INPUT_REFERENCE_FOLDER` must contain the resources defined above.
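
A quick way to confirm that the environment is complete before starting the integration tests is to check the variables from Python. This is a minimal sketch under the assumption that the `.env` file has already been sourced in the calling shell; `missing_input_variables` is a hypothetical helper and not part of the package.

```python
import os

# Variable names copied from the `.env` example above.
INPUT_VARIABLES = [
    "INPUT_REFERENCE_FOLDER",
    "INPUT_BLASTP",
    "INPUT_MIXMHC2PRED",
    "INPUT_MIXMHCPRED",
    "INPUT_RSCRIPT",
    "INPUT_NETMHC2PAN",
    "INPUT_NETMHCPAN",
]


def missing_input_variables(environ=None):
    """Hypothetical helper: list the variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in INPUT_VARIABLES if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_input_variables()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All INPUT_* variables are set.")
```

The check only verifies that the variables are defined; it does not test whether the paths they point to exist.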

Run the integration tests as follows:
```
python -m unittest discover input.tests.integration_tests
```

--> annotation of one iCaM file
The integration tests run over real datasets and take some time to complete.

**Multiple iCaM Files**
The integration test that runs the whole program over a relevant dataset can be run as follows:
```
sh start_annotation_multiple_patientfiles.sh cohort_folder_with_patient_icam_folders output_folder allele_table cohort_name
```
python -m unittest input.tests.integration_tests.test_input
```

### Run unit tests

--> eg. parallel mps annotation of patients of a cohort, iCaM files stored in cohort_folder_with_patient_icam_folders
The unit tests have no external dependencies and finish in seconds.

Run the unit tests as follows:
```
python -m unittest discover input.tests.unit_tests
```