Merge branch 'develop' into 'master'

Release INPuT v0.2.2

See merge request tron/addannot!29
TRON-Bioinformatics · Jul 8, 2020 · ecab0b0
2 parents 7ceb1d3 + 103625e
commit ecab0b0
Showing 228 changed files with 5,381 additions and 214,699 deletions.
diff --git a/.gitattributes b/.gitattributes
@@ -0,0 +1,17 @@
+# Basic .gitattributes for a python repo.
+
+# Source files
+# ============
+*.py     text diff=python
+*.py3    text diff=python
+*.avdl   text diff
+*.gitignore diff
+*.gitattributes diff
+MANIFEST.in diff
+README.md diff
+
+# Binary files
+# ============
+**/*.xml     binary
+**/*.pdf      binary
+**/*.fasta    binary
diff --git a/MANIFEST.in b/MANIFEST.in
@@ -1,2 +1,3 @@
 include input/neoag/neoag-master/*
-include input/Tcell_predictor/Classifier.pickle
+include input/Tcell_predictor/Classifier.pickle
+include input/self_similarity/BLOSUM62-2.matrix.txt
diff --git a/README.md b/README.md
@@ -29,46 +29,22 @@ Annotation of mutated peptide sequences (mps) with published or novel potential
 - Multiplexed Representation  
 
 
-## **Requirements**
+## Input Requirements
 
-**Specific Input:**  
-- icam_output.txt --> icam output file; either patient-specific, or several patients combineds
-- allele.csv --> ";" separated file with mhc I and mhc II alleles (4 digits!) for all patients of a cohort. If a gene is homozygous, give the allele twice
-  e.g.  
-```
-Pt1;mhc_I_selection;HLA-A*03:01;HLA-A*11:01;HLA-B*55:01;HLA-B*51:01;HLA-C*01:02;HLA-C*03:03;
-Pt1;mhc_II_selection;HLA-DRB1*13:01;HLA-DRB1*11:01;HLA-DQA1*01:03;HLA-DQA1*05:05;HLA-DQB1*06:03;HLA-DQB1*03:01;HLA-DPA1*01:03;HLA-DPB1*02:01;HLA-DPB1*04:02;
-Pt2;mhc_I_selection;HLA-A*02:01;HLA-A*26:01;HLA-B*27:05;HLA-B*57:01;HLA-C*01:85;HLA-C*06:02;
-Pt2;mhc_II_selection;HLA-DRB1*01:01;HLA-DRB1*07:01;HLA-DQA1*01:01;HLA-DQA1*02:01;HLA-DQB1*05:01;HLA-DQB1*03:03;HLA-DPA1*01:03;HLA-DPB1*02:01;HLA-DPB1*04:02;
+**Specific Input:**
+- icam_output.txt --> icam output file
+- patient identifier --> identifier of the patient to whom all neoantigens in the icam output belong
+- patient data --> a table of tab separated values containing metadata on the patient
+  - required fields: identifier, mhcIAlleles, mhcIIAlleles
+  - optional fields: estimatedTumorContent, isRnaAvailable, tissue
 
-```  
-- *OPTIONAL!!*:";" separated file with tumor content (e.g. patient_overview file for each cohort)
+**Example of patient data table**
 ```
-Patient;est. Tumor content;number of mutations; number of SNVs;number of Indels;unique_peptides;number_of_expressed_ge
-Pt10/;62.0;463;437;26;180;16200
-Pt11/;;;;;;
-Pt12/;66.0;120;104;16;38;15147
-Pt13/;49.0;863;843;20;327;15707
-Pt14/;55.0;2375;2336;39;909;16107
-Pt15/;50.0;1227;1174;53;433;15029
-Pt16/;24.0;948;940;8;368;15562
-Pt17/;;;;;;
+identifier  mhcIAlleles mhcIIAlleles    estimatedTumorContent   isRnaAvailable  tissue
+Pt29    HLA-A*03:01,HLA-A*02:01,HLA-B*07:02 HLA-DRB1*11:04,HLA-DRB1*15:01   69  True    skin
 ```
 
-
-
 **Required Columns of iCaM Table:**  
--   MHC_I_epitope_.best_prediction.  
-- 	MHC_I_epitope_.WT.  
--   MHC_II_epitope_.best_prediction.  
-- 	MHC_II_epitope_.WT.  
-- 	MHC_I_score_.best_prediction.  
-- 	MHC_I_score_.WT.  
-- 	MHC_II_score_.best_prediction.  
-- 	MHC_II_score_.WT.  
-- 	MHC_I_peptide_length_.best_prediction.
-- 	MHC_I_allele_.best_prediction.  
-- 	MHC_II_allele_.best_prediction.  
 - 	transcript_expression  
 - 	VAF_in_RNA  
 - 	VAF_in_tumor  
@@ -102,16 +78,61 @@ Pt17/;;;;;;
 
 ## **Usage**  
 
-**Single iCaM File**  
 ```
-python predict_all_epitopes.py --icam_file testseq_head.txt  --allele_file alleles.csv [--tissue skin --frameshift False --tumour_content file_with_tumor_content]> test07.txt
-```  
+input --icam-file testseq_head.txt --patient-id Pt123 --patient-data patients.csv [--frameshift False]
+```
+
+
+## Developer guide
+
+### Build the package
+
+To build the package just run:
+```
+python setup.py bdist_wheel
+```
+
+This will create an installable wheel file under `dist/input-x.y.z.whl`.
+
+### Install the package
+
+Install the wheel file as follows:
+```
+pip install dist/input-x.y.z.whl
+```
+
+### Run integration tests
+
+To run the integration tests make sure you have a file `.env` that contains the following variables with the right values:
+```
+export INPUT_REFERENCE_FOLDER=~/addannot_references
+export INPUT_BLASTP=/code/ncbi-blast/2.8.1+/bin/blastp
+export INPUT_MIXMHC2PRED=/code/net/MixMHC2pred/1.1/MixMHC2pred
+export INPUT_MIXMHCPRED=/code/MixMHCpred/2.0.2/MixMHCpred
+export INPUT_RSCRIPT=/code/R/3.6.0/bin/Rscript
+export INPUT_NETMHC2PAN=/code/net/MHCIIpan/3.2/netMHCIIpan
+export INPUT_NETMHCPAN=/code/net/MHCpan/4.0/netMHCpan
+```
+
+The folder `$INPUT_REFERENCE_FOLDER` must contain the resources defined above.
+
+Run the integration tests as follows:
+```
+python -m unittest discover input.tests.integration_tests
+```
 
---> annotation of one iCaM file
+The integration tests run over some real datasets and they take some time to run.
 
-**Multiple iCaM Files**  
+The integration test that runs the whole program over a relevant dataset can be run as follows:
 ```
-sh start_annotation_multiple_patientfiles.sh cohort_folder_with_patient_icam_folders output_folder allele_table cohort_name
-```  
+python -m unittest input.tests.integration_tests.test_input
+```
+
+### Run unit tests
 
---> eg. parallel mps annotation of patients of a cohort, iCaM files stored in cohort_folder_with_patient_icam_folders
+The unit tests do not have any dependency and they finish in seconds.
+
+Run the unit tests as follows:
+```
+python -m unittest discover input.tests.unit_tests
+```
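
Note on the integration test setup in the README hunk above: the new section asks for a `.env` file exporting seven `INPUT_*` variables. A minimal sketch of a pre-flight check is shown here, assuming the variables have already been exported into the current shell environment. The helper name `find_missing_variables` is hypothetical and is not part of the committed package; only the variable names are taken from the diff.

```python
import os

# Variable names copied from the README's `.env` example in the diff above.
REQUIRED_VARIABLES = [
    "INPUT_REFERENCE_FOLDER",
    "INPUT_BLASTP",
    "INPUT_MIXMHC2PRED",
    "INPUT_MIXMHCPRED",
    "INPUT_RSCRIPT",
    "INPUT_NETMHC2PAN",
    "INPUT_NETMHCPAN",
]


def find_missing_variables(environ=None):
    """Return required variable names that are unset or empty.

    Hypothetical helper for illustration; `environ` defaults to os.environ
    and can be replaced by any dict-like mapping.
    """
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


if __name__ == "__main__":
    missing = find_missing_variables()
    if missing:
        print("Missing variables: " + ", ".join(missing))
    else:
        print("All integration test variables are set.")
```

Running this before `python -m unittest discover input.tests.integration_tests` would report any unset variable up front. It only checks that each variable is non-empty; it does not verify that the paths exist or that the tools are executable.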