A cell-to-patient machine learning transfer approach uncovers novel basal-like breast cancer prognostic markers amongst alternative splice variants
All data needed to reproduce the figures can be accessed here:
Two Python 3 scripts are provided, one for splicing and one for expression.
The input files are in the data directory.
The scripts were written against scikit-learn version 0.21.2.
NB: The Imputer warnings printed when a script starts are not errors.
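The Imputer class was deprecated in scikit-learn 0.20 and removed in 0.22, so the warnings are expected with 0.21.2, while a newer release may fail outright if the scripts import it (an assumption, since the import lines are not shown here). The sketch below is one way to compare the installed version with the one stated above before running anything. The helper names parse_version, version_status and installed_sklearn_version are hypothetical and are not part of the provided scripts.

```python
import itertools
from importlib import metadata

EXPECTED = "0.21.2"  # scikit-learn version stated in this README


def parse_version(text):
    """Turn '0.21.2' into (0, 21, 2), keeping only leading digits of each part."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def version_status(installed, expected=EXPECTED):
    """Return 'match', 'older' or 'newer' relative to the expected version."""
    have, want = parse_version(installed), parse_version(expected)
    if have == want:
        return "match"
    return "older" if have < want else "newer"


def installed_sklearn_version():
    """Return the installed scikit-learn version string, or None if absent."""
    try:
        return metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_sklearn_version()
    if found is None:
        print("scikit-learn is not installed")
    else:
        print("scikit-learn", found, "is", version_status(found), "vs", EXPECTED)
```

If the status is "newer", installing the pinned version in a separate environment is probably the safest route.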
Both Python scripts call one R script to plot survival over the rounds of classification.
python classification_cell2patient_splicing.py \
-c {absolutepath}/MatriceExonPSI_CellLines.csv \
-p {absolutepath}/MatriceExonPSI_Patients.csv \
-t 0.6 \
-n 1000
python classification_cell2patient_expression.py \
-c {absolutepath}/MatriceGeneTPM_CellLines.csv \
-p {absolutepath}/MatriceGeneTPM_Patients.csv \
-t 0.6 \
-n 1000
-t : Threshold for class probabilities.
-c : Path to a matrix with Expression/Splicing values for Cell Lines.
-p : Path to a matrix with Expression/Splicing values for Patients.
-n : Number of trees in the forest.
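To make the four options concrete, here is a minimal sketch of how such a command line could be parsed and how a probability threshold might be applied. It is not the code of the provided scripts: the destination names, the defaults, the >= comparison and the class labels in the toy data are all assumptions made for illustration. The -n value presumably ends up as the number of estimators of a random forest, which is also an assumption.

```python
import argparse


def build_parser():
    """Build a parser for the four options described above (names are hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Cell-line to patient classification (illustrative sketch)")
    parser.add_argument("-c", dest="cell_lines", required=True,
                        help="matrix of Expression/Splicing values for Cell Lines")
    parser.add_argument("-p", dest="patients", required=True,
                        help="matrix of Expression/Splicing values for Patients")
    parser.add_argument("-t", dest="threshold", type=float, default=0.6,
                        help="threshold for class probabilities")
    parser.add_argument("-n", dest="n_trees", type=int, default=1000,
                        help="number of trees in the forest")
    return parser


def confident_calls(probabilities, threshold):
    """Keep samples whose highest class probability reaches the threshold.

    probabilities maps a sample name to a {class label: probability} dict.
    Using >= rather than > is an assumption.
    """
    calls = {}
    for sample, class_probs in probabilities.items():
        label, prob = max(class_probs.items(), key=lambda item: item[1])
        if prob >= threshold:
            calls[sample] = label
    return calls


# Usage with the same values as the commands above (file names shortened).
args = build_parser().parse_args(
    ["-c", "cells.csv", "-p", "patients.csv", "-t", "0.6", "-n", "1000"])

# Toy probabilities, invented for this example only.
toy = {
    "patient_A": {"basal": 0.82, "non_basal": 0.18},
    "patient_B": {"basal": 0.55, "non_basal": 0.45},
}
kept = confident_calls(toy, args.threshold)
print(kept)
```

With a threshold of 0.6, only patient_A is kept in this toy case, because patient_B's best class probability (0.55) falls below it.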
The final annotated file is splicing_TCGA_BASAL_HEADER_ADDED.tsv.
You can visualize it with Morpheus (https://software.broadinstitute.org/morpheus).
The best features of interest are listed in outputBorutaPy.txt and outputBorutaPy.bed.