Machine learning-based approach for the inference of gene regulatory networks from expression data.
The GENIE3 method is described in the following paper:
Huynh-Thu V. A., Irrthum A., Wehenkel L., and Geurts P.
Inferring regulatory networks from expression data using tree-based methods.
PLoS ONE, 5(9):e12776, 2010.
Four implementations of GENIE3 are available: Python, MATLAB, R/randomForest, and R/C. Each folder contains a PDF file with a step-by-step tutorial showing how to run the code.
Note 1: The R/C implementation can also be installed from Bioconductor.
Note 2: All the results presented in the PLoS ONE paper were generated using the MATLAB implementation.
GENIE3 is based on regression trees. Each implementation learns these trees differently: the Python implementation uses the scikit-learn library; the MATLAB and R/C implementations are MATLAB and R wrappers, respectively, around C code written by Pierre Geurts; and the R/randomForest implementation uses the randomForest R package. The R/C implementation is the fastest GENIE3 implementation and was developed for the SCENIC pipeline to analyze single-cell RNA-seq data (Aibar Santos et al., Nature Methods, 14:1083-1086, 2017). The running times of the different GENIE3 implementations are shown below for the DREAM5 networks (in each case, GENIE3 was run using the default parameters). These computing times were measured on a computer with 16 GB of RAM and an Intel Xeon E5520 2.27 GHz processor.
GENIE3 was the best performer in two DREAM challenges: the DREAM4 In Silico Size 100 Multifactorial sub-challenge and the DREAM5 Network Inference challenge.
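
To illustrate the tree-based idea mentioned above, here is a minimal sketch in Python using scikit-learn. It is not the code shipped in this repository. The function name `infer_network_sketch`, its parameters, and the toy data are hypothetical and chosen for illustration only. The sketch assumes that each gene is predicted in turn from all other genes with a random forest, and that the resulting feature importances are read as putative regulatory edge scores. Note that scikit-learn normalizes `feature_importances_` to sum to one per target, which may differ from how the published implementations scale their scores.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def infer_network_sketch(expr, n_trees=100, seed=0):
    """Score candidate edges with one random forest per target gene.

    expr: array of shape (n_samples, n_genes).
    Returns W of shape (n_genes, n_genes), where W[i, j] scores the
    putative edge from gene i to gene j. Hypothetical helper, not the
    repository's API.
    """
    expr = np.asarray(expr, dtype=float)
    n_genes = expr.shape[1]
    weights = np.zeros((n_genes, n_genes))
    for target in range(n_genes):
        # All genes except the target act as candidate regulators.
        regulators = [g for g in range(n_genes) if g != target]
        forest = RandomForestRegressor(
            n_estimators=n_trees,
            max_features="sqrt",  # assumed setting for this sketch
            random_state=seed,
        )
        forest.fit(expr[:, regulators], expr[:, target])
        # Importance of each regulator for predicting this target.
        weights[regulators, target] = forest.feature_importances_
    return weights


# Toy data (made up): gene 2 is driven by gene 0; genes 1 and 3 are noise.
rng = np.random.default_rng(0)
n_samples = 200
g0 = rng.normal(size=n_samples)
g1 = rng.normal(size=n_samples)
g2 = 2.0 * g0 + 0.1 * rng.normal(size=n_samples)
g3 = rng.normal(size=n_samples)
expr = np.column_stack([g0, g1, g2, g3])

W = infer_network_sketch(expr, n_trees=50)
# Rank the candidate regulators of gene 2, highest score first.
ranking = np.argsort(W[:, 2])[::-1]
```

On this toy data, gene 0 should receive a higher score than the noise genes as a regulator of gene 2. For real analyses, refer to the step-by-step tutorial in each implementation folder.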