README for the other two supervised approaches, random forest and neural network, which were not chosen as the final predictor.
## Predicting Bio-Activity
To make a structure-based prediction of the bio-activity of molecules, a list of features is generated with a KNIME workflow. This list serves as input for either a neural network or a random forest predictor. In both scripts, the input data is split into training and test data, with 70% of the data used to train the predictor. The parameters of the predictors are tuned with GridSearchCV: the predictor is trained multiple times with different combinations of the available parameters, and the best predictor is then used to predict the bio-activity.
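
The split and parameter search described above can be sketched as follows. This is a minimal illustration, not the code of the actual scripts: the synthetic data stands in for the KNIME feature table, the binary activity label is an assumption, and the parameter grid is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the feature table (assumption: binary activity label).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# 70% of the data is used for training, the remaining 30% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0, stratify=y
)

# Hypothetical parameter grid; the grid used by the scripts is not documented here.
param_grid = {"n_estimators": [10, 50], "max_depth": [3, None]}

# Every parameter combination is trained and scored with 3-fold cross-validation.
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X_train, y_train)

# The best combination is refit on the whole training split and used for prediction.
y_pred = search.best_estimator_.predict(X_test)
print("best parameters:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))
```

Keeping the test split out of the grid search means the reported test accuracy is not influenced by the parameter selection.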
## Feature Calculation
The KNIME workflow featureGeneration.knar receives a comma-separated CSV input file containing the SMILES and the predicted bio-activity of each molecule. It generates a list of features for the molecules and outputs a comma-separated file containing the activity, the SMILES structure, and the corresponding features of each molecule.
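
As a rough sketch of how such an output file could be loaded for training, the example below reads a small in-memory CSV with pandas. The column names (`activity`, `smiles`, `feature_1`, `feature_2`) and all values are made up for illustration; the real workflow output may use different names and many more feature columns.

```python
import io

import pandas as pd

# Hypothetical excerpt of the workflow output; names and values are illustrative only.
csv_text = """activity,smiles,feature_1,feature_2
1,CCO,0.12,3
0,c1ccccc1,0.55,6
"""

table = pd.read_csv(io.StringIO(csv_text))

# The activity column is the label; the SMILES string is kept only for reference.
y = table["activity"]
X = table.drop(columns=["activity", "smiles"])

print(X.shape, y.tolist())
```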
## Classification
To run either script, specify the following options:
* -t: Path of the input CSV file generated by the KNIME workflow
* -o: Destination path of the resulting prediction CSV
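
A command-line interface with these two options could be parsed with `argparse` roughly as shown below. The function name `parse_args` and the attribute names `training_csv` and `output_csv` are assumptions made for this sketch; the scripts may handle their arguments differently.

```python
import argparse


def parse_args(argv=None):
    """Parse the -t and -o options (hypothetical helper, not taken from the scripts)."""
    parser = argparse.ArgumentParser(
        description="Train a predictor on KNIME features and write predictions."
    )
    parser.add_argument("-t", dest="training_csv", required=True,
                        help="path of the input CSV file generated by the KNIME workflow")
    parser.add_argument("-o", dest="output_csv", required=True,
                        help="destination path of the resulting prediction CSV")
    return parser.parse_args(argv)


# Passing an explicit list makes the example runnable without a real command line.
args = parse_args(["-t", "trainingData_Features.csv", "-o", "rfc_GridSearch_res.csv"])
print(args.training_csv, args.output_csv)
```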
## Random Forest Classifier
randomForest_GridSearch.py -t trainingData_Features.csv -o rfc_GridSearch_res.csv
## Neural Network Classifier
neuronalNetwork_GridSearch.py -t trainingData_Features.csv -o rfc_GridSearch_res.csv
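
For illustration, a grid search over a small neural network might look like the sketch below. Note the substitution: it uses scikit-learn's `MLPClassifier` instead of Keras so that the example stays short and self-contained. The synthetic data, the scaling step, and the parameter grid are assumptions and do not reflect the actual script.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature table (assumption: binary activity label).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0, stratify=y
)

# Scaling inside the pipeline is fit on the training folds only during the search.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("mlp", MLPClassifier(max_iter=500, random_state=0)),
])

# Hypothetical grid over the network size and the L2 penalty.
param_grid = {
    "mlp__hidden_layer_sizes": [(10,), (20, 10)],
    "mlp__alpha": [1e-4, 1e-2],
}

search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X_train, y_train)

score = search.score(X_test, y_test)
print("best parameters:", search.best_params_)
print("test accuracy:", score)
```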
## Built With
* [KNIME](https://www.knime.com/) - Analytics Platform (3.7)
* [RDKit](http://rdkit.org/docs/Install.html) - Software Package to read and analyse SMILES data (3.4.0v)
* [Python](https://www.python.org/downloads/release/python-360/) - Python programming language (3.6)
* [scikit-learn](https://scikit-learn.org) - Software Package for Machine Learning (v0.20.1)
* [Keras](https://keras.io/) - Open Source Deep Learning Library (2.24)
* [matplotlib](https://matplotlib.org/) - 2D Plotting Library (2.2.2)
* [pandas](https://pandas.pydata.org/) - Data structures and DataFrames (v0.23.4)
* [numpy](http://www.numpy.org/) - Scientific computing with Python (v1.15.2)
## Authors
*Jennifer Bödker*
*Tobias Nietsch*