Predicting Bio-Activity

To make a structure-based prediction of the bio-activity of molecules, a list of features is generated with a KNIME workflow. This list is used as input for a Support Vector Machine (SVM) predictor. In the script, the compounds contained in the training input file are used to train the predictor. The parameters of the predictor are tuned with GridSearchCV: the predictor is trained multiple times with different combinations of the available parameters, and the best-performing predictor is then used to predict the bio-activity.
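The parameter search described above can be sketched as follows. This is a minimal illustration, not the code in SVM_GridSearch.py: the synthetic data, the parameter grid, and the number of cross-validation folds are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the feature table (assumption: not the project's data).
X, y = make_classification(n_samples=120, n_features=10, random_state=0)
X_train, y_train = X[:90], y[:90]
X_test = X[90:]

# Hypothetical parameter grid; the actual script may search other values.
param_grid = {
    "C": [0.1, 1, 10],
    "gamma": [0.01, 0.1],
    "kernel": ["rbf"],
}

# Train one SVM per parameter combination, scored by 3-fold cross-validation.
search = GridSearchCV(SVC(), param_grid, cv=3)
search.fit(X_train, y_train)

# By default the best combination is refit on all training data,
# so predict() uses the best predictor found.
predictions = search.predict(X_test)
print(search.best_params_)
```

Because GridSearchCV refits the best parameter combination by default, the search object can be used directly as the final predictor.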

Feature Calculation

The KNIME workflow featureGeneration.knar receives a comma-separated CSV file containing the SMILES and the predicted bio-activity of the molecules. It generates a list of features for the molecules and outputs a comma-separated file containing the activity, the SMILES structure, and the corresponding features of each molecule.
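As a rough sketch of how such a feature file could be read in Python, the example below splits the table into labels and features with pandas. The column names (activity, smiles, feature_1, feature_2) and the values are made up for illustration; the real workflow output may use different headers and many more feature columns.

```python
import io

import pandas as pd

# Hypothetical excerpt of a feature file; headers and numbers are illustrative only.
csv_text = """activity,smiles,feature_1,feature_2
1,CCO,0.12,3.4
0,c1ccccc1,0.56,1.2
1,CC(=O)O,0.33,2.8
"""

table = pd.read_csv(io.StringIO(csv_text))

# The activity column is the label; the SMILES string is kept out of the features.
y = table["activity"]
X = table.drop(columns=["activity", "smiles"])

print(X.shape)
```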

Classification

To run the program, specify the following arguments:

-train Path of the input csv file generated by the KNIME workflow, containing the training molecules
-test Path of the input csv file generated by the KNIME workflow, containing the molecules to be tested
-out Destination path of the resulting prediction csv
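One way to parse these three arguments is with argparse, sketched below. This is an assumption about how the options could be handled, not necessarily how SVM_GridSearch.py does it; the helper name build_parser is hypothetical.

```python
import argparse


def build_parser():
    """Build a parser for the -train, -test and -out options (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Predict bio-activity from feature files")
    parser.add_argument("-train", required=True,
                        help="csv file with the training molecules")
    parser.add_argument("-test", required=True,
                        help="csv file with the molecules to be tested")
    parser.add_argument("-out", required=True,
                        help="destination path of the prediction csv")
    return parser


# Parse an explicit argument list here so the example runs without a command line.
args = build_parser().parse_args([
    "-train", "trainingData_Features.csv",
    "-test", "testData_Features.csv",
    "-out", "SVM_GridSearch_res.csv",
])
print(args.train, args.test, args.out)
```

In a real script, calling parse_args() without a list reads the options from the command line instead.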

SVM Classifier

SVM_GridSearch.py -train trainingData_Features.csv -test testData_Features.csv -out SVM_GridSearch_res.csv

Built With

KNIME - Analytics Platform (3.7)
RDKit - Software Package to read and analyse SMILES data (3.4.0v)
Python - Python programming language (3.6)
scikit-learn - Software Package for Machine Learning (v0.20.1)
matplotlib - 2D Plotting Library (2.2.2)
pandas - Data Structures and DataFrames (v0.23.4)

Authors

Jennifer Bödker
Tobias Nietsch

