The spice package can be used for calculating sequence-based protein features, visualizing the obtained features, and training and testing of protein classifiers using these features.
This featext.py module can be used for sequence-based protein feature extraction. it uses the featmat.py module to manage the labeled feature matrix, an m x n matrix for m proteins and n features, and the dataset.py module to manage the set of proteins and their corresponding labels.
The classification.py module is a layer on top of scikit-learn that can be used to construct protein classifiers and the classify.py module can be used to test new protein sequences on an allready trained classifier.
The project_management.py module is used by the SPiCE website to manage user projects.
The following software is required to run spice:
- numpy >= 1.7.1
- scipy >= 0.12.0
- matplotlib >= 1.2.2
- scikit-learn >= 0.14.1
The biopy package that can also be found on my github repository is also required:
- biopy >= 0.1.0"
On linux systems, the sofware can be installed using:
sudo python setup.py install
The software can be using the import statement, for example:
from spice import featext
Four command-line tools are also provided:
- featext
- classification
- classify