Training domain-aware embeddings for clustering, written in Python and Keras
Authors: Hang Hu, Yang Liu, Yueyang Chen, Robert Rallo
DeepChEmbed is an open-source Python package that develops new types of chemical embeddings to improve the classification of chemical properties such as biodegradability and toxicity.
More details are available in our poster for the UW DIRECT Program.
- Wrapper model classes for KMeans and autoencoders
- Combined training of the autoencoder and KMeans clustering/classification
- Wrapper function for visualizing high-dimensional data with t-SNE projection
- Coupling with advanced autoencoding methods, such as convolutional autoencoders
- Coupling with other classification algorithms, such as support vector machines
- Developing "interpretable" embeddings that incorporate chemical meaning
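The README does not spell out how the autoencoder and KMeans are trained together. One common formulation is Deep Embedded Clustering (DEC). The encoder embeds each molecule, KMeans initialises the cluster centroids, and both are then refined by minimising the KL divergence between a soft assignment Q and a sharpened target P. The NumPy sketch below shows only that clustering objective, assuming a DEC-style loss with alpha = 1. The function names are hypothetical, and DeepChEmbed's own implementation (for example in `dce.py`) may differ.

```python
import numpy as np


def soft_assignment(z, centroids, alpha=1.0):
    """Student's t soft assignment q_ij of embeddings z_i to cluster centres mu_j.

    z:         (n_samples, n_latent) autoencoder bottleneck outputs.
    centroids: (n_clusters, n_latent), e.g. initialised by running KMeans on z.
    """
    # Squared Euclidean distance from every embedding to every centroid.
    sq_dist = np.sum((z[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
    # Heavy-tailed kernel: closer centroids get a larger (unnormalised) weight.
    q = (1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0)
    # Normalise each row into a probability distribution over clusters.
    return q / q.sum(axis=1, keepdims=True)


def target_distribution(q):
    """Sharpened targets p_ij proportional to q_ij**2 / f_j, with f_j = sum_i q_ij."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)


def clustering_loss(p, q, eps=1e-12):
    """KL(P || Q), summed over clusters and averaged over samples."""
    return float(np.mean(np.sum(p * np.log((p + eps) / (q + eps)), axis=1)))


# Toy example: two well-separated groups of 2-D "embeddings".
rng = np.random.RandomState(0)
z = np.vstack([rng.normal(0.0, 0.1, size=(5, 2)),
               rng.normal(3.0, 0.1, size=(5, 2))])
centroids = np.array([[0.0, 0.0], [3.0, 3.0]])

q = soft_assignment(z, centroids)
p = target_distribution(q)
print(q.argmax(axis=1))  # → [0 0 0 0 0 1 1 1 1 1]
print(clustering_loss(p, q))
```

In a full DEC-style loop, an optimiser would update both the encoder weights and the centroids to minimise this loss, recomputing P every few iterations. That part requires Keras/TensorFlow and is left out of this sketch.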
- Python, version 3.6.7 or later
- Conda, version 4.6.8 or later
- NumPy, version 1.16.3 or later
- Pandas, version 2.2.4 or later
- Keras, version 1.16.3 or later
- TensorFlow, version 1.13.1 or later
- Scikit-learn, version 0.20.3 or later
- Matplotlib, version 0.9.0 or later
- Seaborn, version 1.16.3 or later
- RDKit, version 2019.03.1 or later
- Mordred, version 1.1.1 or later
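You can run the following commands from your computer's terminal:
- Either clone the deepchembed repository:
  git clone https://github.com/chembed/deepchembed.git
  or download and extract the zip file (-L makes curl follow GitHub's redirect):
  curl -L -O https://github.com/chembed/deepchembed/archive/master.zip
  unzip master.zip
- Change into the project directory (for the zip download, the extracted directory name has a -master suffix):
  cd deepchembed
- Create the conda environment from the environment file:
  conda env create -f environment.yml
- Activate the environment:
  conda activate deepchembed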
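After activating the environment, a small script like the sketch below can confirm that the prerequisites import and meet a minimum version. The `MIN_VERSIONS` mapping is a hypothetical subset of the prerequisite list. Its keys are import names, which can differ from package names (scikit-learn is imported as `sklearn`). The version parser is a deliberately simple assumption that ignores pre-release and post-release tags.

```python
import importlib
import re

# Hypothetical subset of the minimum versions listed in the prerequisites.
# Keys are import names, which can differ from package names (scikit-learn -> sklearn).
MIN_VERSIONS = {
    "numpy": "1.16.3",
    "tensorflow": "1.13.1",
    "sklearn": "0.20.3",
}


def version_tuple(version):
    """Convert '1.16.3' or '0.20.3.post1' into a comparable tuple such as (1, 16, 3)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at non-numeric components such as 'post1' or 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def check_environment(min_versions=MIN_VERSIONS):
    """Map each module name to (installed version or None, meets_minimum)."""
    report = {}
    for module_name, minimum in min_versions.items():
        try:
            module = importlib.import_module(module_name)
        except Exception:  # broken binary installs can raise more than ImportError
            report[module_name] = (None, False)
            continue
        installed = getattr(module, "__version__", "0")
        report[module_name] = (installed, version_tuple(installed) >= version_tuple(minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment().items():
        status = "OK" if ok else "CHECK"
        print("{:<12} {:<12} {}".format(name, installed or "missing", status))
```

Saved as a script (any filename works) and run inside the activated environment, it prints one line per module and flags any module that is missing or older than the listed minimum.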
You can find all the tutorial scripts in this directory: https://github.com/chembed/DeepChEmbed/tree/master/deepchembed/tutorials
deepchembed (master)
|--data
|   |--
|--doc
|   |--
|--deepchembed
|   |--notebook_scripts
|   |   |--
|   |--tutorials
|   |   |--
|   |--tests
|   |   |--
|   |--__init__.py
|   |--cluster.py
|   |--dce.py
|   |--descriptor.py
|   |--dimreducer.py
|   |--utilities.py
|--.coverage
|--.gitignore
|--.travis.yml
|--LICENSE
|--README.md
|--environment.yml
|--requirements.txt
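The tree does not say what `dimreducer.py` contains. To illustrate the t-SNE visualization wrapper listed among the features, here is a minimal sketch built on scikit-learn's `sklearn.manifold.TSNE`. The function names `tsne_project` and `plot_projection` are hypothetical and are not DeepChEmbed's API. The synthetic data only stands in for real chemical descriptors or embeddings.

```python
import numpy as np
from sklearn.manifold import TSNE


def tsne_project(features, perplexity=30.0, random_state=0):
    """Project an (n_samples, n_features) array to 2-D coordinates with t-SNE."""
    features = np.asarray(features, dtype=float)
    # Recent scikit-learn versions require perplexity < n_samples, so cap it for small datasets.
    perplexity = min(perplexity, features.shape[0] - 1)
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=random_state)
    return tsne.fit_transform(features)


def plot_projection(coords, labels):
    """Scatter the 2-D coordinates, coloured by a cluster or property label."""
    import matplotlib.pyplot as plt  # local import: tsne_project works without matplotlib

    fig, ax = plt.subplots()
    ax.scatter(coords[:, 0], coords[:, 1], c=labels, s=15)
    ax.set_xlabel("t-SNE 1")
    ax.set_ylabel("t-SNE 2")
    return fig


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    # Two synthetic groups of 50-dimensional vectors standing in for chemical descriptors.
    X = np.vstack([rng.normal(0.0, 1.0, size=(20, 50)),
                   rng.normal(5.0, 1.0, size=(20, 50))])
    coords = tsne_project(X)
    print(coords.shape)  # → (40, 2)
```

Calling `plot_projection(coords, labels)` with, for example, KMeans cluster labels returns a Matplotlib figure that can be saved with `fig.savefig(...)`. t-SNE preserves local neighbourhoods rather than global distances, so the plot is best read as a qualitative check on cluster separation.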
Contributions to the project are warmly welcome! If you discover a bug, please report it in the Issues section of this repository and we will work to fix it as soon as possible. If you have data that you think would be good for training our model, please contact one of the authors.
deepchembed is licensed under the MIT license.