Releases: Merck/deepbgc

Fix tensorflow dependency version

01 Oct 07:32
  • Require TensorFlow < 2.0.0
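
The upper bound above can be illustrated with a small check. This is only a sketch: the helper names release_tuple and is_supported_tensorflow are hypothetical and not part of DeepBGC, and an actual installation would rely on the package manager to enforce the requirement.

```python
import re


def release_tuple(version):
    """Return the leading numeric release components, e.g. '1.15.2' -> (1, 15, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return tuple(int(part) for part in match.group(0).split("."))


def is_supported_tensorflow(version, upper_bound="2.0.0"):
    """Hypothetical helper: True if `version` is strictly below `upper_bound`."""
    installed = release_tuple(version)
    bound = release_tuple(upper_bound)
    # Pad both tuples with zeros so that '2.0' compares equal to '2.0.0'.
    width = max(len(installed), len(bound))
    installed += (0,) * (width - len(installed))
    bound += (0,) * (width - len(bound))
    return installed < bound


print(is_supported_tensorflow("1.15.0"))  # a 1.x release passes the check
print(is_supported_tensorflow("2.0.0"))   # 2.0.0 itself is excluded
```

With pip, the same constraint can be written as pip install "tensorflow<2.0.0".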

Remove debug output of pfam descriptions

01 Oct 06:59
  • Removed a leftover debug command that printed all Pfam descriptions to STDOUT

Fix pfam description annotations, improve help docs of commands

01 Oct 06:52
  • Added DEEPBGC_DOWNLOADS_DIR info to the download command
  • Added default values to help annotations
  • Fixed Pfam description annotation: Pfam domains are now annotated with a text "description" qualifier
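
As an illustration of how an environment variable such as DEEPBGC_DOWNLOADS_DIR can override a download location, here is a minimal sketch. The function resolve_downloads_dir and the fallback directory name are assumptions made for this example; these notes do not state how DeepBGC reads the variable or what its default location is.

```python
import os


def resolve_downloads_dir(environ=None, fallback="deepbgc_downloads"):
    """Hypothetical helper: pick the downloads directory.

    Uses DEEPBGC_DOWNLOADS_DIR when it is set and non-empty, otherwise the
    placeholder `fallback` (an assumption, not DeepBGC's actual default).
    """
    environ = os.environ if environ is None else environ
    custom = environ.get("DEEPBGC_DOWNLOADS_DIR")
    chosen = custom if custom else fallback
    # Expand '~' and return an absolute path so later code is unambiguous.
    return os.path.abspath(os.path.expanduser(chosen))


print(resolve_downloads_dir())
```

Passing a plain dictionary as environ makes the lookup easy to exercise without changing the real environment.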

Enable running custom model using path to pickle file

06 Sep 09:51

Make PFAM_domain annotations compatible with antiSMASH 5

21 Mar 17:37

DeepBGC PFAM_domain annotations now use qualifiers of the form db_xref="PF00067.1" and database="31.0" to be compatible with antiSMASH 5.
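
To make the qualifier format concrete, the sketch below renders the two qualifiers as GenBank-style feature lines. The helper names are hypothetical and this is not DeepBGC's own code; the 21-space indent follows the GenBank flat-file convention of starting qualifiers in column 22.

```python
def pfam_domain_qualifiers(pfam_accession, pfam_release):
    """Hypothetical helper: ordered (key, value) pairs for a PFAM_domain feature."""
    return [("db_xref", pfam_accession), ("database", pfam_release)]


def format_qualifier_lines(qualifiers, indent=21):
    """Render qualifiers as GenBank-style lines such as /db_xref="PF00067.1"."""
    prefix = " " * indent
    return ['{}/{}="{}"'.format(prefix, key, value) for key, value in qualifiers]


for line in format_qualifier_lines(pfam_domain_qualifiers("PF00067.1", "31.0")):
    print(line)
```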

An updated GenBank file of the ClusterFinder annotated contigs used for validation is provided below.
For all remaining datasets, refer to release 0.1.0: https://github.com/Merck/deepbgc/releases/tag/v0.1.0

Fix trained model download hashes

12 Mar 13:42
v0.1.2

Bump up version to 0.1.2.

Using GenBank representation, training models

12 Mar 13:08

Changelog

  • DeepBGC now accepts and outputs GenBank files
  • You can now train your own BGC detection model using deepbgc train
  • Data dependencies and models are now automatically downloaded using deepbgc download
  • Compatible with Python 2.7 and Python 3.4+

Training and validation data

  • ClusterFinder_Annotated_Contigs_OLD_PFAM_ANNOTATION.full.gbk - 13 contigs annotated with BGC regions ("cluster" feature), used for validation (from Cimermancic et al.). A newer version with PFAM_domain annotations compatible with DeepBGC 0.1.5 and antiSMASH 5 is provided in release 0.1.5: https://github.com/Merck/deepbgc/releases/tag/v0.1.5
  • GeneSwap_Negatives.pfam.tsv - Generated artificial negatives used to train the DeepBGC model
  • MIBiG.activity.csv - Chemical product activity for all MIBiG 1.4 BGCs
  • MIBiG.classes.csv - Chemical product class for all MIBiG 1.4 BGCs
  • MIBiG.pfam.tsv - Sequence of Pfam domains of all MIBiG 1.4 BGCs used to train the DeepBGC model
  • pfam2vec.csv - Pfam2vec embedding (100-dimensional vectors) for all Pfam domain IDs
  • templates - Directory with JSON model templates for training
  • pfam2vec-pfam31-corpus-p0.001.txt.bz2 - NEW Pfam ID corpus used to train pfam2vec (p-value 0.001; the original pfam2vec was trained with a less strict p-value of 0.01). Compressed using bzip2.

Models

Models are downloaded automatically using deepbgc download

  • deepbgc.pkl - DeepBGC detection model trained on MIBiG 1.4 dataset
  • clusterfinder_original.pkl - ClusterFinder detection model with original parameters
  • clusterfinder_retrained.pkl - ClusterFinder detection model, trained on MIBiG 1.4 dataset
  • clusterfinder_geneborder.pkl - ClusterFinder model switching only on gene borders, trained on MIBiG 1.4 dataset
  • product_class.pkl - Random Forest classifier predicting product class, trained on MIBiG 1.4 dataset
  • product_activity.pkl - Random Forest classifier predicting product activity, trained on MIBiG 1.4 dataset

Example results

  • example - Result of running the full DeepBGC pipeline on ClusterFinder_Annotated_Contigs.full.gbk
  • DeepBGC_Example_Result.ipynb - Jupyter notebook previewing contents of the example result folder

v0.0.1

01 Feb 13:14
e3fb4db

Code release including trained models.