VirFormer: Identifying viruses as an anomaly

Version: 1.0

Authors: Linshu Yang, Pengyu Long, Qingcheng Zhao

Introduction

VirFormer is a transformer based model for virus nucleotide sequence prediction. It proposed a novel method to predict virus nucleotide sequences by identifying viruses as an anomaly. It is a two-stage model, which first uses an encoder to encode the nucleotide sequences into a latent space, and then uses a classifier to predict the virus nucleotide sequences. The encoder is a transformer based model, which can encode the nucleotide sequences into a latent space. The classifier is a cnn-based model, which can predict the virus nucleotide sequences by identifying viruses as an anomaly in the latent space.

Dependencies

VirFormer requires Python 3.9 with the package of pytorch, numpy, torchmetrics, scikit-learn, jupyter and Biopython. We recommend the use Ananconda to install all dependencies.

Installation

After installing dependencies, you can run the main.py in encoder to train the encoder. And you can run main.py, main_new.py, main_linear.py in classifier to train the classifiers.

For testing, you can run auroc.ipynb in classifier to test different models.

Dataset

Before you run our code, you need to download the data from NCBI using the Accession ID in dataset. And you are recommended to divide them into human, prokaryote, simulation_abundance and virus folders. You may need to change some paths in our code to match the path of your dataset.

NCBI Accessions: https://epan.shanghaitech.edu.cn/l/wFKC6o

Training

Our training requires at least 360GB memory and takes 36h to achieve barely convergence on 8 x Nvidia V100.

Misc

Presentation and paper can be found in media. Details of the model can be found in paper.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
classifier		classifier
encoder		encoder
media		media
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

VirFormer: Identifying viruses as an anomaly

Introduction

Dependencies

Installation

Dataset

Training

Misc

About

Releases

Packages

Contributors 3

Languages

profetia/VirFormer

Folders and files

Latest commit

History

Repository files navigation

VirFormer: Identifying viruses as an anomaly

Introduction

Dependencies

Installation

Dataset

Training

Misc

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages