Usage

This repository is the official implementation of the Analyzing-hCov-genome-sequence pipeline.

If you use any part of this repository, we would be very grateful if you cited our paper, "Analyzing hCov genome sequences: Applying Machine Intelligence and beyond".

Usage

Installation

Please install TensorFlow version 2.2.0. (Other 2.x versions should work but have not been tested. Use a GPU for better performance.)
Please install Keras version 2.3.1.
The other Python libraries used in this project can be installed by running the following command:

pip install -r requirements.txt
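After installation, it can help to confirm that the installed versions match the ones named above. The following is a minimal sketch, not part of this repository: the helper names `installed_version` and `report` are hypothetical, and it assumes the packages were installed under the distribution names `tensorflow` and `keras` (a GPU build installed as `tensorflow-gpu` would need that name instead).

```python
from importlib import metadata

# Versions stated in the installation notes above.
EXPECTED = {"tensorflow": "2.2.0", "keras": "2.3.1"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(expected=EXPECTED):
    """Map each package name to (expected, installed, versions_match)."""
    result = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        result[name] = (wanted, found, found == wanted)
    return result


if __name__ == "__main__":
    for name, (wanted, found, ok) in report().items():
        status = "OK" if ok else "differs"
        print(f"{name}: expected {wanted}, found {found} ({status})")
```

A mismatch is not necessarily a problem, since other 2.x versions of TensorFlow are expected to work, but they are untested.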

Input Preprocessing and Labeling

Keep the input sequence FASTA file (the file used in this analysis can be downloaded from here) and the info file (a sample file is present in the Input directory) in the Input directory.
The input_processing.py script handles input processing and labelling. It takes three mandatory parameters and one optional parameter:
- input
- info_file
- label
- old (optional)
There are four options for labelling:
- Death
- CFR_Recovery
- CFR_confirmed_cases
- CFR_Infrastructure
Set the old parameter to 1 to reuse the original training/testing accession IDs for input preprocessing and labelling. Omit this parameter when generating a new training/testing set. The models are pre-trained with Death labelling.
Sample command:

python input_processing.py --input <input_fasta_file_name> --info_file <info_file_name> --label <label_option> [--old 1]
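To make the parameter list above concrete, here is a minimal sketch of how such a command line could be parsed with `argparse`. It is an illustration only and not the actual code of input_processing.py: the function name `build_parser`, the help texts, the default of 0 for `--old`, and the example file names are all assumptions.

```python
import argparse

# Label options listed in this README.
LABEL_OPTIONS = ["Death", "CFR_Recovery", "CFR_confirmed_cases", "CFR_Infrastructure"]


def build_parser():
    """Build a parser with three mandatory parameters and one optional one."""
    parser = argparse.ArgumentParser(description="Input preprocessing and labelling")
    parser.add_argument("--input", required=True,
                        help="name of the FASTA file in the Input directory")
    parser.add_argument("--info_file", required=True,
                        help="name of the info file in the Input directory")
    parser.add_argument("--label", required=True, choices=LABEL_OPTIONS,
                        help="labelling option")
    # Assumption: leaving --old out behaves like 0 (generate a new split).
    parser.add_argument("--old", type=int, choices=[0, 1], default=0,
                        help="1 reuses the original training/testing accession IDs")
    return parser


# Example with hypothetical file names.
args = build_parser().parse_args(
    ["--input", "sequences.fasta", "--info_file", "info.csv", "--label", "Death"]
)
print(args.input, args.info_file, args.label, args.old)
```

Restricting `--label` with `choices` means a misspelled option is rejected before any processing starts.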

Country-wise Representative Sequence Identification and Phylogenetic Analysis

Navigate to the Clustering_and_Phylogenetic_Analysis directory.
The controller.py script is a single entry point for all of the related analyses. It requires two parameters:
- label (the same options as for input preprocessing)
- method

There are 4 options for method:

Method	Description
Euclidean	Simple Euclidean distance-based method among the 3-mers of the genome sequence
Novel_Fast_Vector	18-dimensional Novel Fast Vector Sequence Comparison Analysis
Accumulated_Fast_Vector	18-dimensional Accumulated Fast Vector Sequence Comparison Analysis
MAW	Minimum Absent Word Analysis
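
As an illustration of the Euclidean option in the table above, the sketch below computes a Euclidean distance between the 3-mer profiles of two sequences. It is a hedged example and not the repository's implementation: the function names are hypothetical, and both the use of relative frequencies (rather than raw counts) and the skipping of windows that contain symbols other than A, C, G, T are assumptions.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_frequencies(sequence, k=3):
    """Relative frequency of every A/C/G/T k-mer, in a fixed lexicographic order.

    Windows containing other symbols (for example N) are not counted.
    """
    seq = sequence.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]  # 64 entries for k=3
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


def euclidean_distance(seq_a, seq_b, k=3):
    """Euclidean distance between the k-mer frequency vectors of two sequences."""
    freq_a = kmer_frequencies(seq_a, k)
    freq_b = kmer_frequencies(seq_b, k)
    return sqrt(sum((x - y) ** 2 for x, y in zip(freq_a, freq_b)))


# Toy sequences, not real genome data.
print(euclidean_distance("ACGTACGTAC", "ACGTTTGTAC"))
```

Normalising by the number of counted windows keeps sequences of different lengths comparable. With raw counts, the distance would also reflect differences in length.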

Only MAW requires four additional parameters:
- Minimum_MAW_Length
- Maximum_MAW_Length
- Distance_Method
- Fasta File Name (Must be kept in the Input Directory)
Sample Command:

python controller.py --label <label_option> --method <method_option>

