Analysis Workflow Course Project - Programming for Bioinformatics (Bioinformatics Professional Diploma)
- Ahmed Omar Lamloum
- Mohamed Magdy AboelEla (Team Leader)
- Usama Bakry
- Waleed Faheem Amer
The overall purpose of PyOmiX is to provide an analysis workflow that generates simple phylogenetic trees from multiple sequence alignment files for a list of Swiss-Prot IDs using Clustal Omega, through the series of steps described in the following flowchart.
python pyomix.py -i <swiss-prot ids file dir> -d <database fasta file> -o <output dir>
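As a rough illustration of how the three options in the command above could be parsed, here is a minimal sketch using `argparse`. Only the short flags `-i`, `-d` and `-o` come from the usage line; the long option names, the help texts and the `build_parser` function name are assumptions made for this example and may differ from the actual script.

```python
import argparse


def build_parser():
    """Build a parser for the -i, -d and -o options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="pyomix.py",
        description="Build phylogenetic trees for a list of Swiss-Prot IDs.",
    )
    # The long names (--ids, --database, --output) are illustrative assumptions.
    parser.add_argument("-i", "--ids", required=True,
                        help="file listing the Swiss-Prot IDs")
    parser.add_argument("-d", "--database", required=True,
                        help="database FASTA file used for the Diamond alignment")
    parser.add_argument("-o", "--output", required=True,
                        help="output directory")
    return parser


# Parse an explicit argument list instead of sys.argv, for demonstration only.
args = build_parser().parse_args(["-i", "ids.txt", "-d", "db.fasta", "-o", "results"])
print(args.ids, args.database, args.output)
```

All three options are marked as required here because the usage line shows no defaults.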
- Directory with a subdirectory for each ID from the input list.
- In each directory:
  - Sequence FASTA file from UniProt.
  - Alignment file from Diamond.
  - Sequences FASTA file for the accession numbers from NCBI.
  - Multiple sequence alignment file from Clustal Omega.
  - Phylogenetic tree from Clustal Omega.
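The directory layout above could be created along the following lines. This is a sketch only: it assumes the IDs file holds one Swiss-Prot ID per line, and the names `read_ids` and `make_id_dirs` are hypothetical, not taken from the project.

```python
from pathlib import Path


def read_ids(ids_file):
    """Return the non-empty lines of ids_file as a list of IDs.

    Assumes one Swiss-Prot ID per line; blank lines are skipped.
    """
    with open(ids_file) as handle:
        return [line.strip() for line in handle if line.strip()]


def make_id_dirs(ids, output_dir):
    """Create one subdirectory per ID under output_dir and return their paths."""
    dirs = []
    for swiss_id in ids:
        id_dir = Path(output_dir) / swiss_id
        # parents=True also creates output_dir; exist_ok=True allows re-runs.
        id_dir.mkdir(parents=True, exist_ok=True)
        dirs.append(id_dir)
    return dirs
```

Returning the list of directories matches the "list of ids directories" output, so later steps can write their files into each one.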
- Function to make directories for Swiss-Prot IDs. (done)
  - Input: IDs file.
  - Output: list of ID directories.
- Function to get the sequence FASTA file by requesting it from UniProt.
  - Input: Swiss-Prot ID.
  - Output: sequence FASTA file.
- Function to align the sequence using Diamond. (done)
  - Input: sequence FASTA file and database file.
  - Output: alignment file.
- Function to get the sequence FASTA file by requesting it from NCBI.
  - Input: accession number.
  - Output: sequence FASTA file.
- Function to merge multiple FASTA files into one FASTA file.
  - Input: list of FASTA files.
  - Output: FASTA file.
- Function to perform multiple sequence alignment and get a phylogenetic tree.
  - Input: FASTA file.
  - Output: alignment file and phylogenetic tree file.
- Implement the Python script to run the workflow on the command line.
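Some of the steps listed above can be sketched with the standard library alone. The example below is illustrative and not the project's code: all function names are hypothetical, the UniProt and NCBI E-utilities URL patterns are assumed to be the endpoints intended, and the Diamond command assumes the database has already been converted to Diamond's own format (for example with `diamond makedb`).

```python
from pathlib import Path
from urllib.parse import urlencode


def uniprot_fasta_url(swiss_id):
    """URL of the FASTA record for a Swiss-Prot ID (assumed UniProt REST pattern)."""
    return f"https://rest.uniprot.org/uniprotkb/{swiss_id}.fasta"


def ncbi_fasta_url(accession):
    """URL of the protein FASTA record for an accession (assumed EFetch pattern)."""
    query = urlencode({"db": "protein", "id": accession,
                       "rettype": "fasta", "retmode": "text"})
    return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?" + query


def merge_fasta(fasta_files, merged_file):
    """Concatenate several FASTA files into merged_file and return its path."""
    with open(merged_file, "w") as out:
        for path in fasta_files:
            text = Path(path).read_text()
            # Make sure each record ends with a newline before the next header.
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)
    return merged_file


def diamond_command(query_fasta, database, alignment_file):
    """Argument list for a Diamond blastp run (database assumed pre-built)."""
    return ["diamond", "blastp",
            "--query", str(query_fasta),
            "--db", str(database),
            "--out", str(alignment_file)]
```

The URLs can then be fetched with `requests.get(url)` or `urllib.request.urlopen(url)`, and the command list can be passed to `subprocess.run(..., check=True)`. Keeping URL and command construction separate from the network and process calls makes these helpers easy to test offline.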
- Diamond Aligner.
- Clustal Omega (clustalo.py)
- The phylogenetic tree extracted from Clustal Omega was unsuitable, so we will use the Simple Phylogeny Tree module instead.
- Implementing the Python script to run on the command line.
- Function to make directories for Swiss-Prot IDs. (done)
- Function to get the sequence FASTA file by requesting it from UniProt.
- Function to align the sequence using Diamond. (done)
- Function to get the sequence FASTA file by requesting it from NCBI.