getalignment workflow

shenjean edited this page Jan 22, 2021 · 1 revision

Performing the blastn search

getalignment can run different blastn tasks, selected with the --blastn-task option. The available choices are blastn (the standard task), blastn-short, megablast and dc-megablast. Use --blastn-task none when supplying existing blast results, rather than a FASTA file, as input.
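As a rough illustration of how the task choice could map onto a BLAST+ call, here is a minimal Python sketch. The function name build_blastn_command, the use of -subject for the target sequences, and the tabular output format (-outfmt 6) are assumptions made for this example. They are not taken from the getalignment source, which may invoke blastn differently.

```python
# Task names accepted by --blastn-task, as listed above (besides "none").
BLASTN_TASKS = ("blastn", "blastn-short", "megablast", "dc-megablast")


def build_blastn_command(query_fasta, subject_fasta, task):
    """Build a blastn argument list for the given task.

    Hypothetical helper: returns None when task is "none", meaning the
    search is skipped because blast results are supplied directly.
    """
    if task == "none":
        return None
    if task not in BLASTN_TASKS:
        raise ValueError(f"unknown blastn task: {task!r}")
    return [
        "blastn",
        "-task", task,              # selects the search algorithm
        "-query", query_fasta,
        "-subject", subject_fasta,  # assumption: FASTA target, not a database
        "-outfmt", "6",             # assumption: tab-separated output
    ]
```

The returned list could be passed to subprocess.run once BLAST+ is installed. Building the arguments separately keeps the task validation testable without running a search.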

Differences between the blastn tasks (from https://www.ncbi.nlm.nih.gov/books/NBK279668/):

blastn-task     Description
blastn          Traditional blastn requiring an exact match of 11 bases
blastn-short    Optimized for sequences shorter than 50 bases
megablast       Traditional megablast used to find very similar (e.g. intraspecies or closely related species) sequences
dc-megablast    Discontiguous megablast used to find more distant (e.g. interspecies) sequences

Differences in number of hits and scores between the blastn tasks:

Parsing blast results

The parse_blast function of getalignment extracts positions from the alignment(s) in the blast output that have the highest bit score:

  • If exactly one alignment has the best bit score, the function extracts the positions of that alignment.
  • If more than one alignment shares the best bit score (i.e. multiple best matches), the function extracts the earliest start position and the latest end position among them.
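The selection rule above can be sketched in Python as follows. This is not the actual parse_blast implementation. It assumes the default 12-column tabular blast output (-outfmt 6) and assumes that the positions of interest are the subject coordinates (sstart/send); the function name best_hit_span is hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hit_span(blast_tsv):
    """Return (start, end, n_best) for the top-scoring alignment(s).

    Returns None when the input contains no alignments.
    """
    rows = []
    for fields in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rows.append(dict(zip(COLUMNS, fields)))
    if not rows:
        return None

    # Keep every alignment that ties for the highest bit score.
    best = max(float(r["bitscore"]) for r in rows)
    top = [r for r in rows if float(r["bitscore"]) == best]

    # On the minus strand sstart > send, so normalise each pair first.
    starts = [min(int(r["sstart"]), int(r["send"])) for r in top]
    ends = [max(int(r["sstart"]), int(r["send"])) for r in top]
    return min(starts), max(ends), len(top)


example = (
    "q1\ts1\t100.000\t20\t0\t0\t1\t20\t105\t124\t1e-05\t37.4\n"
    "q1\ts1\t100.000\t20\t0\t0\t1\t20\t300\t319\t1e-05\t37.4\n"
    "q1\ts1\t94.444\t18\t1\t0\t2\t19\t500\t517\t0.002\t28.2\n"
)
print(best_hit_span(example))  # (105, 319, 2)
```

In this example two alignments tie at a bit score of 37.4, so the reported span runs from the earliest start (105) to the latest end (319). With a single best alignment the span is simply that alignment's own start and end.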