peanut calculates alignment metrics for a given GAF file from GraphAligner by evaluating the CIGAR strings of its alignments.
It outputs four metrics, each expressed as a fraction of the total query length:
- qsc = #E / query_lens. #E is the number of sequence matches (= or E symbol) in the GAF file. Nucleotide positions with sequence matches in multiple alignments are counted only once.
- uniq = uniq_#E / query_lens. uniq_#E is the number of unique sequence matches in the GAF file.
- multi = multi_#E / query_lens. multi_#E is the number of multiple sequence matches in the GAF file. Nucleotide positions with more than one multiple sequence match are counted only once.
- nonaln = nonaln_#E / query_lens. nonaln_#E is the number of non-sequence matches in the GAF file.
In all four metrics, query_lens is the total length of all queries in the GAF, in nucleotides.
Optionally, it writes the nonaln query regions to BED.
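The definitions above can be illustrated with a minimal Python sketch. It is not peanut's implementation: the function names (match_coverage, metrics, nonaln_regions) are hypothetical, and it assumes that "unique" means a query base matched by exactly one alignment, "multiple" means matched by more than one, and "nonaln" means matched by none. It also assumes each CIGAR starts at the alignment's query start on the forward strand, and that non-aligned regions are reported as 0-based, half-open intervals as in BED.

```python
import re

# Operations that advance along the query (standard CIGAR semantics; 'E' is
# included only because the text above names it as a match symbol).
QUERY_CONSUMING = set("MIS=XE")
MATCH_OPS = set("=E")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=XE])")


def match_coverage(query_len, alignments):
    """Count, per query base, how many alignments report a sequence match.

    alignments is a list of (query_start, cigar) pairs for one query.
    """
    coverage = [0] * query_len
    for query_start, cigar in alignments:
        pos = query_start
        for length, op in CIGAR_RE.findall(cigar):
            length = int(length)
            if op in MATCH_OPS:
                for i in range(pos, pos + length):
                    coverage[i] += 1
            if op in QUERY_CONSUMING:
                pos += length
    return coverage


def metrics(coverage):
    """Turn per-base match counts into the four fractions (assumed semantics)."""
    total = len(coverage)
    if total == 0:
        raise ValueError("query length must be positive")
    return {
        "qsc": sum(1 for c in coverage if c >= 1) / total,
        "uniq": sum(1 for c in coverage if c == 1) / total,
        "multi": sum(1 for c in coverage if c > 1) / total,
        "nonaln": sum(1 for c in coverage if c == 0) / total,
    }


def nonaln_regions(coverage):
    """Return runs of unmatched bases as 0-based, half-open (start, end) pairs."""
    regions, start = [], None
    for i, c in enumerate(coverage):
        if c == 0 and start is None:
            start = i
        elif c != 0 and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(coverage)))
    return regions


# Toy example: a 10 bp query with two overlapping alignments.
cov = match_coverage(10, [(0, "4=1X3="), (2, "4=")])
result = metrics(cov)
regions = nonaln_regions(cov)
```

In this toy example, 5 bases are matched once, 3 are matched twice, and 2 are never matched, so the sketch yields uniq = 0.5, multi = 0.3, qsc = 0.8, and nonaln = 0.2, with a single non-aligned region covering bases 8 to 10.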
git clone https://github.com/pangenome/rs-peanut.git
cd rs-peanut
cargo build --release
peanut requires a GAF file as input, passed with -g.
./target/release/peanut -g aln.gaf
The output is written to stdout in a tab-delimited format.
0.992910744238371 0.9926967987671109 0.00021394547126006352 0.007089255761628998
The first number is qsc, the second is uniq, the third is multi, and the fourth is nonaln.
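For downstream use, the output line can be read into named fields. The following is a minimal Python sketch; PeanutMetrics and parse_peanut_line are hypothetical helper names, not part of peanut. It splits on any whitespace, so it accepts both tabs and spaces.

```python
from typing import NamedTuple


class PeanutMetrics(NamedTuple):
    """The four values in the order they are printed (hypothetical container)."""
    qsc: float
    uniq: float
    multi: float
    nonaln: float


def parse_peanut_line(line):
    """Parse one whitespace-delimited output line into named metrics."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}")
    return PeanutMetrics(*(float(f) for f in fields))


# The example output shown above.
m = parse_peanut_line(
    "0.992910744238371\t0.9926967987671109\t"
    "0.00021394547126006352\t0.007089255761628998"
)
```

In the example row, uniq plus multi equals qsc and qsc plus nonaln equals 1, up to floating-point rounding. This is an observation about that row rather than a documented guarantee, but it can serve as a quick sanity check on parsed values.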
- Add query sequence alignment match mismatch (qsamm).
- Describe qsc.
- Remove the unhelpful metrics qsamm and qsm.
- Add 3 new metrics: number of unique query base alignments (uniq), number of multiple query base alignments (multi), and number of nonaln query bases (nonaln).
So far, it has not been tested whether peanut also works with GAF files that do not originate from GraphAligner.