The threshold q-gram distance (tqd) measures the similarity between two sequences using q-grams (substrings of length q), and captures both the hapax (uniquely occurring substring) and repeat content of the sequences.
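To make the terms concrete, here is a minimal Python sketch of splitting the q-grams of one sequence into hapaxes and repeats. It is only an illustration and not the tqd implementation: the function names are hypothetical, and it assumes that a hapax is a q-gram occurring exactly once in the sequence and a repeat is one occurring more than once.

```python
from collections import Counter


def qgram_counts(sequence, q):
    """Count every overlapping substring of length q in the sequence."""
    return Counter(sequence[i:i + q] for i in range(len(sequence) - q + 1))


def split_hapax_repeats(sequence, q):
    """Return (hapaxes, repeats) as two sorted lists of q-grams.

    Assumption for this sketch: hapax = count of exactly 1,
    repeat = count greater than 1, both within a single sequence.
    """
    counts = qgram_counts(sequence, q)
    hapaxes = sorted(gram for gram, n in counts.items() if n == 1)
    repeats = sorted(gram for gram, n in counts.items() if n > 1)
    return hapaxes, repeats


if __name__ == "__main__":
    hapaxes, repeats = split_hapax_repeats("ATGGATCAGTC", 3)
    print("hapaxes:", hapaxes)
    print("repeats:", repeats)
```

A sequence shorter than q simply yields no q-grams, so both lists come back empty.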
The threshold q-gram distance requires:
- Python 3
- g++
git clone https://github.com/AlessioMilanese/Threshold_q-gram_distance.git
cd Threshold_q-gram_distance
./setup
Note: the following examples assume that the Python script tqd is in the system path.
Here is a simple example of how to obtain the tqd between two fasta files:
tqd file1.fasta file2.fasta -q 6 -threshold 1
If you want to pass the sequences directly as strings instead of files:
tqd ATGGATCAGTC CTGGATCAGAC -q 3 -threshold 1 -strings
Usage: tqd <first_file> <second_file> [options]
Options:
-q INT value of q in the q-grams [10]
-threshold INT value of the threshold [1]
-strings set if the inputs are strings instead of files
-pair_status FILE file to save the pair statuses
-verbose INT verbose level: 1=error, 2=warning, 3=message, 4+=debugging [2]
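If you want to call tqd from a Python script, the following sketch assembles the command line from the options listed above. The helper names are hypothetical and not part of the tool. It assumes tqd is in the system path, and it makes no assumption about the format of the output, which is returned as raw text.

```python
import subprocess


def build_tqd_command(first, second, q=10, threshold=1, strings=False):
    """Build the tqd argument list using only the documented options."""
    cmd = ["tqd", first, second, "-q", str(q), "-threshold", str(threshold)]
    if strings:
        # Treat the two positional arguments as sequences, not file names.
        cmd.append("-strings")
    return cmd


def run_tqd(first, second, **options):
    """Run tqd and return whatever it prints to standard output."""
    cmd = build_tqd_command(first, second, **options)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Only prints the command; call run_tqd(...) once tqd is installed.
    print(" ".join(build_tqd_command("ATGGATCAGTC", "CTGGATCAGAC",
                                     q=3, threshold=1, strings=True)))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.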