Advice on setting for short query in long reference search #30
Comments
So like I mentioned in the other issue, there are two approaches you can take:
In your example, I think you swapped This is some related code I wrote, it might be helpful: https://github.com/Daniel-Liu-c0deb0t/ANTISEQUENCE/blob/main/src/iter/match_any_reads.rs#L544
Ok I made a diagram (see https://docs.rs/block-aligner/0.5.1/block_aligner/scan_block/struct.Block.html) that should make the different alignment types more clear.
Hello Both, Thanks, Jianshu
Thanks! Actually Robert Edgar recently got in touch with me about Block Aligner stuff, probably for his tools.
Hey @Daniel-Liu-c0deb0t!
I have a bunch of sequences (most 20 nt, a few 27 nt) which I want to find in a lot (millions) of longer sequences.
Initially I use a k-mer index to find matching pairs, which I could probably extend using Block Aligner (perhaps using Block Aligner directly could work? Not sure). I'm a bit stuck after reading #28. Using the k-mers as seeds I could find shared prefixes between the query and references, although a k-mer might be slightly off course, so setting something like FREE_QUERY_START_GAPS might help, and FREE_QUERY_END_GAPS to terminate earlier. However, from reading the docs I'm not sure what exactly to set to achieve this, as there seem to be quite a few exceptions to keep in mind: "Note that this has a limitation: the min block size must be greater than the length of the query." Could you provide an example that does something like this?
For example, let's take:
A good alignment would be skipping the first C in the reference, then aligning ATGGGC, and using all the A's as a gap.
Perhaps something like this:
Giving:
But can I now know the positions in the reference? Ideally without computing the CIGAR, since as you said it's expensive.
Thanks in advance!
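To make the question about reference positions concrete, here is a minimal sketch of how start and end coordinates can be recovered without a traceback or CIGAR. It is a plain quadratic dynamic-programming alignment written from scratch, not Block Aligner's API. The function name `semi_global`, the `Hit` struct, the linear gap penalty, and the example sequences are all assumptions made for illustration. The idea is that each cell carries the reference position where its best path began, so the final cell already knows the start.

```rust
/// Score plus the half-open reference span [ref_start, ref_end) of the best hit.
/// (Hypothetical type for this sketch; not part of Block Aligner.)
#[derive(Debug, PartialEq)]
struct Hit {
    score: i32,
    ref_start: usize,
    ref_end: usize,
}

/// Align the whole query against any substring of the reference.
/// Gaps before and after the hit in the reference are free; gaps inside cost `gap` each.
fn semi_global(query: &[u8], reference: &[u8], mat: i32, mis: i32, gap: i32) -> Hit {
    let n = reference.len();
    // Row 0: an empty query prefix can start at any reference position for free.
    // Each cell stores (score, reference start of the path that produced it).
    let mut prev: Vec<(i32, usize)> = (0..=n).map(|j| (0, j)).collect();

    for i in 1..=query.len() {
        let mut cur: Vec<(i32, usize)> = Vec::with_capacity(n + 1);
        // Column 0: the first i query characters are aligned against gaps only.
        cur.push((i as i32 * gap, 0));
        for j in 1..=n {
            let s = if query[i - 1] == reference[j - 1] { mat } else { mis };
            let diag = (prev[j - 1].0 + s, prev[j - 1].1); // match or mismatch
            let up = (prev[j].0 + gap, prev[j].1); // query char against a gap
            let left = (cur[j - 1].0 + gap, cur[j - 1].1); // reference char against a gap
            let mut best = diag;
            if up.0 > best.0 {
                best = up;
            }
            if left.0 > best.0 {
                best = left;
            }
            cur.push(best);
        }
        prev = cur;
    }

    // Free end gaps in the reference: take the best score anywhere in the last row.
    let mut best_j = 0;
    for j in 1..=n {
        if prev[j].0 > prev[best_j].0 {
            best_j = j;
        }
    }
    Hit { score: prev[best_j].0, ref_start: prev[best_j].1, ref_end: best_j }
}

fn main() {
    // Hypothetical sequences, not the ones from the original post.
    let hit = semi_global(b"ATGGGC", b"CATGGGCAAAA", 1, -1, -2);
    println!("{:?}", hit);
}
```

For the hypothetical sequences in `main`, the hit covers reference positions 1 to 7 (half-open), which skips the leading C and the trailing A's. Only two rows are kept in memory, so the extra cost of tracking the start is one integer per cell. Whether a banded or block-based aligner can expose the same information depends on that library and is not assumed here.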