I want to generate a db of all kmers and their counts for a reference genome using meryl count, then for thousands of small (~1-5 kbp) sequences I want to extract all kmers and find their counts in the genome kmer db.
Is there a way to provide a short sequence as an argument to meryl to query its kmers against an existing db?
It seems like it would not be efficient to run meryl count on all of the short seqs and have to clean up the .meryl files between each query.
Adamtaranto changed the title from "Get kmer counts from existing meryl db for kmers from a small query sequence" to "Use sequence to query meryl db" on Jun 4, 2024
usage: meryl-lookup <report-type> \
-sequence <input1.fasta> [<input2.fasta>] \
-output <output1> [<output2>] \
-mers <input1.meryl> [<input2.meryl>] [...] [-estimate] \
-labels <input1name> [<input2name>] [...]
Compare kmers in input sequences against kmers in input meryl databases.
Input sequences (-sequence) can be FASTA or FASTQ, uncompressed, or
compressed with gzip, xz, or bzip2.
To compute and report only estimated memory usage, add option '-estimate'.
Report types:
Run `meryl-lookup <report-type> -help` for details on each method.
-bed:
Generate a BED format file showing the location of kmers in
any input database on each sequence in 'input1.fasta'.
Each kmer is reported in a separate bed record.
-bed-runs:
Generate a BED format file showing the location of kmers in
any input database on each sequence in 'input1.fasta'.
Overlapping kmers are combined into a single bed record.
-wig-count:
Generate a WIGGLE format file showing the multiplicity of the
kmer starting at each position in the sequence, if it exists in
an input kmer database.
-wig-depth:
Generate a WIGGLE format file showing the number of kmers in
any input database that cover each position in the sequence.
-existence:
Generate a tab-delimited line for each input sequence with the
number of kmers in the sequence, in the database and common to both.
-include:
-exclude:
Copy sequences from 'input1.fasta' (and 'input2.fasta') to the
corresponding output file if the sequence has at least one kmer
present (include) or no kmers present (exclude) in 'input1.meryl'.
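Going by the usage text above, one way to avoid running meryl count per query might be to put all of the short sequences into a single multi-record FASTA and call meryl-lookup once against the genome db. The sketch below only assembles such a command from the flags listed in the usage text; the file names (queries.fasta, genome.meryl, queries.wig) and the helper name build_lookup_cmd are hypothetical, and the choice of -wig-count assumes the per-position kmer multiplicity is what is wanted.

```python
import shlex

# Report types copied from the meryl-lookup usage text above.
REPORT_TYPES = {
    "-bed", "-bed-runs", "-wig-count", "-wig-depth",
    "-existence", "-include", "-exclude",
}


def build_lookup_cmd(report_type, query_fasta, meryl_db, output_path):
    """Assemble a meryl-lookup argument list (hypothetical helper).

    Only flags shown in the usage text are used: -sequence, -mers, -output.
    """
    if report_type not in REPORT_TYPES:
        raise ValueError(f"unknown report type: {report_type}")
    return [
        "meryl-lookup", report_type,
        "-sequence", query_fasta,   # all short queries in one FASTA (assumed)
        "-mers", meryl_db,          # db built beforehand with meryl count
        "-output", output_path,
    ]


if __name__ == "__main__":
    cmd = build_lookup_cmd(
        "-wig-count", "queries.fasta", "genome.meryl", "queries.wig"
    )
    # Print the command for inspection; to actually run it, pass the list
    # to subprocess.run(cmd, check=True) on a machine with meryl installed.
    print(shlex.join(cmd))
```

Swapping "-wig-count" for "-existence" would instead give one tab-delimited summary line per query sequence, as described in the report type list.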