Use sequence to query meryl db #47

Adamtaranto · 2024-06-04T07:14:56Z

I want to generate a db of all kmers and their counts for a reference genome using meryl count. Then, for thousands of small (~1-5 kbp) sequences, I want to extract all kmers from each sequence and look up their counts in the genome kmer db.

Is there a way to provide a short sequence as an argument to meryl to query its kmers against an existing db?

It seems inefficient to run meryl count on every short sequence and then clean up the .meryl files between queries.


brianwalenz · 2024-06-04T14:01:04Z

That sounds like a job for meryl-lookup:

usage: meryl-lookup <report-type> \
         -sequence <input1.fasta> [<input2.fasta>] \
         -output   <output1>      [<output2>] \
         -mers     <input1.meryl> [<input2.meryl>] [...] [-estimate] \
         -labels   <input1name>   [<input2name>]   [...]

  Compare kmers in input sequences against kmers in input meryl databases.

  Input sequences (-sequence) can be FASTA or FASTQ, uncompressed, or
  compressed with gzip, xz, or bzip2.

  To compute and report only estimated memory usage, add option '-estimate'.

  Report types:
    Run `meryl-lookup <report-type> -help` for details on each method.


  -bed:
     Generate a BED format file showing the location of kmers in
     any input database on each sequence in 'input1.fasta'.
     Each kmer is reported in a separate bed record.

  -bed-runs:
     Generate a BED format file showing the location of kmers in
     any input database on each sequence in 'input1.fasta'.
     Overlapping kmers are combined into a single bed record.

  -wig-count:
     Generate a WIGGLE format file showing the multiplicity of the
     kmer starting at each position in the sequence, if it exists in
     an input kmer database.

  -wig-depth:
     Generate a WIGGLE format file showing the number of kmers in
     any input database that cover each position in the sequence.

  -existence:
     Generate a tab-delimited line for each input sequence with the
     number of kmers in the sequence, in the database and common to both.

  -include:
  -exclude:
     Copy sequences from 'input1.fasta' (and 'input2.fasta') to the
     corresponding output file if the sequence has at least one kmer
     present (include) or no kmers present (exclude) in 'input1.meryl'.
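
For the per-kmer counts asked about above, the -wig-count report (multiplicity of the kmer starting at each position) looks like the closest match. As a rough illustration of what that report describes, here is a minimal Python sketch. It is not meryl's implementation: the function names are hypothetical, the toy sequences and k=3 are made up, and it assumes the database stores canonical kmers (the lexicographically smaller of a kmer and its reverse complement).

```python
from collections import Counter

# Translation table for reverse-complementing DNA (A<->T, C<->G).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a kmer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(seq, k):
    """Count canonical kmers in a sequence, skipping kmers with non-ACGT bases."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    return counts


def lookup_counts(query, db, k):
    """For each position in the query, return the database count of the kmer
    starting there (0 if the kmer is absent from the database)."""
    query = query.upper()
    return [
        db.get(canonical(query[i:i + k]), 0)
        for i in range(len(query) - k + 1)
    ]


# Toy data: stands in for the reference genome and one short query sequence.
genome = "ACGTACGTTT"
db = count_kmers(genome, 3)
print(lookup_counts("ACGTT", db, 3))
```

The database is built once and reused for every query, which is the same pattern as running meryl count once on the genome and then pointing meryl-lookup at the resulting .meryl directory for all of the short sequences.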

Adamtaranto changed the title ~~Get kmer counts from existing meryl db for kmers from a small query sequence~~ Use sequence to query meryl db Jun 4, 2024
