Commit

Merge pull request #18 from BioInf-Wuerzburg/patch-bump-version-to-2.1.10

Update README.org to documentation for version 2.1.10
greatfireball authored Feb 12, 2019
2 parents 12b11fb + ef5b65b commit 578cdef
Showing 1 changed file with 17 additions and 13 deletions.
30 changes: 17 additions & 13 deletions README.org
@@ -13,7 +13,6 @@ Options:
 Print stats to file. Default STDOUT or STDERR if STDOUT is in use.

 --ids <FILE/-/IDLIST>
---ids-exclude
 File of sequence IDs or literal list of IDs to be reported. Reads
 comma-, whitespace and newline separated lists. Leading '>' or '@'
 are ignored.
@@ -22,28 +21,30 @@ Options:
 SeqFilter fasta.fa --ids ids.list # file

 --ids-pattern <FILE/-/PATTERNLIST>
---ids-split
-A Perl PATTERN or a link to a file containing multiple PATTERN, one
-per line, to match against sequence ids. Matching sequences will be
-returned.
+Match a perl PATTERN or a link to a file containing multiple
+PATTERNs, one per line, against sequence ids. Keep matching
+sequences.

 SeqFilter seqs.fq --ids-patt 'comp2_c13_seq.*|comp2_c88_seq.*'

-Extend use of --ids-pattern from only identifying sequences to also
-splitting them into different files according to a REQUIRED first
-capture group in the match. If capture group is empty, sequences are
-written to --out. If multiple pattern are provided and an ID has
-multiple matches, all but the first match are ignored.
+Extended usage: add capture groups to perl P(A)TT(ER)N using "()".
+Splits matched sequences to different output files based on capture.
+Use sprintf conversions (%s, %02d, ...) in --out to define a
+template for the output file names.

 # split seqs by library (LIB1, LIB2, LIB14)
-SeqFilter multilib.fq --ids-pattern '\w+(\d+)' --ids-split
-# creates multilib1.fq, multilib2.fq, multilib14.fq
+SeqFilter multilib.fq --ids-pattern '\w+(\d+)' --out multilib_%02d.fq
+# creates multilib_01.fq, multilib_02.fq, multilib_14.fq

 NOTE: Perl needs to open a filehandle to every split file, this can
 slow things down considerably if you want to split into more than
 1000 different files with occurrences of patterns randomly mixed in
 source file.

+--ids-exclude
+Reverse behaviour of --ids and --ids-pattern to excluding matched
+sequences, and keeping unmatched ones.
+
 --ids-rename <PATTERN>
 Provide a perl substitution pattern as string. The pattern is
 applied to every id. Use global "$COUNT" to access the output
@@ -181,9 +182,12 @@ Options:

 -H|--histogram
 Plot distribution of bases by length as ASCII plot. Uses linear
-scale for data sets with difference in order of magnitude <= 3, log
+scale for data sets with difference in order of magnitude < 2, log
 scale otherwise.

+--[no]-smart-labels
+Toggle shortening filepaths to shortest unique labels.
+
 -p|--progress
 Display progress bars (eq. '--verbose 2')
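
The new --ids-pattern text describes splitting sequences by a capture group and naming the output files through a sprintf-style template in --out. The following is a minimal Python sketch of that idea, not SeqFilter's actual Perl implementation. The function name split_by_capture, the sample IDs, and the pattern LIB(\d+) are hypothetical and chosen only for illustration. It also assumes that Python's % formatting behaves like Perl's sprintf for the simple conversions shown (%s, %02d).

```python
import re
from collections import defaultdict


def split_by_capture(records, pattern, out_template):
    """Group (seq_id, sequence) pairs by templated output file name.

    The file name is built by filling out_template with the first
    capture group of pattern. Hypothetical helper, for illustration only.
    """
    regex = re.compile(pattern)
    groups = defaultdict(list)
    for seq_id, seq in records:
        match = regex.search(seq_id)
        if match is None:
            continue  # this sketch simply drops ids that do not match
        capture = match.group(1)
        # %d-style conversions need a number; %s accepts either form
        value = int(capture) if capture.isdigit() else capture
        groups[out_template % value].append((seq_id, seq))
    return dict(groups)


# Made-up reads from three libraries (LIB1, LIB2, LIB14)
records = [
    ("LIB1_read1", "ACGT"),
    ("LIB2_read1", "GGCC"),
    ("LIB14_read1", "TTAA"),
    ("LIB1_read2", "CCGG"),
]
files = split_by_capture(records, r"LIB(\d+)", "multilib_%02d.fq")
for name in sorted(files):
    print(name, [seq_id for seq_id, _ in files[name]])
```

Unlike the tool described in the NOTE above, this sketch collects records in memory rather than keeping one open filehandle per output file, so it sidesteps the many-filehandles slowdown at the cost of memory.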
