Merge pull request #18 from BioInf-Wuerzburg/patch-bump-version-to-2.1.10

Update README.org to documentation for version 2.1.10
BioInf-Wuerzburg · Feb 12, 2019 · 578cdef
2 parents 12b11fb + ef5b65b
commit 578cdef
Showing 1 changed file with 17 additions and 13 deletions.
diff --git a/README.org b/README.org
@@ -13,7 +13,6 @@ Options:
         Print stats to file. Default STDOUT or STDERR if STDOUT is in use.
 
     --ids <FILE/-/IDLIST>
-    --ids-exclude
         File of sequence IDs or literal list of IDs to be reported. Reads
         comma-, whitespace and newline separated lists. Leading '>' or '@'
         are ignored.
@@ -22,28 +21,30 @@ Options:
           SeqFilter fasta.fa --ids ids.list       # file
 
     --ids-pattern <FILE/-/PATTERNLIST>
-    --ids-split
-        A Perl PATTERN or a link to a file containing multiple PATTERN, one
-        per line, to match against sequence ids. Matching sequences will be
-        returned.
+        Match a perl PATTERN or a link to a file containing multiple
+        PATTERNs, one per line, against sequence ids. Keep matching
+        sequences.
 
           SeqFilter seqs.fq --ids-patt 'comp2_c13_seq.*|comp2_c88_seq.*'
 
-        Extend use of --ids-pattern from only identifying sequences to also
-        splitting them into different files according to a REQUIRED first
-        capture group in the match. If capture group is empty, sequences are
-        written to --out. If multiple pattern are provided and an ID has
-        multiple matches, all but the first match are ignored.
+        Extended usage: add capture groups to perl P(A)TT(ER)N using "()".
+        Splits matched sequences to different output files based on capture.
+        Use sprintf conversions (%s, %02d, ...) in --out to define a
+        template for the output file names.
 
           # split seqs by library (LIB1, LIB2, LIB14)
-          SeqFilter multilib.fq --ids-pattern '\w+(\d+)' --ids-split
-          # creates multilib1.fq, multilib2.fq, multilib14.fq
+          SeqFilter multilib.fq --ids-pattern '\w+(\d+)' --out multilib_%02d.fq
+          # creates multilib_01.fq, multilib_02.fq, multilib_14.fq
 
         NOTE: Perl needs to open a filehandle to every split file, this can
         slow things down considerably if you want to split into more than
         1000 different files with occurrences of patterns randomly mixed in
         source file.
 
+    --ids-exclude
+        Reverse the behaviour of --ids and --ids-pattern: exclude matched
+        sequences and keep unmatched ones.
+
     --ids-rename <PATTERN>
         Provide a perl substitution pattern as string. The pattern is
         applied to every id. Use global "$COUNT" to access the output
@@ -181,9 +182,12 @@ Options:
 
     -H|--histogram
         Plot distribution of bases by length as ASCII plot. Uses linear
-        scale for data sets with difference in order of magnitude <= 3, log
+        scale for data sets with difference in order of magnitude < 2, log
         scale otherwise.
 
+    --[no]-smart-labels
+        Toggle shortening filepaths to shortest unique labels.
+
     -p|--progress
         Display progress bars (eq. '--verbose 2')
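
The extended --ids-pattern usage documented in this change (a capture group in the pattern plus sprintf conversions in --out) can be illustrated with a small sketch. This is not SeqFilter's Perl implementation; it is a Python approximation under stated assumptions: the helper names `output_name` and `split_ids` and the example IDs are hypothetical, unmatched IDs are simply skipped, and only the first capture group is used. One detail worth noting: with a greedy `\w+` prefix, backtracking leaves only the final digit for `(\d+)`, so the sketch uses the more explicit pattern `LIB(\d+)` to capture the full library number.

```python
import re
from collections import defaultdict


def output_name(template, captured):
    """Fill a printf-style template with one captured string (hypothetical helper)."""
    try:
        # String conversions such as %s accept the captured text directly.
        return template % captured
    except TypeError:
        # Numeric conversions such as %02d need a number in Python,
        # whereas Perl's sprintf would coerce the string itself.
        return template % int(captured)


def split_ids(ids, pattern, template):
    """Group sequence IDs by the output file name derived from capture group 1."""
    regex = re.compile(pattern)
    groups = defaultdict(list)
    for seq_id in ids:
        match = regex.search(seq_id)
        if match is None:
            continue  # assumption of this sketch: unmatched IDs are skipped
        groups[output_name(template, match.group(1))].append(seq_id)
    return dict(groups)


# Hypothetical IDs from three libraries, mixed in one input.
ids = ["LIB1_read1", "LIB2_read1", "LIB14_read1", "LIB1_read2"]
files = split_ids(ids, r"LIB(\d+)", "multilib_%02d.fq")
for name in sorted(files):
    print(name, files[name])
```

The sketch only computes which IDs would go to which file name; a real splitter would also keep one open filehandle per target file, which is the cost the NOTE in the option text warns about.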