Commit

README: minor update
proycon committed Jul 1, 2024
1 parent 59923b1 commit f4fc5c7
Showing 1 changed file with 6 additions and 2 deletions.
8 changes: 6 additions & 2 deletions README.md
@@ -95,7 +95,11 @@ bad 3

You can configure a minimum frequency threshold using ``--freq``.

-Rather than match all of the lexicon against the text, you can also iterate over tokens in the text and check if they occur in the lexicon. This uses a hash-map instead of a suffix array and is typically faster. It is more limited, however, and can not be used with frequency thresholding, and counting. It will always produce output like ``--verbose``:
+Rather than match all of the lexicon against the text, you can also iterate
+over tokens in the text and check if they occur in the lexicon. This uses a
+hash-map instead of a suffix array and is typically faster. It is more limited,
+however, and can not be used with frequency thresholding or counting. It will
+always produce verbose output (similar to ``--verbose``):

```
$ lexmatch --tokens --query good --query bad /nettmp/republic.short.txt
@@ -110,7 +114,7 @@ good 3480 3484
bad 3488 3491
```

-Unlike before, you will find the items are now returned in reading order.
+Unlike before, you will find the matches are now returned in reading order.
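
The token-based lookup described in this hunk can be illustrated with a minimal sketch. This is not lexmatch's actual implementation: the function name `match_tokens`, the `\w+` tokenisation rule, and the use of character offsets with an exclusive end are all assumptions made for illustration.

```python
import re


def match_tokens(text, lexicon):
    """Yield (term, begin, end) for each token of `text` found in `lexicon`.

    Hypothetical helper, not part of lexmatch. Offsets are character
    offsets into `text`, with `end` exclusive (an assumption).
    """
    # A set gives hash-based membership tests, constant time on average.
    terms = set(lexicon)
    # \w+ treats whitespace and punctuation as token delimiters.
    for m in re.finditer(r"\w+", text):
        token = m.group()
        if token in terms:
            yield token, m.start(), m.end()


text = "The good, the bad and the good."
for term, begin, end in match_tokens(text, ["good", "bad"]):
    print(f"{term}\t{begin}\t{end}")
```

Because the text is scanned once from left to right, matches come out in reading order rather than grouped per lexicon entry, and every match is reported individually, which is why this mode has no place for frequency thresholds or aggregated counts.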

When using ``--tokens`` we rely on whitespace and punctuation to delimit
tokens. This does not work for languages such as Chinese, Japanese and Korean