Showing 1 changed file with 35 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
# Lexmatch | ||
|
||
This is a simple lexicon matching tool that, given a lexicon of words or phrases, identifies all matches in a given target text. | ||
It can be used to compute a frequency list for a lexicon on a target corpus. | ||
|
||
The implementation uses suffix arrays. The text must be plain-text UTF-8 and is limited to 2^32 bytes (4 GiB). | ||
|
||
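To illustrate the general idea behind suffix-array matching, here is a minimal sketch. It is not lexmatch's actual code: the function names `build_suffix_array` and `find_all` are hypothetical, the construction is a naive sort rather than an efficient algorithm, and the use of `u32` offsets is only an assumption about how a 2^32-byte limit could arise.

```rust
/// Build a suffix array over the bytes of `text` by sorting all suffix
/// start offsets lexicographically. Naive construction, for illustration only.
/// Assumes `text.len()` fits in a u32 (hypothetical choice for this sketch).
fn build_suffix_array(text: &[u8]) -> Vec<u32> {
    let mut sa: Vec<u32> = (0..text.len() as u32).collect();
    sa.sort_by(|&a, &b| text[a as usize..].cmp(&text[b as usize..]));
    sa
}

/// Return the byte offsets of all occurrences of `pattern` in `text`,
/// in ascending order, using two binary searches over the suffix array.
fn find_all(text: &[u8], sa: &[u32], pattern: &[u8]) -> Vec<u32> {
    if pattern.is_empty() {
        return Vec::new();
    }
    // Index of the first suffix whose leading bytes are >= pattern.
    let lo = sa.partition_point(|&i| {
        let suffix = &text[i as usize..];
        let prefix = &suffix[..suffix.len().min(pattern.len())];
        prefix < pattern
    });
    // Suffixes that start with the pattern form a contiguous run from `lo`.
    let run = sa[lo..].partition_point(|&i| text[i as usize..].starts_with(pattern));
    let mut hits = sa[lo..lo + run].to_vec();
    hits.sort_unstable();
    hits
}
```

With this sketch, building the array once for the text `banana` and then calling `find_all` with the pattern `ana` yields the offsets 1 and 3. The appeal of the approach is that the array is built once per text, after which each lexicon entry costs only a binary search.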
|
||
## Installation | ||
|
||
You can build and install the latest stable release using Rust's package manager: | ||
|
||
``` | ||
cargo install lexmatch | ||
``` | ||
|
||
Or, to install the development version, clone this repository and run the following from its root: | ||
|
||
``` | ||
cargo install --path . | ||
``` | ||
|
||
No Cargo/Rust on your system yet? Run ``sudo apt install cargo`` on Debian/Ubuntu-based systems, ``brew install rust`` on macOS, or use [rustup](https://rustup.rs/). | ||
|
||
## Usage | ||
|
||
See ``lexmatch --help``. | ||
|
||
Simple example: | ||
|
||
``` | ||
$ lexmatch --lexicon lexicon.lst --text corpus.txt | ||
``` | ||
|
||
The lexicon must be plain-text UTF-8 containing one entry per line; an entry need not be a single word and is not constrained in length. |
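
To make the lexicon format and the notion of a frequency list concrete, the following is a rough sketch, not lexmatch's implementation. The helper names `parse_lexicon` and `frequency_list` are hypothetical. It assumes that empty lines are skipped and that matches are plain substring matches (overlapping ones included, no word-boundary check); the real tool may behave differently on both points. It also uses a simple scan instead of a suffix array.

```rust
use std::collections::BTreeMap;

/// Split raw lexicon contents into entries, one per line.
/// Assumption for this sketch: empty lines are ignored.
fn parse_lexicon(raw: &str) -> Vec<&str> {
    raw.lines().filter(|line| !line.is_empty()).collect()
}

/// Count how often each lexicon entry occurs in `text`.
/// Every character position is tested, so overlapping matches are counted.
fn frequency_list<'a>(lexicon: &[&'a str], text: &str) -> BTreeMap<&'a str, usize> {
    lexicon
        .iter()
        .map(|&entry| {
            let count = text
                .char_indices()
                .filter(|&(i, _)| text[i..].starts_with(entry))
                .count();
            (entry, count)
        })
        .collect()
}
```

For a lexicon with the entries `good morning`, `cat` and `dog`, and the text `good morning cat, said the cat`, this sketch counts one occurrence of `good morning`, two of `cat` and none of `dog`. Note that a multi-word entry such as `good morning` is treated exactly like a single word.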