added a README
proycon committed Sep 20, 2021
1 parent 4cefd34 commit a9e3af5
Showing 1 changed file with 35 additions and 0 deletions.
35 changes: 35 additions & 0 deletions README.md
@@ -0,0 +1,35 @@
# Lexmatch

This is a simple lexicon matching tool that, given a lexicon of words or phrases, identifies all matches in a given target text.
It can be used to compute a frequency list for a lexicon on a target corpus.

The implementation uses suffix arrays. The text must be plain-text UTF-8 and is limited to 2^32 bytes (4 GiB).
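
To illustrate the general idea of suffix-array lookup, here is a minimal sketch in Rust. It is not the code used by lexmatch: the function names (`build_suffix_array`, `find_all`, `frequency_list`) are hypothetical, the construction is a naive sort rather than an efficient algorithm, and it matches raw byte substrings without regard to word boundaries, which may differ from what the tool does. Offsets are stored as `u32`, which would be consistent with the 2^32-byte limit, though that link is an assumption.

```rust
/// Build a suffix array over the bytes of `text` by sorting all suffix
/// start offsets. Naive construction, for illustration only.
fn build_suffix_array(text: &str) -> Vec<u32> {
    let bytes = text.as_bytes();
    let mut sa: Vec<u32> = (0..bytes.len() as u32).collect();
    sa.sort_by(|&a, &b| bytes[a as usize..].cmp(&bytes[b as usize..]));
    sa
}

/// The suffix starting at `start`, truncated to at most `len` bytes.
fn prefix_at<'a>(bytes: &'a [u8], start: u32, len: usize) -> &'a [u8] {
    let start = start as usize;
    let end = (start + len).min(bytes.len());
    &bytes[start..end]
}

/// Byte offsets of every occurrence of `pattern` in `text`, ascending.
/// Two binary searches delimit the block of suffixes that start with `pattern`.
fn find_all(text: &str, sa: &[u32], pattern: &str) -> Vec<u32> {
    let bytes = text.as_bytes();
    let pat = pattern.as_bytes();
    if pat.is_empty() {
        return Vec::new();
    }
    let lo = sa.partition_point(|&i| prefix_at(bytes, i, pat.len()) < pat);
    let hi = sa.partition_point(|&i| prefix_at(bytes, i, pat.len()) <= pat);
    let mut hits = sa[lo..hi].to_vec();
    hits.sort_unstable();
    hits
}

/// Count the occurrences of each lexicon entry in `text`.
fn frequency_list<'a>(text: &str, lexicon: &[&'a str]) -> Vec<(&'a str, usize)> {
    let sa = build_suffix_array(text);
    lexicon
        .iter()
        .map(|&entry| (entry, find_all(text, &sa, entry).len()))
        .collect()
}
```

The suffix array is built once per text, after which each lexicon entry costs only two binary searches. That is what makes the approach attractive when a large lexicon is matched against a single corpus.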


## Installation

You can build and install the latest stable release using Rust's package manager:

```
cargo install lexmatch
```

Or, to install the development version after cloning this repository:

```
cargo install --path .
```

No Cargo/Rust on your system yet? Run ``sudo apt install cargo`` on Debian/Ubuntu-based systems, ``brew install rust`` on macOS, or use [rustup](https://rustup.rs/).

## Usage

See ``lexmatch --help``.

Simple example:

```
$ lexmatch --lexicon lexicon.lst --text corpus.txt
```

The lexicon must be plain-text UTF-8 containing one entry per line. An entry need not be a single word and is not constrained in length.
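
As a rough illustration of this format, the following Rust sketch splits lexicon contents into entries, keeping each line whole so that multi-word phrases survive intact. The function name `parse_lexicon` is hypothetical, and skipping blank lines is an assumption rather than documented lexmatch behaviour.

```rust
/// Split lexicon contents into entries, one per line.
/// Internal spaces are kept, so an entry may be a multi-word phrase.
/// Blank lines are skipped (an assumption made for this sketch).
fn parse_lexicon(contents: &str) -> Vec<&str> {
    contents
        .lines() // handles both "\n" and "\r\n" line endings
        .filter(|line| !line.is_empty())
        .collect()
}
```

For example, a file containing the lines `apple`, `New York` and `ice cream cone` would yield three entries, two of them phrases.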
