documentation and fixes for coverage
proycon committed Jul 1, 2024
1 parent b8d3611 commit 90cc69f
Showing 2 changed files with 23 additions and 2 deletions.
8 changes: 8 additions & 0 deletions README.md
@@ -116,6 +116,14 @@ bad 3488 3491

 Unlike before, you will find the matches are now returned in reading order.

+If you add `--coverage`, an extra final line reports coverage statistics: the
+number of matched tokens over the total number of tokens. This is useful to
+see how much of the text is covered by your lexicon.
+
+```
+#coverage (tokens) = 7/627 = 0.011164274322169059
+```
+
 When using ``--tokens`` we rely on whitespace and punctuation to delimit
 tokens. This does not work for languages such as Chinese, Japanese and Korean
 that are not delimited in such a way. For such languages, similar linear search
17 changes: 15 additions & 2 deletions src/main.rs
@@ -127,6 +127,11 @@ fn main() {
         exit(1);
     }

+    if (!args.is_present("tokens") && !args.is_present("cjk")) && args.is_present("coverage") {
+        eprintln!("ERROR: --coverage can only be used with --tokens or --cjk");
+        exit(1);
+    }
+
     let mut lexicons: Vec<Lexicon> = if args.is_present("lexicon") {
         args.get_many("lexicon")
             .unwrap()
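
The flag check added in this hunk can be read as a small predicate over three booleans. The sketch below is only an illustration: `check_coverage_flags` is a hypothetical helper that does not appear in the commit, and the booleans stand in for the `args.is_present(...)` calls so the logic can be tried without an argument parser.

```rust
/// Hypothetical helper (not part of the commit): mirrors the new check that
/// `--coverage` requires either `--tokens` or `--cjk`.
fn check_coverage_flags(tokens: bool, cjk: bool, coverage: bool) -> Result<(), String> {
    if coverage && !tokens && !cjk {
        Err(String::from("--coverage can only be used with --tokens or --cjk"))
    } else {
        Ok(())
    }
}

fn main() {
    // In the real program these values would come from the parsed arguments.
    let cases = [
        (false, false, true),  // --coverage alone: rejected
        (true, false, true),   // --tokens --coverage: accepted
        (false, true, true),   // --cjk --coverage: accepted
        (false, false, false), // no coverage requested: accepted
    ];
    for (tokens, cjk, coverage) in cases {
        match check_coverage_flags(tokens, cjk, coverage) {
            Ok(()) => println!("tokens={} cjk={} coverage={}: ok", tokens, cjk, coverage),
            Err(msg) => println!("tokens={} cjk={} coverage={}: ERROR: {}", tokens, cjk, coverage, msg),
        }
    }
}
```

Returning a `Result` instead of calling `exit(1)` directly is a design choice made for this sketch only; it keeps the rule testable in isolation.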
@@ -237,7 +242,11 @@ fn main() {
                 "#coverage (tokens) = {}/{} = {}",
                 matchcount,
                 totalcount,
-                matchcount as f64 / totalcount as f64
+                if totalcount == 0 {
+                    0.0
+                } else {
+                    matchcount as f64 / totalcount as f64
+                }
             );
         }
     } else if args.is_present("cjk") {
@@ -281,7 +290,11 @@ fn main() {
                 "#coverage (chars) = {}/{} = {}",
                 matchcount,
                 totalcount,
-                matchcount as f64 / totalcount as f64
+                if totalcount == 0 {
+                    0.0
+                } else {
+                    matchcount as f64 / totalcount as f64
+                }
             );
         }
     }
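
Both coverage hunks apply the same guard: when `totalcount` is zero, the ratio is reported as `0.0` rather than dividing zero by zero, which for `f64` yields NaN. A minimal sketch of that logic follows, assuming the counters are `usize` (their actual type is not visible in the diff). `coverage_ratio` and `coverage_line` are hypothetical names used only for illustration; the commit itself inlines the expression in each branch.

```rust
/// Hypothetical helper: fraction of matched items, defined as 0.0 for empty
/// input so that the summary line never shows NaN.
fn coverage_ratio(matchcount: usize, totalcount: usize) -> f64 {
    if totalcount == 0 {
        0.0
    } else {
        matchcount as f64 / totalcount as f64
    }
}

/// Hypothetical helper: builds a summary line with the same format string as
/// the one in the diff; `unit` is "tokens" or "chars".
fn coverage_line(unit: &str, matchcount: usize, totalcount: usize) -> String {
    format!(
        "#coverage ({}) = {}/{} = {}",
        unit,
        matchcount,
        totalcount,
        coverage_ratio(matchcount, totalcount)
    )
}

fn main() {
    // The figures from the README example.
    println!("{}", coverage_line("tokens", 7, 627));
    // Empty input: the guard returns 0.0 instead of computing 0.0 / 0.0.
    println!("{}", coverage_line("chars", 0, 0));
    // The unguarded expression on empty input, shown for comparison.
    println!("unguarded is NaN: {}", (0usize as f64 / 0usize as f64).is_nan());
}
```

Factoring the guard into one function would remove the duplication between the `--tokens` and `--cjk` branches, though whether that suits the surrounding code is a judgment call for the author.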