JavaScript English spellchecker written in TypeScript.
Demo: http://spelt-demo.surge.sh/
Install the library with `npm i --save spelt`, then install one of the dictionaries:
- British English dictionary: `npm i --save spelt-gb-dict`
- American English dictionary: `npm i --save spelt-us-dict`
- Canadian English dictionary: `npm i --save spelt-ca-dict`
- Australian English dictionary: `npm i --save spelt-au-dict`
```typescript
// import the lib
import spelt from "spelt";
// import one of the dictionaries (here, British English)
import {dictionary} from "spelt-gb-dict";
// build the checker from the dictionary
const check = spelt({
  // the dictionary imported above (gb, us, ca or au package)
  dictionary: dictionary,
  // when a correction is found within this distance,
  // stop looking for others; this improves performance
  distanceThreshold: 0.2
});
console.log(check("heve"));
```
The above code would output:
```javascript
{
  // the raw input
  raw: "heve",
  // correct or not
  correct: false,
  // corrections array, sorted by string distance
  corrections: [
    {
      // possible correction
      correction: "have",
      // distance from the input word
      distance: 0.4
    },
    // ... other possible corrections
  ]
}
```
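To illustrate the result shape and the `distanceThreshold` early exit described above, here is a minimal sketch. It is not spelt's implementation: the `Correction` and `SpellResult` types are inferred from the sample output, `suggest` is a hypothetical helper, and `positional` is a toy distance standing in for spelt's own string distance. The loop stops scanning the dictionary once a candidate falls at or below the threshold, then sorts what it collected by distance.

```typescript
// Result shape inferred from the sample output (an assumption, not spelt's typings).
interface Correction {
  correction: string;
  distance: number;
}

interface SpellResult {
  raw: string;
  correct: boolean;
  corrections: Correction[];
}

// Any normalized distance: 0 means identical, larger means less similar.
type WordDistance = (a: string, b: string) => number;

// Hypothetical helper showing how an early-exit threshold can work.
function suggest(
  word: string,
  dictionary: string[],
  distance: WordDistance,
  distanceThreshold: number,
): SpellResult {
  if (dictionary.includes(word)) {
    return { raw: word, correct: true, corrections: [] };
  }
  const corrections: Correction[] = [];
  for (const candidate of dictionary) {
    const d = distance(word, candidate);
    corrections.push({ correction: candidate, distance: d });
    // Close enough: skip the rest of the dictionary.
    if (d <= distanceThreshold) break;
  }
  // Closest candidates first (Array.prototype.sort is stable).
  corrections.sort((x, y) => x.distance - y.distance);
  return { raw: word, correct: false, corrections };
}

// Toy distance for this demo only: mismatched positions / longer length.
const positional: WordDistance = (a, b) => {
  const n = Math.max(a.length, b.length);
  let diff = 0;
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) diff++;
  }
  return n === 0 ? 0 : diff / n;
};

const words = ["have", "hive", "heave", "hello"];
const full = suggest("heve", words, positional, 0.2);
console.log(full.corrections.map((c) => `${c.correction}:${c.distance}`).join(" "));
// have:0.25 hive:0.25 heave:0.6 hello:0.6
const early = suggest("heve", words, positional, 0.25);
console.log(early.corrections.length); // 1 (stopped at "have")
```

With a threshold of 0.2 no candidate is close enough, so the whole word list is scanned; raising it to 0.25 stops the search at `have`. That is the trade-off the option controls: a higher threshold ends the search sooner, at the risk of missing a closer candidate later in the dictionary.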
I've noticed that a lot of spellcheckers use the Levenshtein distance (LD). I don't think it's an appropriate solution, since it doesn't take into account two letters swapping places.
For example:
- the distance between `abcde` and `abcxx` is 2;
- the distance between `abcde` and `abced` is also 2.

But in the first case we introduced two new letters and removed two letters, while in the second case we just swapped the `e` and `d` without introducing or removing any letter.

In short, I don't see the Levenshtein distance as an appropriate metric for a spellchecker.
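The two distances above can be checked with a textbook dynamic-programming Levenshtein implementation. This is the standard algorithm, shown only to confirm the figures; it is not spelt's own distance calculator.

```typescript
// Classic Levenshtein distance: the minimum number of single-character
// insertions, deletions and substitutions that turn `a` into `b`.
function levenshtein(a: string, b: string): number {
  // prev[j] = distance between the first i-1 chars of a and the first j chars of b
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost, // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(levenshtein("abcde", "abcxx")); // 2: two substitutions
console.log(levenshtein("abcde", "abced")); // 2: the swap costs two substitutions
```

Because LD has no "swap" operation, moving `d` and `e` past each other is charged the same as replacing both letters.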
I wrote my own string distance calculator; you can find it here.
- Spellchecking a book: processing H. G. Wells' novel The Time Machine (with thousands of misspellings introduced) took about 8 seconds, a rate of about 4K words/second.
- Spellchecking the Wikipedia list: processing about 4,000 words, all misspelt, took about 3.5 seconds, a rate of about 2.3K words/second.
This is not very impressive yet, and I'm working on it; still, it's far better than Peter Norvig's spellchecker.
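Throughput figures like these can be measured with a simple timing loop. The sketch below is an assumption about how one might do it, not the script behind the numbers above; `measureThroughput` is a hypothetical helper that accepts any `check`-like function.

```typescript
interface Throughput {
  words: number;
  seconds: number;
  wordsPerSecond: number;
}

// Hypothetical helper: runs `check` over every word and reports words/second.
function measureThroughput(words: string[], check: (word: string) => unknown): Throughput {
  const start = Date.now();
  for (const word of words) check(word);
  // Clamp to 1 ms so a very fast run never divides by zero.
  const ms = Math.max(Date.now() - start, 1);
  const seconds = ms / 1000;
  return { words: words.length, seconds, wordsPerSecond: words.length / seconds };
}

// Stand-in checker for the demo; pass the function returned by spelt({...}) instead.
const sample = "the time machine by h g wells".split(" ");
const stats = measureThroughput(sample, (w) => w.length > 0);
console.log(`${stats.words} words at ${Math.round(stats.wordsPerSecond)} words/second`);
```

`Date.now()` has millisecond resolution, which is adequate for runs lasting seconds, such as a whole novel.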
Running on Wikipedia's list with a distance threshold of 0, it found the correct word among the first 5 suggestions in 85% of cases.
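The 85% figure is a top-5 accuracy: the share of misspellings whose intended word appears among the first five suggestions. Here is a minimal sketch of that metric, assuming each test case pairs a misspelling with its intended word and suggestions come back ordered best-first. `TestCase`, `topKAccuracy` and the lookup table are hypothetical, for illustration only.

```typescript
interface TestCase {
  misspelt: string;
  intended: string;
}

// Fraction of cases whose intended word is among the first k suggestions.
function topKAccuracy(
  cases: TestCase[],
  suggestions: (word: string) => string[],
  k: number,
): number {
  if (cases.length === 0) return 0;
  let hits = 0;
  for (const { misspelt, intended } of cases) {
    if (suggestions(misspelt).slice(0, k).includes(intended)) hits++;
  }
  return hits / cases.length;
}

// Toy data: a fixed suggestion table stands in for the spellchecker.
const table: Record<string, string[]> = {
  heve: ["have", "hive"],
  recieve: ["receive"],
  teh: ["ten", "tea", "tech", "tel", "ted", "the"],
  definately: ["definitely"],
};
const cases: TestCase[] = [
  { misspelt: "heve", intended: "have" },
  { misspelt: "recieve", intended: "receive" },
  { misspelt: "teh", intended: "the" },
  { misspelt: "definately", intended: "definitely" },
];
const lookup = (w: string): string[] => table[w] ?? [];
console.log(topKAccuracy(cases, lookup, 5)); // 0.75 ("the" is only 6th for "teh")
```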
The MIT License