Fast fuzzy-search - the native replacement for fuzzaldrin-plus
- Fuzzaldrin plus is an awesome library that provides fuzzy-search that is more targeted towards filenames.
- Fuzzaldrin-plus-fast is a rewrite of the library in native C++ to make it fast. The goal is to make it a few hundred millisecond filter times for a dataset with 1M entries. This performance is helpful in Atom's fuzzy finder to open files from large projects such as Chrome/Mozilla.
Fuzzaldrin-plus-fast:
- provides
filterTree
function which allows to fuzzy filter text in nested tree-like objects. - allows setting the candidates only once using
ArrayFilterer
andTreeFilterer
classes, and then, performfilter
multiple times. This is much more efficient than calling thefilter
orfilterTree
functions directly every time.
Fuzzaldrin-plus-fast achieves 10x-20x performance improvement over Fuzzaldrin plus for chromium project with 300K files. This high performance is achieved using the following techniques.
- Converting Javascript/CoffeeScript code to native C++ bindings provides 4x performance benefit.
- Use multiple threads to parallelize computation to achieve another 4x performance benefit. Currently, up to 8 threads are used if there are more than 10K candidates to filter.
- Some miscellaneous improvements provide additional benefit.
This project potentially solves the following Atom fuzzy-finder issues if used. atom/fuzzy-finder#271 and atom/fuzzy-finder#88
API is backward compatible with Fuzzaldrin and Fuzzaldrin-plus. Additional functions are provided to achieve better performance that could suit your needs
Installation:
npm install fuzzaldrin-plus-fast
To import all the functions:
import * as fuzzaldrin from "fuzzaldrin-plus-fast"
or
const fuzzaldrin = require("fuzzaldrin-plus-fast")
Sort and filter the given candidates by matching them against the given query.
candidates
- An array of strings or objects.query
- A string query to match each candidate against.options
options. You should provide akey
in the options if an array of objects are passed.
Returns an array of candidates sorted by best match against the query.
const { filter } = require('fuzzaldrin-plus-fast')
// With an array of strings
let candidates = ['Call', 'Me', 'Maybe']
let results = filter(candidates, 'me') // ['Me', 'Maybe']
// With an array of objects
candidates = [
{name: 'Call', id: 1}
{name: 'Me', id: 2}
{name: 'Maybe', id: 3}
]
results = filter(candidates, 'me', {key: 'name'}) // [{name: 'Me', id: 2}, {name: 'Maybe', id: 3}]
Performance Note: use ArrayFilterer
class if you call the filter
function multiple times on a certain set of candidates. filter
internally uses this class, however, in each call it sets the candidates from scratch which can slow down the process.
ArrayFilterer is a class that allows to set the candidates
only once and perform filtering on them multiple times. This is much more efficient than calling the filter
function directly.
export class ArrayFilterer<T> {
constructor()
/** The method to set the candidates that are going to be filtered
* @param candidates An array of tree objects.
* @param dataKey (optional) if `candidates` is an array of objects, pass the key in the object which holds the data. dataKey can be the options object passed to `filter` method (but this is deprecated).
*/
setCandidates<T>(candidates: Array<T>, dataKey?: string): void
/** The method to perform the filtering on the already set candidates
* @param query A string query to match each candidate against.
* @param options options
* @return returns an array of candidates sorted by best match against the query.
*/
filter(query: string, options: IFilterOptions<T>): Array<T>
}
Example:
const { ArrayFilterer } = require('fuzzaldrin-plus-fast')
const arrayFilterer = new ArrayFilterer()
arrayFilterer.setCandidates(['Call', 'Me', 'Maybe']) // set candidates only once
// call filter multiple times
arrayFilterer.filter('me')
arrayFilterer.filter('all')
Sort and filter the given Tree candidates by matching them against the given query.
A tree object is an object in which each entry stores the data in its dataKey and it has (may have) some children (with a similar structure) in its childrenKey. See the following example.
candidates
An array of tree objects.query
A string query to match each candidate against.dataKey
the key of the object (and its children) which holds the datachildrenKey
the key of the object (and its children) which hold the childrenoptions
optionsreturns
An array of candidate objects in form of{data, index, level}
sorted by best match against the query. Each objects has the address of the object in the tree usingindex
andlevel
.
const { filterTree } = require("fuzzaldrin-plus-fast")
candidates = [
{ data: "bye1", children: [{ data: "hello" }] },
{ data: "Bye2", children: [{ data: "_bye4" }, { data: "hel" }] },
{ data: "eye" },
]
filterTree(candidates, "he", "data", "children") // [ { data: 'hel', index: 1, level: 1 }, { data: 'hello', index: 0, level: 1 }]
// With an array of objects (similar to providing `key` to `filter` function)
const candidates = [{ data: "helloworld" }, { data: "bye" }, { data: "hello" }]
results = filter(candidates, "hello", { key: "name" }) // [ { data: 'hello', index: 2, level: 0 }, { data: 'helloworld', index: 0, level: 0 } ]
Performance Note: use TreeFilterer
class if you call the filterTree
function multiple times on a certain set of candidates. filterTree
internally uses this class, however, in each call it sets the candidates from scratch which can slow down the process.
TreeFilterer
is a class that allows to set the candidates
only once and perform filtering on them multiple times. This is much more efficient than calling the filterTree
function directly.
export class TreeFilterer<T> {
constructor()
/** The method to set the candidates that are going to be filtered
* @param candidates An array of tree objects.
* @param dataKey the key of the object (and its children) which holds the data (defaults to `"data"`)
* @param childrenKey the key of the object (and its children) which hold the children (defaults to `"children"`)
*/
setCandidates<T>(candidates: Array<T>, dataKey?: string, childrenKey?: string): void
/** The method to perform the filtering on the already set candidates
* @param query A string query to match each candidate against.
* @param options options
* @return An array of candidate objects in form of `{data, index, level}` sorted by best match against the query. Each objects has the address of the object in the tree using `index` and `level`.
*/
filter(query: string, options: IFilterOptions<object>): TreeFilterResult[]
}
Example:
const { TreeFilterer } = require('fuzzaldrin-plus-fast')
const arrayFilterer = new TreeFilterer()
const candidates = [
{data: "bye1", children: [{data: "hello"}]},
{data: "Bye2", children: [{data: "_bye4"}, {data: "hel"}]},
{data: "eye"},
]
arrayFilterer.setCandidates(candidates, "data", "children") // set candidates only once
// call filter multiple times
arrayFilterer.filter('hello')
arrayFilterer.filter('bye')
Score the given string against the given query.
string
- The string the score.query
- The query to score the string against.
const { score } = require('fuzzaldrin-plus-fast')
score('Me', 'me') # 0.17099999999999999
score('Maybe', 'me') # 0.0693
Gives an array of indices at which the query matches the given string
const { match } = require("fuzzaldrin-plus-fast")
match("Hello World", "he") // [0, 1]
match("Hello World", "wor") // [6, 7, 8]
match("Hello World", "elwor") // [1, 2, 6, 7, 8]
Gives an HTML/Markdown string that highlights the range for which the match happens
wrap("helloworld", "he")
helloworld
wrap("Hello world", "he")
Hello world
In all the above functions, you can pass an optional object with the following keys
{
/** only for `filter` function */
/** The key to use when candidates is an object */
key?: T extends string ? never : keyof T
/** only for `filter` function */
maxResults?: number
/** @default false */
allowErrors?: boolean
/** @default true */
usePathScoring?: boolean
/** @default false */
useExtensionBonus?: boolean
pathSeparator?: '/' | '\\' | string
}
- Bump up version in package.json.
- Create a new release tag in Github, for the bumped version. This should trigger builds in GitHub Actions. The binaries will be uploaded to the action's page.
- Manually download the prebuilt binaries from GitHub and publish.
npm publish