Calculates the Sørensen–Dice coefficient between two strings:
https://en.wikipedia.org/wiki/Sørensen–Dice_coefficient
Turned into a Python module with PyO3 crate (original Rust program here: https://github.com/nebelgrau77/sorensen_dice_coeff).
More than twice as fast than a pure Python function (timed with %timeit in JupyterLab).
How to use:
Copy file:
- sdcoeff.so (Linux) or
- sdcoeff.pyd (Windows)
into the folder where you're running Python.
from sdcoeff import coefficient
# syntax: coefficient(wordA, wordB)
c = coefficient('Philadelphia', 'Philladelphia')
Note: if any of the words is shorter than two characters, i.e. doesn't have a single bigram, function will return 0.
TO DO:
- Currently only compiled for Linux and Windows 10 - compile for MacOS if possible.