neofuzz indexing fails for list of 400K strings #11

Open
SeanPedersen opened this issue Oct 26, 2024 · 1 comment

Comments

@SeanPedersen

Error message:

python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
zsh: killed     python neofuzztest.py

Code:

import random
import string
from neofuzz import char_ngram_process


def rand_str(length):
    characters = string.ascii_letters + string.digits
    return "".join(random.choice(characters) for _ in range(length))


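# Build 400K random names shaped like "<8 chars> <6 chars> <4 chars> <index>"
names = [
    rand_str(8) + " " + rand_str(6) + " " + rand_str(4) + " " + str(i)
    for i in range(400_000)
]
print(len(names))

# Indexing is the step that fails at this size
neofuzz_process = char_ngram_process()
neofuzz_process.index(names)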

query = "test 3333"

pre_filter = neofuzz_process.extract(query, limit=2000, refine_levenshtein=True)
print(pre_filter[:10])

The blazing-fast speed of this lib can only really shine when working with large datasets.
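
One thing that may be worth ruling out (a guess, not something established in this thread): the test names are drawn uniformly from 62 characters, so longer character n-grams such as 5-grams almost never repeat between names, and a character n-gram vocabulary built from them keeps growing as more names are added. The stdlib-only sketch below counts distinct character n-grams for samples of names shaped like the ones in the script above. The helper names and the `(1, 5)` n-gram range are hypothetical choices for illustration; they are not taken from neofuzz, and `char_ngram_process` may use different settings.

```python
import random
import string


def rand_str(length: int, rng: random.Random) -> str:
    characters = string.ascii_letters + string.digits
    return "".join(rng.choice(characters) for _ in range(length))


def make_names(n: int, seed: int = 0) -> list[str]:
    """Names in the same shape as the issue's benchmark, seeded for repeatability."""
    rng = random.Random(seed)
    return [
        f"{rand_str(8, rng)} {rand_str(6, rng)} {rand_str(4, rng)} {i}"
        for i in range(n)
    ]


def vocabulary_size(names: list[str], ngram_range: tuple[int, int] = (1, 5)) -> int:
    """Count distinct character n-grams (spaces included) across all names.

    The default range is a hypothetical choice, not necessarily what
    char_ngram_process uses internally.
    """
    lo, hi = ngram_range
    vocab: set[str] = set()
    for name in names:
        for n in range(lo, hi + 1):
            vocab.update(name[i : i + n] for i in range(len(name) - n + 1))
    return len(vocab)


if __name__ == "__main__":
    # Print sample size next to the number of distinct n-grams it produces.
    for size in (1_000, 10_000, 50_000):
        print(size, vocabulary_size(make_names(size)))
```

Repeating the count on a sample of real-world names would show whether this random benchmark is an unusually hard input for an n-gram based index.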

SeanPedersen changed the title from "neofuzz indexing fails for list of 500K strings" to "neofuzz indexing fails for list of 400K strings" on Oct 26, 2024
@x-tabdeveloping
Owner

Hmm, interesting... Thanks for taking the time to look into this. Can I get the full error log? I have a feeling this might have something to do with PyNNDescent.
