Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce len() calls #412

Merged
merged 7 commits into from
Jan 16, 2025

Conversation

bvandercar-vt
Copy link
Contributor

Reduce duplicate len() calls by assigning length to variable.

Probably only has a microscopic effect on performance, but it is something.

Copy link
Member

@maxbachmann maxbachmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not really convinced this will make a measurable difference in performance. However I would be fine with the changes in the core library as long as the bug in the Cython implementation is fixed.

print("Words :", len(words))
print("Sample:", len(sample))
print("Words :", len_words)
print("Sample:", len_sample)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in the bench directory aren't relevant, since they are outside the benchmarks. So readability is more important here. Really this part is super slow because of:

    words = ["".join(random.choice(string.ascii_letters + string.digits) for _ in range(10)) for _ in range(10000)]

anyway

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So readability is more important here.

That makes sense. Reverted changes to the bench folder.

src/rapidfuzz/fuzz_py.py Show resolved Hide resolved
@@ -1208,6 +1208,7 @@ def extract(query, choices, *, scorer=WRatio, processor=None, limit=5, score_cut
cdef RF_Scorer* scorer_context = NULL
cdef RF_ScorerFlags scorer_flags
cdef int64_t c_limit
cdef int64_t choices_len = <int64_t>len(choices)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will crash when choices is a generator. So this would need to be handled inside the try/except

Copy link
Contributor Author

@bvandercar-vt bvandercar-vt Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, nice catch, changed: 74b248d

src/rapidfuzz/process_py.py Show resolved Hide resolved
Comment on lines 1227 to 1230
if limit is None or limit > choices_len:
limit = choices_len

c_limit = limit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unnecessary to convert choices_len back to a Python Object here. This could probably be something like:

Suggested change
if limit is None or limit > choices_len:
limit = choices_len
c_limit = limit
c_limit = choices_len
if limit is not None:
c_limit = min(c_limit, <int64_t>limit)

Copy link
Contributor Author

@bvandercar-vt bvandercar-vt Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest I'm not really sure what's happening here. Sounds good, made that change.

@maxbachmann maxbachmann merged commit 630d4c2 into rapidfuzz:main Jan 16, 2025
@maxbachmann
Copy link
Member

Thanks for the contribution

@bvandercar-vt bvandercar-vt deleted the bvandercar/reduce-len-calls branch January 16, 2025 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants