Improve performance of numberize on larger datasets. #15

Merged on Oct 7, 2024 (9 commits)

Conversation

bahadzie
Member

  • I have read the CONTRIBUTING guidelines
  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)
  • Checks have been run locally and pass

This PR implements the performance improvements suggested by @Bisaloo in #14.

The current code in main takes over 50 seconds to complete on the dataset from #14, which has more than 0.5E6 records.

With this PR, the same dataset now takes under 3 seconds, roughly 17x faster.

No breaking changes have been introduced.

Review comment on R/numberize.R (outdated, resolved)
@bahadzie bahadzie merged commit 58e8534 into main Oct 7, 2024
7 checks passed
@bahadzie bahadzie deleted the perf branch October 7, 2024 10:01