-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance #14
Conversation
To avoid the cost of recreating the data.frame each time 3.5x faster than previous commit
2x faster than previous commit
3x faster than the previous commit
I wonder if you can also get a little speed up by switching the ordering here to check the 100 digit branch first? This likely depends on your inputs but I think you're likely to generally have more 100 digits so you can break out of the loop quicker. It may also be worth being "cute" and avoiding the Lines 63 to 72 in dcd2d2e
Changing the above to something like for (d in digits) {
if (d == 100) {
summed <- d + d * (summed -1 ) * (summed != 0)
} else if (d %in% c(1E3, 1E6, 1E9, 1E12)) {
total <- total + d + d * (summed - 1) * (summed != 0)
summed <- 0
} else {
summed <- summed + d
}
} I didn't benchmark the dataset above, just some toy examples locally where it seemed to help. It feels like you could do some more complicated vectorisation as well but would need to ponder that some more and may not be worth the effort. |
👍 running the proposed version took me |
@bahadzie, would you be able to take care of the conflicts in this PR? Let me know if you've tried and gotten stuck, and I can give it a look myself. |
@Bisaloo there a lot of other changes that were implemented as a result of the full package review during CRAN submission that is causing 6 tests currently in main to fail. In my opinion our options are
What's your preference? |
If you're taking care of it, I don't have a preference. Both solutions are valid and will lead to the same result. |
I will take care of it. |
Superseded by #15 |
Should fix epiverse-trace/cleanepi#163 for the most part. In total, this allows a ~20x speedup.
I'll likely propose a C implementation of
number_from()
in a couple of weeks for fun and as an exercise. It won't need to be merged and can live as an alternative branch for users with high performance need & a C compiler.main
versionCreated on 2024-07-27 with reprex v2.1.1
proposed version
Created on 2024-07-27 with reprex v2.1.1