-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Potential 1.5x performance improvement for Random.Shuffle()
#82838
Comments
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label. |
Tagging subscribers to this area: @dotnet/area-system-runtime Issue DetailsDescriptionHi. This is not a performance problem per se but a potential for quite a big improvement. I noticed that a I recently did some performance optimizations in the rust How it worksShuffling an array of length For an array of length 12, instead of generating 12 different random numbers, you generate one random number in the range [0,479001600) and use div and mod methods to work out which swaps to do. For longer arrays you use more groups and they get a bit smaller but the end result is a lot fewer calls to the comparatively slow DataThese are my benchmark results, testing the existing "Old" implementation vs my proposed "New" implementation for int arrays of length 10, 100, and 1000
The new versions are all about 1.5x as fast as their equivalents. Configurationdotnet 8.0.100-preview.1.23115.2 Breaking ChangesThis change would not alter the API in any way but would be a value breaking change - the results of shuffling with the same seed would be different (but still random!) and the prng would almost always be advanced fewer steps. For this reason, if this is considered worth doing, I suggest doing it sooner rather than later to avoid annoying people who are reliant on value stability. Pull RequestI'm happy to tidy up my code and make a pull request for this if it's considered worthwhile.
|
Presumably, it's fine to change the algorithm when seed is not specified by user, e.g. |
This has overlap with #82286; both are effectively looking at reducing the number of random numbers generated as part of an operation that may need multiple generated. You might coordinate with @sakno on a single approach to try for both GetItems and Shuffle. I am curious, though, regarding the approach outlined here, whether it would preserve the quality / properties of the random numbers being generated. We don't, for example, just use % as part of Next(int), because it skews the distribution inappropriately (in addition to the cost involved), and will instead use retries or more recently the fastrange algorithm. |
|
Thanks for the suggestion.
This approach is completely unbiased (assuming the underlying prng is). |
I must be misunderstanding your suggestion then. Please feel free to put up a PR with your suggested change and we can collectively evaluate it with the exact code in front of us. |
I am working on this. I've just had a few personal issues come up in the last week... |
Tried and decided against in #83305 |
Description
Hi. This is not a performance problem per se but a potential for quite a big improvement.
I noticed that a
Shuffle()
method has been added to theRandom
class.I recently did some performance optimizations in the rust
rand
crate (library) which made it run about 1.5x times faster . I've had a little go and gotten similar results in C# (see benchmarks below).How it works
Shuffling an array of length
n
essentially involves makingn
random swaps between elements.The way I've made it faster is to, instead of generating a new 32 bit random number for each swap, group the swaps together into the largest groups possible and only generate one random number per group. (I'm simplifying - counting calls to
Random.Next(n)
here which themselves may call_prng.Sample()
more than once)For an array of length 12, instead of generating 12 different random numbers, you generate one random number in the range [0,479001600) and use div and mod methods to work out which swaps to do. For longer arrays you use more groups and they get a bit smaller but the end result is a lot fewer calls to the comparatively slow
Next(n)
function.Data
These are my benchmark results, testing the existing "Old" implementation vs my proposed "New" implementation for int arrays of length 10, 100, and 1000
The new versions are all about 1.5x as fast as their equivalents.
Configuration
dotnet 8.0.100-preview.1.23115.2
Windows 11 Pro 22000.1574
x64
12th Gen Intel(R) Core(TM) i7-12700H 2.30 GHz
Breaking Changes
This change would not alter the API in any way but would be a value breaking change - the results of shuffling with the same seed would be different (but still random!) and the prng would almost always be advanced fewer steps. For this reason, if this is considered worth doing, I suggest doing it sooner rather than later to avoid annoying people who are reliant on value stability.
Pull Request
I'm happy to tidy up my code and make a pull request for this if it's considered worthwhile.
The text was updated successfully, but these errors were encountered: