10-15% Speedup by enqueuing more at a time #231
Benchmark:
After (enqueue 100 at a time) - 1000 x 10 runs
Notice that the current step count is the same (1 million total steps), but the reported MLUPs, bandwidth, and steps/s are all 100x lower than the true values, because each counted step actually runs 100 steps:
This implies that enqueueing these commands in batches gives a 250% speedup in this toy example, which takes almost no GPU time.
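MLUPs is simply the cell count times steps per second divided by 10^6, so a 100x undercount in steps/s carries straight through to MLUPs. A minimal C++ sketch of that correction using the standard MLUPs definition; the function names are hypothetical and not part of FluidX3D:

```cpp
#include <cassert>
#include <cstdint>

// MLUPs = million lattice updates per second: every cell is updated once
// per LBM time step, so MLUPs = cells * steps_per_second / 1e6.
double mlups(std::uint64_t cells, double steps_per_second) {
    return static_cast<double>(cells) * steps_per_second / 1.0e6;
}

// Hypothetical helper: if each iteration the stats count as one "step"
// really ran `batch` steps, the reported steps/s is low by a factor of
// `batch`; scaling it back recovers the true MLUPs.
double corrected_mlups(std::uint64_t cells, double reported_steps_per_second,
                       unsigned batch) {
    assert(batch >= 1u);
    return mlups(cells, reported_steps_per_second * static_cast<double>(batch));
}
```

With 100 steps per counted step, corrected_mlups(cells, reported, 100u) is exactly 100 times mlups(cells, reported).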
If I fix the stats and then test on a more demanding benchmark (e.g. the default 256x256x256), the benefit goes away. That makes sense, because batching should only affect smaller simulations that synchronize too often relative to the amount of work:
This MLUPs value matches other benchmarks for the 3060, so the benefit likely matters only for smaller simulations that suffer from unnecessary CPU overhead.
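Why the gain vanishes on large grids can be seen with a toy cost model. This is an assumption for illustration, not measured FluidX3D data: each step costs gpu_ms of GPU time, and every blocking synchronization adds a fixed sync_ms of CPU-side overhead. Enqueuing batch steps per sync pays that overhead once per batch, so it only matters when gpu_ms is small compared to sync_ms.

```cpp
#include <cassert>

// Hypothetical cost model: `batch` steps of `gpu_ms` each, followed by one
// blocking sync costing `sync_ms`. Returns modelled steps per second.
double model_steps_per_second(double gpu_ms, double sync_ms, unsigned batch) {
    assert(batch >= 1u && gpu_ms > 0.0 && sync_ms >= 0.0);
    const double ms_per_batch = static_cast<double>(batch) * gpu_ms + sync_ms;
    return 1000.0 * static_cast<double>(batch) / ms_per_batch;
}
```

With made-up numbers, a cheap step (gpu_ms = 0.3, sync_ms = 0.1) gains about 33% going from batch 1 to batch 100, while an expensive step (gpu_ms = 100) gains about 0.1%, which matches the qualitative behaviour described above.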
Reference: FluidX3D/src/lbm.cpp, line 851 (commit 584f10a)
Tested on the 2D Taylor-Green vortex.
By default, I get around 2400-2500 steps per second. I'll use 2490 steps/s as my baseline.
I added the following simple modification.
This enqueues 4 steps at a time before doing a blocking synchronization step.
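The batching pattern can be sketched as a minimal, self-contained C++ model. This is not FluidX3D code: MockDomain, enqueue_step(), finish() and run_batched() are hypothetical stand-ins for a per-GPU domain and its command queue. It assumes the time step counter is advanced once per enqueued step (hence no second increment after the sync), and it ignores any per-step communication between domains that a real multi-GPU run may need.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a per-GPU domain with an in-order queue.
struct MockDomain {
    std::uint64_t t = 0;      // time step counter
    unsigned pending = 0;     // steps enqueued since the last sync
    unsigned syncs = 0;       // number of blocking synchronizations
    void enqueue_step() { ++pending; }       // non-blocking: only queues work
    void increment_time_step() { ++t; }
    void finish() { pending = 0; ++syncs; }  // blocking: wait for the queue
};

// Enqueue up to `batch` steps on every domain, then synchronize once.
void run_batched(std::vector<MockDomain>& domains, std::uint64_t steps, unsigned batch) {
    assert(batch >= 1u);
    std::uint64_t done = 0;
    while (done < steps) {
        const std::uint64_t n = std::min<std::uint64_t>(batch, steps - done);
        for (std::uint64_t i = 0; i < n; ++i) {
            for (auto& d : domains) {
                d.enqueue_step();
                d.increment_time_step(); // counted per enqueued step
            }
        }
        for (auto& d : domains) d.finish(); // one blocking sync per batch
        done += n;
    }
}
```

Running 1000 steps with batch 4 performs 250 syncs instead of 1000, while the counter still ends at 1000.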
On my PC, this now reports 692 steps/s, which multiplied by 4 is 2768 steps/s (the counter is off because each iteration now runs 4 steps while the stats expect only 1). 2768/2490 is about an 11% speedup.
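The correction and speedup arithmetic above, as two small C++ helpers (hypothetical names, for illustration only):

```cpp
#include <cassert>

// The stats count one step per loop iteration, but each iteration now runs
// `batch` LBM steps, so the true rate is the reported rate times `batch`.
double true_steps_per_second(double reported_steps_per_second, unsigned batch) {
    assert(batch >= 1u);
    return reported_steps_per_second * static_cast<double>(batch);
}

// Speedup over the unbatched baseline: 1.11 means 11% faster.
double speedup(double steps_per_second, double baseline_steps_per_second) {
    assert(baseline_steps_per_second > 0.0);
    return steps_per_second / baseline_steps_per_second;
}
```

For example, true_steps_per_second(692.0, 4u) is 2768, and speedup(2768.0, 2490.0) is about 1.11.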
You can enqueue more at a time, say 100 steps per iteration.
Now the output says 29 steps/s, implying it's running at a slightly faster 2900 steps/s (a 16% speedup). The downside, however, is that you're now probably rendering just under 30 FPS (29 frames per second at 100 steps each) instead of 60 FPS. Probably an ideal solution would be to dynamically increase the number of steps enqueued per frame whenever the frame rate is above 60 FPS.
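One possible take on the dynamic idea, as a hypothetical C++ controller (not part of FluidX3D). It assumes the simulation throughput, batch times display FPS, stays roughly constant when the batch size changes, and rescales the batch so that throughput is spread over about 60 displayed frames per second. The max_batch limit of 1024 is an arbitrary assumption.

```cpp
#include <algorithm>
#include <cassert>

// Choose how many steps to enqueue per rendered frame so the display
// stays near `target_fps`, clamped to [1, max_batch].
unsigned next_batch(unsigned batch, double display_fps,
                    double target_fps = 60.0, unsigned max_batch = 1024u) {
    assert(batch >= 1u && display_fps >= 0.0 && target_fps > 0.0 && max_batch >= 1u);
    // Steps/s achieved right now, divided over target_fps frames per second.
    const double ideal = static_cast<double>(batch) * display_fps / target_fps;
    const double clamped = std::min(std::max(ideal, 1.0), static_cast<double>(max_batch));
    return static_cast<unsigned>(clamped); // truncate: err toward a higher frame rate
}
```

Starting from the unbatched 2490 steps/s, next_batch(1u, 2490.0) suggests 41 steps per frame; from the 100-step run at 29 FPS it suggests 48. In practice you would probably smooth display_fps over several frames first, since a single frame time is noisy.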
Edit: Make sure you remove the lbm_domain[d]->increment_time_step(); call that runs after synchronization, to keep the time step count correct.