perf improvements: generate_iso_surface_vertices and generate_sparse_density_map #6
Comments
I'll come back to this in the next few days; I have to get back into the code. For Even an alternative with less parallel overhead might not be enough, due to the cache unfriendliness of this approach. If we cannot find a more or less drop-in replacement for the hash map with satisfying performance, I think the only true alternative is a completely different approach to implementing marching cubes that doesn't use a map at all.
Re density map, interesting! Yea so I have actually switched to
You could also look into CHashMap, though I don't know how well maintained it is anymore. I will also look into whether reusing the hash map can help (similar to vertices, indices), though it's a bit more complicated since the DensityMap is an enum.
Ah, well, I didn't consider your number of particles in my first reply 😅 I mostly tested the parallel stuff with 100k to 1 million particles. I think for a few thousand particles you have too much overhead from rayon's worker pool, the locks in the hash map, etc., so it's not surprising that the sequential version is faster. I think dropping the maps by using a different marching cubes strategy would help a lot, as you could really use the cache efficiency here.
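To illustrate the "no map at all" idea from the comment above, here is a minimal sketch of a dense density grid stored in one contiguous Vec and addressed by a flattened index. This is not the crate's implementation; `DenseDensityGrid` and its methods are hypothetical names, and it assumes the grid is small enough that a dense allocation is acceptable.

```rust
/// Hypothetical dense replacement for a hash-map-based density map.
/// Every grid point gets one f32 slot in a single contiguous allocation.
struct DenseDensityGrid {
    dims: [usize; 3],
    values: Vec<f32>,
}

impl DenseDensityGrid {
    /// Allocates a zero-initialized grid with the given number of points per axis.
    fn new(dims: [usize; 3]) -> Self {
        Self {
            dims,
            values: vec![0.0; dims[0] * dims[1] * dims[2]],
        }
    }

    /// Flattens (i, j, k) so that neighbors along k are adjacent in memory.
    fn flat_index(&self, i: usize, j: usize, k: usize) -> usize {
        (i * self.dims[1] + j) * self.dims[2] + k
    }

    /// Accumulates a particle's density contribution at one grid point.
    fn add(&mut self, i: usize, j: usize, k: usize, contribution: f32) {
        let idx = self.flat_index(i, j, k);
        self.values[idx] += contribution;
    }

    /// Reads the accumulated density at one grid point.
    fn get(&self, i: usize, j: usize, k: usize) -> f32 {
        self.values[self.flat_index(i, j, k)]
    }
}
```

The trade-off is memory: a dense grid pays for empty cells, but lookups become plain index arithmetic with no hashing or locking, which is where the cache efficiency would come from.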
Sounds like a plan :) Let me know once you've got some thoughts on how to proceed, or if you need help, etc. :)
I made some improvements to the current reconstruction approach that should increase performance with a lot of threads. The number of particles might still be much too small in your case, but you could try it again.
sorry for the delay! so just rebased, and seeing these numbers for about 500 particles:
Looks like the density map still accounts for a large perf overhead; your changes only affected the threaded version, yes? For comparison, here is the inout patch not rebased (above is the inout patch rebased):
So I noticed the inline(never) annotation everywhere, as mentioned in the inout branch. Is the rationale there for the profiling infra to have more accurate reporting? If so, perhaps we should cfg the inline annotations along with the profile feature?
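A minimal sketch of what gating the annotation could look like, using Rust's `cfg_attr` attribute. It assumes the crate exposes a Cargo feature literally named "profile"; the function below is a hypothetical stand-in, not one of the crate's functions.

```rust
// Assumption: a Cargo feature called "profile" exists in the crate's Cargo.toml.
// With the feature enabled, the function is never inlined, so it shows up as
// its own entry in profiles. Without it, the compiler is free to inline.
#[cfg_attr(feature = "profile", inline(never))]
fn count_cells_above_threshold(densities: &[f32], threshold: f32) -> usize {
    // Hypothetical stand-in for a hot function in the reconstruction.
    densities.iter().filter(|&&d| d > threshold).count()
}
```

Built without the feature, this behaves like an unannotated function, so release builds would not pay for the profiling-oriented attribute.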
Hello! So as part of an investigation into potentially improving perf, I've collected some stats and identified two target areas that appear to occupy the largest portion of the meshing budget:
Meshing the 1k particles every frame takes 30-50ms. Ideally we can get this down to somewhere close to 16ms, so that we would have a one-frame latency on generating the meshes for a realtime sim at 60fps.
As such, it looks like generate_iso_surface_vertices (15.7ms) and parallel_generate_sparse_density_map (17.9ms) are good candidates.
I don't know much about fluid simulations, so I'll defer to you on matters here, but I have done a lot of work in perf and optimization; do you think there's any place to attack here, and if so, mind giving me a pointer so I could start/take a look? :)
I'm also wondering whether there are any data structures we don't have to compute every frame. Perhaps the density map? Or, similar to #4, we could perhaps reuse container structures to reduce allocation strain?
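As a rough illustration of the container-reuse idea, the sketch below keeps the per-frame buffers in one long-lived struct and clears them at the start of each frame instead of reallocating. `MeshingWorkspace` and its fields are hypothetical names, not the crate's API; the only std behavior relied on is that `Vec::clear` and `HashMap::clear` remove the elements while keeping the allocated memory.

```rust
use std::collections::HashMap;

/// Hypothetical per-frame workspace that outlives a single meshing call.
#[derive(Default)]
struct MeshingWorkspace {
    /// Sparse density values keyed by a flattened grid-point index.
    density: HashMap<i64, f32>,
    /// Output vertex buffer.
    vertices: Vec<[f32; 3]>,
}

impl MeshingWorkspace {
    /// Empties the containers but keeps their allocations, so the next frame
    /// can fill them again without hitting the allocator for the same sizes.
    fn begin_frame(&mut self) {
        self.density.clear();
        self.vertices.clear();
    }
}
```

A caller would create the workspace once, call `begin_frame` before each reconstruction, and pass it by mutable reference into the meshing step.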
Thanks, and looking forward to your insights here :)