Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Significantly increase draw performance for large numbers of interleaved draw commands #14

Open
wants to merge 1 commit into
base: Main
Choose a base branch
from

Conversation

Wyverex42
Copy link

In our project we draw a significant number of items on a map view using ImGui. Each item consists of several sub-items which in many cases means that each item is split up into several ImGui draw calls (because the TextureID changes).

The problem is now that SImGuiOverlay copies the entire vertex array for each single draw command. With vertex arrays having thousands of entries, this can quickly become a major bottleneck due to constant memory (re-)allocations.

I didn't find a good way to avoid copying the entire vertex array, but at least we can merge draw calls using the same parameters by merging the index arrays. The worst case I tested with previously took 250 ms per tick(!) to draw everything. With these changes, it now takes 3 ms. This is good enough for us for now, but it's likely that we have to spend some more time there in the future to scale up further.

The main points are:

  • Merge draw calls as much as possible to avoid copying the entire vertex array more than we need to
  • Avoid calling FSlateVertex::Make() and instead inline the required setup which saves additional time
  • I also opened a PR to avoid copying the vertex and index arrays when calling MakeCustomVerts: https://github.com/EpicGames/UnrealEngine/pull/12023, which also saves a few hundred microseconds with large arrays. However, that repo being what it is, I don't expect this to be merged anytime soon, if ever
  • Having UsedCommands as a member seems unnecessary at first, but reallocating this all the time in OnPaint was actually measurable, so made it a member

This works perfectly for us. Maybe you don't want to take it as is, but I'd love if we could find an agreement here so that we don't have to maintain a divergence in our project

@VesCodes
Copy link
Owner

VesCodes commented Jul 27, 2024

Hey @Wyverex42, I really appreciate this!

Integrated the suggested optimisations to the vertex buffer setup loop in fa23960 and 53c820c. I'd like to make this even more trivial if we can get away with not carrying out the position transform there; looks like it should be possible apply the transform to the entire draw submission via FSlateDrawElement::RenderTransform but annoyingly some of the setup code called in MakeCustomVerts is private - being able to unroll that would also save that extra copy covered by the linked PR.

Would merging draw commands not introduce potential draw ordering problems? ImGui seems to internally have some machanism to merge compatible sequential draw calls (sidenote: that memcmp header check might be preferred here) but on first read it feels to me like the order of command submissions might be important in some scenarios. Perhaps @ocornut could weigh in here!

@ocornut
Copy link

ocornut commented Jul 27, 2024

You cannot reorder or merge non-sequential draw commands as you may loose some z-ordering and its likely to lead to incorrect visuals in some situations.

I don’t understand the underlying problem but I am pretty sure if you read the vertex number for the first and last triangle of a draw command you may be able to infer a vertex range in case your setup needs not upload a buffer per command.

@Wyverex42
Copy link
Author

Sorry for the late reply, I lost track of this.

I don't know enough about the internals of ImGui to make a call here but I obviously defer to Omar on this. For our particular usage this fix works perfectly, probably because these are all, in fact, sequential draw calls. They only differ in their TextureId and merging those with the same Id works fine for us. But this is probably because the nature of our draw calls already ensures that calls with a different TextureId don't interfere with each other in terms of z-order.

So this solution might not be generic enough to merge and we'll have to keep this divergence. However, some form of optimization in SImGuiOverlay seems warranted here in any case. As outlined in the OP, depending on what you do, you can easily push this to take 250ms per paint.

The main issue here is that Slate always wants a fresh array of vertices and indices so you have to allocate them each tick. The more draw calls, the more allocations, and you quickly get into a "death by a thousand cuts" situation.

@ocornut
Copy link

ocornut commented Aug 19, 2024

Note that the Metrics window allows you to visualize every draw call down to the triangle, so it may help understanding the data if useful.

We could probably implement some heuristic to cover the use calls of “drawing many textures, with a small amount of other primitives in between”., which could allow some reordering if no overlap and if the cpu tradeoff is deemed worthy..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants