[p5.js 2.0 RFC Proposal]: Batching for the rendering of objects. #6805
Comments
Can you elaborate a bit on what exactly gets batched? In your other proposal you mentioned begin/endShape; would this mean combining adjacent draw calls into one shape if possible? Some follow-ups: is this for the 2D canvas, WebGL, or both? And would state like colors and materials be batched too? For instance, in the canvas API, a single path can only carry one fill and stroke style.
---

This is for both 2D and WebGL, since both support batching and neither is done in p5.js yet.

For the Canvas API, probably not for the colors and materials part, but for WebGL, maybe. Otherwise, for things like just drawing different shapes, yes, those would be batched.
---

I think when evaluating this, we'd probably be weighing it against less automatic solutions, like documenting manual batching using existing p5 tools as a performance optimization in the new set of tutorials we're currently writing for the in-progress new p5js.org site, and possibly also expanding the set of tools available for it.

So two questions to be answered, I think, are (1) what the added engineering complexity of batching would be, and (2) whether we can handle enough cases to make that complexity worth it. For (1), I gather we'd have to store info about each draw call and avoid actually drawing it until we hit a new draw call that can't be grouped, or until we reach the end of draw().

I think it's definitely worth surfacing manual batching in our docs and tutorials, and if investigation shows that we can get some good across-the-board improvements without needing a big overhaul of the rendering pipeline, that would be great too.
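As a concrete example of the kind of manual batching that already works with existing p5 tools, here's a minimal sketch (the particle data and counts are made up) that replaces hundreds of per-particle rect() calls with a single begin/endShape:

```js
// Minimal manual-batching sketch using only existing p5 API.
// Instead of one rect() per particle (one draw call each in WebGL mode),
// every particle is emitted as two triangles inside a single shape,
// so the whole set renders as one draw call.
let particles = [];

function setup() {
  createCanvas(400, 400, WEBGL);
  for (let i = 0; i < 500; i++) {
    particles.push({
      x: random(-200, 200),
      y: random(-200, 200),
      s: random(4, 10)
    });
  }
}

function draw() {
  background(220);
  noStroke();
  fill(40, 90, 200);
  beginShape(TRIANGLES);
  for (const p of particles) {
    // Two triangles forming a quad; the "transform" (a translation)
    // is applied on the CPU so all particles can share one buffer.
    vertex(p.x, p.y);
    vertex(p.x + p.s, p.y);
    vertex(p.x + p.s, p.y + p.s);
    vertex(p.x, p.y);
    vertex(p.x + p.s, p.y + p.s);
    vertex(p.x, p.y + p.s);
  }
  endShape();
}
```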
---

It's definitely worth it to flatten everything into one buffer. We're already calculating the vertices, so compressing them all into one shape shouldn't pose an issue, especially since transforms like the 3D-to-2D projection should for the most part still be done on the GPU for WebGL; the 2D renderer can do the same thing using paths and moving them around.

It depends on what you consider large, but I think the overhaul should fall short of that; I'd call it an above-medium pipeline change. My only concern with batching is how we'll detect multiple shapes with strokes that touch one another in the Canvas API, since the strokes will be affected if we don't separate those draw calls.
---
This isn't true for blending transparency: some blend modes like ADD are commutative, but the default BLEND mode isn't. Also, when things are drawn at the same Z value, we use the painter's algorithm, so things the user asks to be drawn last end up on top. If everything ends up in one draw call, and everything is at the same z, it's unclear that we're able to preserve this property.
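To see the order dependence concretely, here's a small self-contained sketch: with the default BLEND, swapping the draw order of two translucent circles changes the overlap, while ADD would give the same overlap either way.

```js
// Two translucent circles drawn in both orders. Under the default BLEND
// (source-over), the overlap color depends on which circle is drawn last;
// under ADD, the overlap would be identical either way, since addition
// is commutative.
function setup() {
  createCanvas(400, 200);
  noStroke();
  blendMode(BLEND);

  // Left pair: blue drawn over red.
  fill(255, 0, 0, 128);
  circle(80, 100, 100);
  fill(0, 0, 255, 128);
  circle(130, 100, 100);

  // Right pair: red drawn over blue; the overlap looks different.
  fill(0, 0, 255, 128);
  circle(280, 100, 100);
  fill(255, 0, 0, 128);
  circle(330, 100, 100);
}
```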
If we want everything to be one buffer that we draw as one call, in WebGL, we need to deal with transforms. Currently, they're a uniform, so they don't change within a batch of vertices being drawn. I think we need to either:

1. apply each shape's transform to its vertices on the CPU before they go into the shared buffer,
2. send a copy of the transformation matrix per vertex as an attribute, or
3. only batch together shapes that share the same transform.

(1) can be very costly. In this demo, I draw a bunch of bouncing balls, drawn as separate shapes, and drawn by flattening the transforms into one big vertex array on the CPU.

(2) is also costly, as it sends a lot more data to the GPU by duplicating a transformation matrix per vertex. We did some similar tests when trying to optimize line rendering with instancing. On some computers, the duplication was fast and worth it overall, but on others, it made things much slower. Also, I don't think we can use WebGL stride and such to avoid the duplication, because the shapes might not have the same number of vertices.

(3) would work around this issue, but puts some big limits on the usefulness of batching, since transforms tend to change between shapes in typical sketches.
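For reference, option (1) from the list above might look roughly like this; the 2x3 affine layout and the helper names are hypothetical, not p5.js internals:

```js
// Option (1), roughly: keep one shared vertex array and apply the current
// transform on the CPU as each vertex is added.
let current = [1, 0, 0, 1, 0, 0]; // [a, b, c, d, e, f], identity to start
const batch = [];                 // one big vertex array for every shape

function applyAffine(m, x, y) {
  // | a c e |   | x |
  // | b d f | * | y |
  return [m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5]];
}

function pushVertex(x, y) {
  // Every vertex pays for a CPU-side transform; with many vertices per
  // shape, this is exactly where option (1) becomes costly.
  batch.push(...applyAffine(current, x, y));
}
```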
---

If the rendering depends on the values of the pixels being rendered over and has to be done in order, we can avoid batching those cases; other types of blending would still work with batching. Otherwise, we can rely on the depth value to work out what to do.

The main transformations can easily be done on the CPU, since they don't rely on complex operations, which means those shapes can still be batched to cut down the number of draw calls. For more complex operations like 3D rotations, especially ones using matrices (like the example you gave), which make better use of the GPU's resources, we can batch together just the shapes that share, say, the same rotation. That still reduces the number of draw calls while keeping the transform work on the GPU, which should still give better performance, or at worst a nearly negligible deficit.
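As a rough sketch of that grouping (all names are hypothetical, not p5.js internals): cheap translations are folded into vertices on the CPU, while shapes sharing the same complex transform accumulate in a per-matrix batch.

```js
// Cheap translations are folded into vertices on the CPU, while shapes
// sharing the same complex transform (e.g. a rotation matrix) accumulate
// in a per-matrix batch, so each group is still a single draw call.
const batches = new Map(); // transform key -> flat vertex array

function addShape(vertices, tx, ty, matrixKey = 'identity') {
  if (!batches.has(matrixKey)) batches.set(matrixKey, []);
  const buf = batches.get(matrixKey);
  for (const [x, y] of vertices) {
    buf.push(x + tx, y + ty); // translation handled cheaply on the CPU
  }
}

function flushAll(drawBatch) {
  // One draw call per distinct complex transform instead of one per shape;
  // the matrix itself stays a uniform, so the GPU still does that work.
  for (const [matrixKey, buf] of batches) drawBatch(matrixKey, buf);
  batches.clear();
}
```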
---

I think it's worth trying, although I'm anticipating a point at which the batching process itself starts to have more overhead than the separate draw calls plus GPU transforms would, and I'd like to keep it from introducing slowness in those cases. Especially as the number of shapes grows, collision detection becomes more and more of a problem (games use spatial partitioning algorithms to help with that, but even coming up with partitions involves going over all the points in a shape, and some shapes can get big). Setting a reasonable limit on the size of a batch could maybe help with that. I also think it's worth considering that it might be an easier and more effective change to let users do batching themselves.

---
Because we're only talking about doing the main, simple transformations on the CPU, while more complex transformations that rely on things like matrix multiplication stay on the GPU (with each distinct GPU transformation getting its own batch), it would take an extreme and unusual condition, like hundreds of translate and rotate calls in alternating order, for the performance deficit to be large enough to make the batching approach slower than the non-batching one. That's unlikely in at least 99% of real-world p5.js usage.
Even with collision detection, batching should still provide a significant performance advantage. We'll want to set a limit on the size of a batch, but it can be massive, something a lot of sketches won't reach. Realistically, if the detection is well optimized, say with quadtrees, the time spent detecting collisions will still be less than the cost of splitting everything into more draw calls. Plus, collision detection only has to be done on shapes that have strokes and use the Canvas API, and its performance cost is nearly insignificant unless the shapes always land in the same sector of the quadtree and require a more expensive check (or we just forgo the check and render the batch before moving on to another one). All in all, collision detection is only needed for certain shapes, only has significant cost for some of them under certain conditions, and still yields a performance increase compared to the alternative, so it's absolutely still worth it here. Although, we likely won't be doing collision detection for some of the custom shapes created with tools like begin/endShape.
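For the overlap check itself, here's a hedged sketch (the names are hypothetical, and a quadtree would replace the linear scan once batches grow large):

```js
// Before a stroked shape joins a Canvas API batch, its bounding box
// (padded by half the stroke weight) is tested against boxes already in
// the batch; on overlap, the batch is flushed first.
function boxesOverlap(a, b) {
  return a.x < b.x + b.w && b.x < a.x + a.w &&
         a.y < b.y + b.h && b.y < a.y + a.h;
}

function canJoinBatch(batchBoxes, box, weight) {
  const padded = {
    x: box.x - weight / 2,
    y: box.y - weight / 2,
    w: box.w + weight,
    h: box.h + weight
  };
  return !batchBoxes.some(b => boxesOverlap(b, padded));
}
```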
While letting users batch everything themselves would work for those who really want to optimize their performance, it would be unnecessarily confusing for everyone else, and not batching automatically, like many other rendering libraries do, would mean an unnecessary loss of performance in nearly all cases. I think that, while manual batching options should be offered (ideally better ones than what we have now), automatic batching should still be the default.
---

I'm not sure I'm following why it only applies to those. Since the last drawn shape at the same z value should go on top, shouldn't all shapes need to know if they'll collide with something else in the batch?
If we have more than one transform in the batch, I think we have to multiply all points by the current matrix (possibly using a simpler method like you described to avoid actually doing a matrix multiply), but in that case, it mostly only matters how many points we multiply. (I suppose each time the matrix changes, we have to check if it's a simple matrix or not, and maybe that cost could add up? But I'm less concerned about that than about vertex counts.)
Here's a demo that doesn't do matrix transforms; it just handles translation: https://editor.p5js.org/davepagurek/sketches/lPfCkf4vk You can play around with how many points per particle there are, and how many particles, to see how it affects the frame rate. For small numbers, batching is fine, but it starts to lag as the counts get higher. On my computer, these are all handled fine using the current rendering method. A high number of points per particle isn't unusual, especially if shapes are drawn with curves. This is also with no collision detection yet. Again, I think this is still worth trying, but it's not a clear-cut, across-the-board gain. When determining when to flush a batch, there are a few factors we'd need to consider.
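One possible shape for that flush decision, with made-up thresholds and state fields:

```js
// Flush when a relevant attribute changes, when the batch would grow past
// a size limit, or (elsewhere) at the end of the frame.
const MAX_BATCH_VERTICES = 10000;

function shouldFlush(batch, next) {
  return batch.vertexCount + next.vertexCount > MAX_BATCH_VERTICES ||
         batch.fill !== next.fill ||
         batch.stroke !== next.stroke ||
         batch.blendMode !== next.blendMode;
}
```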
Also, because I think this wouldn't be a breaking change, it doesn't have the same somewhat tight deadline that other 2.0 proposals have (we want to put breaking API changes into the 2.0 release to comply with semantic versioning). So although I feel this needs R&D and isn't as straightforward as other proposals, I'm down to help guide that R&D if contributors are interested in trying it, and we're free to finish it in a 2.x release if we need to.
---

For both the canvas API and WebGL, no (I'm assuming that you don't mean an actual separate "z value" for the Canvas API and are including it in this comment). An important piece of information is that batches should share certain attributes, including fill color (we're assuming that color is opaque, since we likely won't batch transparent objects). We can deal with batching on the Canvas API by creating a batch buffer, adding shapes to the batch as the code executes, and, once one of the attributes changes, rendering the batch before going on to create another one in the same fashion. This ensures that everything that depends on z (i.e. the order of rendering for the Canvas API) across a change in attributes gets rendered correctly, while still letting shapes whose shared attributes allow for batched rendering be batched. I'd recommend making WebGL deal with it in about the same way, but we could also just add a z (fragment depth) value into the buffer so that things render the same even when drawn out of order, if you think that'd be better.

The only major issue after that, and the reason why I'm talking about collision detection for shapes with strokes, is this: when shapes are batched within a single path in the Canvas API, any shapes that have strokes get their strokes combined, because they share a path. That's why we need to make sure we don't end up batching shapes like that, so we don't accidentally render one merged outline where the user expected separate strokes.
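A minimal sketch of that Canvas API batch buffer, assuming a fill-only batch built on Path2D (the class and method names are made up):

```js
// Shapes accumulate in one Path2D while the fill stays the same; a change
// of attribute flushes the batch first, preserving draw order across
// attribute changes.
class CanvasBatch {
  constructor(ctx) {
    this.ctx = ctx;
    this.path = new Path2D();
    this.fillStyle = null;
  }

  addRect(x, y, w, h, fillStyle) {
    if (this.fillStyle !== null && fillStyle !== this.fillStyle) {
      this.flush(); // attribute changed: render the batch, start a new one
    }
    this.fillStyle = fillStyle;
    this.path.rect(x, y, w, h);
  }

  flush() {
    if (this.fillStyle === null) return;
    this.ctx.fillStyle = this.fillStyle;
    this.ctx.fill(this.path); // one fill call for the whole batch
    // Note: stroking this shared path is exactly where touching shapes
    // would get their outlines merged, hence the overlap detection above.
    this.path = new Path2D();
    this.fillStyle = null;
  }
}
```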
I was thinking about automatically avoiding matrices for transforms where they aren't necessary, rather than checking whether a given matrix is simple or not, since it's often more efficient to not use matrices for those transformations. As a side note, though, I do think rotations at certain special angles could get the same treatment.
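For instance, a translation-only fast path might look like this (illustrative only, not p5.js code):

```js
// A transform that is a pure translation skips the general matrix
// application entirely.
function transformPoint(m, x, y) {
  // m = [a, b, c, d, e, f]; pure translation means a = d = 1, b = c = 0.
  if (m[0] === 1 && m[1] === 0 && m[2] === 0 && m[3] === 1) {
    return [x + m[4], y + m[5]]; // two additions per point
  }
  return [m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5]];
}
```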
That example uses a translation matrix, which wasn't really what I had intended. Here's an example that doesn't use a transformation matrix for the transform operation: https://editor.p5js.org/PotatoBoy/sketches/XFOO0ynfG Here are the results alongside the example you sent (they're also in the comments below the sketch's code).

As you can see, with simpler transformations, like the single translation here, the CPU-side approach holds up considerably better.

(I added numbering so that it's easier to see which result is which.)
---
While I do agree that this can be paused (p5.js has been fine without batching for quite a while, after all), I'd still like to note that it will require major pipeline changes either way, and that 2.0 is prioritizing performance in its other choices, like the new math libraries, so a rendering optimization like batching would fit that focus perfectly.
---

Makes sense, I was still thinking of batching color in my comment.
So I'm mostly looking at this from a project management perspective. Like you said, there are a number of major changes required to make all this work, and so far it's all being discussed as one change. The danger is that, because this proposal doesn't exist in a vacuum, it has to compete for time and resources with other proposals, and it also involves modifying the same code as other proposals. To put this on as good a footing as possible, I think it's best to divide it into chunks that each provide incremental benefit on their own, so that we aren't choosing between doing all of this or none of it, and so there's a clear path to continue down this road even if we can't get to all of it in 2.0. Each chunk would be a fairly large piece of engineering, including making sure it's well tested and correct, but would still provide a tangible benefit.

How do you feel about this structure and ordering?
---

All of this sounds pretty good, although I do think this shouldn't just be an optional included module, since it's meant to benefit the average sketch, not just performance-conscious coders who already know how to optimize their code.
---

Increasing access
This would increase the speed of projects.
Which types of changes would be made?
Most appropriate sub-area of p5.js?
What's the problem?
Currently, projects that draw many shapes can be very slow, largely due to the overhead of issuing a separate draw call per shape.
What's the solution?
This can be partially fixed with instancing, as seen in #6276 and issue #6275, but another effective strategy, especially for those who don't want to deal with more complex techniques like manually combining meshes or instancing, would be to batch the rendering of objects together so that they can be drawn with fewer draw calls. This would, however, be quite an undertaking and require a substantial change to p5.js, so if it turns out not to be possible or worth implementing, I'd understand that too.
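For a sense of the problem, a sketch in this style slows down as the shape count grows, since each shape is its own draw call (the count here is made up for illustration):

```js
// The pattern this proposal targets: every circle below is rendered
// individually today, so sketches like this get slower as the count grows.
function setup() {
  createCanvas(400, 400);
}

function draw() {
  background(220);
  for (let i = 0; i < 5000; i++) {
    circle(random(width), random(height), 5); // one draw call per shape
  }
}
```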
Pros (updated based on community comments)
Cons (updated based on community comments)
Proposal status
Under review
Note: The original non-RFC 2.0 issue is #6279