Make rendering graph truly dynamic #240
This project is a good reference. https://github.com/pumexx/pumex
I didn't read it thoroughly, but it looks more like the old graph implementation (except for the windowing problems with it). Could be wrong though.
As far as I understand it, the actual scheduling of the "render group" closures into passes and subpasses would happen after the construction stage. At this point, the graph should have all the information from the various nodes. However, with the proposed interface, I don't think the scheduler actually has enough information to decide what can go into subpasses and what needs to be bumped into a separate pass, right? The graph would also need a hint about whether only the same pixel is accessed in the buffer, because that is a requirement of a subpass. I suppose just a separate parameter to the
I guess my above answer presumes the new dynamic graph would want to support multiple subpasses, which as I understand it are not supported currently. I don't really see any reason to omit subpass support when things are changed around anyway.
The proposed approach of running optimizations on the graph after construction to merge nodes into subpasses seems a bit fragile to me. That approach, at least as presented, makes the index of the different attachments an implementation detail of the node. This means that in order to ensure different nodes are merged into the same subpass, or into different subpasses within a pass, you would have to account for that implementation detail of the nodes. I have a couple of ideas:
Any thoughts? @omni-viral @Frizi
I am still getting used to the concepts in Vulkan, and it seems I had a bit of a misunderstanding. A lot of what I wrote above was based on not knowing that the indices used in the shaders actually reference bindings made when creating the subpass, and not when creating the pass as a whole.
The current design of the rendering graph assumes that the graph is rarely, if ever, rebuilt. The scheduling process is really designed as a one-off thing. The lack of dynamism manifests in things like window resizing requiring a full rebuild. Creating resources like depth map images on demand is also very hard. The only way to accomplish it without rebuilds is through sharing via the Aux type and manual synchronization, but this basically circumvents the usefulness of the graph in the first place. On top of that, we don't really support subpasses and have severe problems with oversynchronization. There are also some challenges with the implementation, like rendy-chain being quite far out from what the graph is doing, or rendering directly to the surface image requiring a complex separate code path in every render node that wants to support it.
To solve those problems, the internal graph scheduler and render node API must be reworked to assume that things are dynamic. An additional goal is to be able to serialize the graph setup and hot-reload it on the fly.
Proposed high level design
The rendering graph lifecycle would effectively be split into three phases:
Graph building
A big difference between the existing and proposed designs is that rendering nodes themselves declare the resources that the graph should create for them. The nodes can also produce and accept parameters, which can contain arbitrary data types, including just a resource id. All dependencies between nodes are automatically inferred based on read and written data.
A simple example of this would be a `PresentNode` declaring the output image resource, which potentially multiple rendering nodes would then be able to accept as their render target.

This building code runs basically only once, or extremely infrequently. The builder can be swapped out, but there is rarely a need for this. Most of the dynamism can be accomplished by the logic of render nodes at the construction phase.
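To make the "dependencies are inferred from reads and writes" idea concrete, here is a minimal, self-contained sketch (not rendy's actual API; `NodeDecl`, `schedule`, and the resource ids are all hypothetical): each node declares which resource ids it reads and writes, and an execution order is derived by adding a writer-before-reader edge for every declared read, then topologically sorting.

```rust
use std::collections::HashMap;

// Hypothetical per-node declaration: resource ids it reads and writes.
struct NodeDecl {
    name: &'static str,
    reads: Vec<u32>,
    writes: Vec<u32>,
}

// Derive an execution order: a node reading resource R must run after
// the node writing R. Simple Kahn-style topological sort.
fn schedule(nodes: &[NodeDecl]) -> Vec<&'static str> {
    // Map each resource to the node that writes it.
    let mut writer: HashMap<u32, usize> = HashMap::new();
    for (i, n) in nodes.iter().enumerate() {
        for &r in &n.writes {
            writer.insert(r, i);
        }
    }
    // Build writer -> reader edges and count in-degrees.
    let mut indeg = vec![0usize; nodes.len()];
    let mut adj: Vec<Vec<usize>> = vec![Vec::new(); nodes.len()];
    for (i, n) in nodes.iter().enumerate() {
        for &r in &n.reads {
            if let Some(&w) = writer.get(&r) {
                adj[w].push(i);
                indeg[i] += 1;
            }
        }
    }
    // Repeatedly emit nodes whose dependencies are all satisfied.
    let mut ready: Vec<usize> = (0..nodes.len()).filter(|&i| indeg[i] == 0).collect();
    let mut order = Vec::new();
    while let Some(i) = ready.pop() {
        order.push(nodes[i].name);
        for &j in &adj[i] {
            indeg[j] -= 1;
            if indeg[j] == 0 {
                ready.push(j);
            }
        }
    }
    order
}

fn main() {
    // "present" reads image 1, which "draw" writes; "draw" reads
    // image 0, which "depth_prepass" writes.
    let nodes = [
        NodeDecl { name: "present", reads: vec![1], writes: vec![] },
        NodeDecl { name: "draw", reads: vec![0], writes: vec![1] },
        NodeDecl { name: "depth_prepass", reads: vec![], writes: vec![0] },
    ];
    println!("{:?}", schedule(&nodes)); // ["depth_prepass", "draw", "present"]
}
```

Declaration order does not matter here, only the read/write relationships do, which is the property that would let node logic stay oblivious to the rest of the graph.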
Graph construction and evaluation
Every node declares the resources it will use during the task it performs. The usage can be as simple as saying "I will use the image given to me as a color attachment on slot 0". More complex nodes can also declare arbitrary `use_image` or `use_buffer` usages, and later reference those resources. This happens every frame and has access to `aux` data.

Later the node returns the outputs it declared (in this case `()`) and the actual rendering job it performs (the closure is what is later used in the execution phase). The code here is essentially equivalent to existing render groups. There are also other `NodeExecution` types that are better suited to other cases like presenting, image transfers or compute nodes.

All resources used exclusively by this node are still managed by it. The node state itself is persistent (until the `GraphBuilder` as a whole is replaced). If a descriptor set is needed, the node is able to create it in the build phase and use it during the evaluation phase.

The difference is that resources taken from the graph (like `image_object` in the example above) cannot be assumed to be the same across frames. They will MOSTLY be the same for the same "frame in flight", but this can change at an arbitrary time due to logic happening in other nodes. Rendy can provide a set of utility types/functions to make this easier to deal with in common cases.

Once all nodes have finished their construction phase and declared all resources, the graph can be optimized and scheduled. The optimization/reduction passes can take care of things like merging multiple declared `NodeExecution::pass` instances into a single subpass, and then multiple subpasses into a single render pass with the right subpass dependencies. This also allows easy derivation of optimal `LoadOp` and `StoreOp` settings, and placing barriers only where they are needed. After the internal single-frame graph is fully reduced, the executions are scheduled for parallel execution.
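The construction/evaluation split can be sketched as follows. This is a toy illustration under stated assumptions, not rendy's real API: `Ctx`, `ImageId`, `use_image`, and `construct` are all hypothetical names. Construction runs every frame, declares resource usage up front, and hands back a closure that the scheduler invokes later during the execution phase.

```rust
// Hypothetical resource handle; the actual image behind it may change
// between frames, so the node never caches the raw image itself.
struct ImageId(u32);

// Hypothetical usage record collected by the graph during construction.
enum Usage {
    ColorAttachment { image: u32, slot: u32 },
}

// Hypothetical per-frame construction context the graph hands to a node.
struct Ctx {
    usages: Vec<Usage>,
}

impl Ctx {
    fn use_image(&mut self, image: &ImageId, slot: u32) {
        self.usages.push(Usage::ColorAttachment { image: image.0, slot });
    }
}

// "Construction phase": declare usage ("I will use the image given to me
// as a color attachment on slot 0"), then return the job to run later.
fn construct(ctx: &mut Ctx, target: ImageId) -> impl FnMut() -> String {
    ctx.use_image(&target, 0);
    move || {
        // "Execution phase": this is where draw commands would be
        // recorded against whatever image currently backs `target`.
        format!("drew into image {}", target.0)
    }
}

fn main() {
    let mut ctx = Ctx { usages: Vec::new() };
    let mut job = construct(&mut ctx, ImageId(7));
    // The graph now knows the declared usage before anything executes,
    // which is what lets it merge passes and place barriers.
    assert_eq!(ctx.usages.len(), 1);
    println!("{}", job()); // drew into image 7
}
```

The key design point mirrored here is that all usage information exists before the closure runs, so a scheduler could inspect the collected `Usage` records across all nodes to merge compatible work into subpasses and derive load/store behavior, as the proposal describes.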