How initializing the frame graph works
The process of frame graph initialization is based on the builder pattern (the same as the scene initialization). For each frame graph class, there's a corresponding builder class: ModernFrameGraphBuilder for ModernFrameGraph, D3D12::FrameGraphBuilder for D3D12::FrameGraph, and Vulkan::FrameGraphBuilder for Vulkan::FrameGraph.
The builders encapsulate all call-once initialization code. To allow the builders to modify their internal states, the frame graph classes declare the builders as friends.
To initialize a frame graph, the builder needs a declarative description. The description stores the list of passes and resources that the frame graph should use. It doesn't contain any logic code.
The frame graph description is defined in FrameGraphDescription.hpp.
The description stores the list of render passes making up the frame graph. Each pass has a user-defined name. We store the passes in a hash table that assigns the pass type (G-Buffer/Blur/Copy/etc.) to each pass name:
std::unordered_map<RenderPassName, RenderPassType> mRenderPassTypes;
The description also stores the resources used by the passes. All passes define a unique string id for each resource they use. For each resource usage in each pass, we store a triplet: (pass name, per-pass resource id, user-defined resource name):
struct SubresourceNamingInfo
{
RenderPassName PassName;
SubresourceId PassSubresourceId;
ResourceName PassSubresourceName;
};
From now on the term subresource will refer to a single per-pass usage of a resource.
We store all these triplets in a single list:
std::vector<SubresourceNamingInfo> mSubresourceNames;
The backbuffer resource is special, and we store its name separately:
ResourceName mBackbufferName;
We say that two passes are connected if any input resource of one pass has the same user-defined name as any output resource of the other pass.
A description of a frame graph that renders the scene into a G-buffer and copies the result onto another image would define:
— A G-buffer pass and a copy pass;
— The same resource name for GBufferPass::ColorBufferImage and CopyImagePass::SrcImage;
— A different resource name for CopyImagePass::DstImage.
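As a rough illustration, the description for this example could be filled in as follows. This is only a sketch: ExampleDescription, MakeGBufferCopyDescription(), the string-based type aliases, and the pass and resource names are hypothetical stand-ins, not the actual contents of FrameGraphDescription.hpp.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the real types; the actual definitions may differ
using RenderPassName = std::string;
using ResourceName   = std::string;
using SubresourceId  = std::string;
enum class RenderPassType { GBuffer, CopyImage };

struct SubresourceNamingInfo
{
    RenderPassName PassName;
    SubresourceId  PassSubresourceId;
    ResourceName   PassSubresourceName;
};

// Hypothetical container mirroring the members described above
struct ExampleDescription
{
    std::unordered_map<RenderPassName, RenderPassType> RenderPassTypes;
    std::vector<SubresourceNamingInfo>                 SubresourceNames;
};

inline ExampleDescription MakeGBufferCopyDescription()
{
    ExampleDescription desc;
    desc.RenderPassTypes["GBuffer"] = RenderPassType::GBuffer;
    desc.RenderPassTypes["Copy"]    = RenderPassType::CopyImage;

    // The shared name "ColorBuffer" is what connects the two passes
    desc.SubresourceNames.push_back({"GBuffer", "ColorBufferImage", "ColorBuffer"});
    desc.SubresourceNames.push_back({"Copy",    "SrcImage",         "ColorBuffer"});

    // The copy destination gets its own, different name
    desc.SubresourceNames.push_back({"Copy",    "DstImage",         "CopyTarget"});
    return desc;
}
```

Note that the description carries no logic: the connection between the passes follows purely from the matching resource names.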
Each render pass declares a set of static constants:
— PassType is the type of the pass (G-Buffer, Copy, Blur, Depth Prepass, etc.). This parameter is unique for each pass;
— PassClass is the minimal queue type for the pass (Graphics/Compute/Copy);
— PassSubresourceId is an enum assigning a unique per-pass id to each subresource in the pass;
— ReadSubresourceIds is the list of all input subresource ids in the pass;
— WriteSubresourceIds is the list of all output subresource ids in the pass.
All subresource ids in all passes have a unique string representation. Each pass has a function returning the string representation of the given subresource id:
//Defined in each pass
static inline constexpr std::string_view GetSubresourceStringId(PassSubresourceId subresourceId);
Using C++ template magic, we can use the pass type to obtain other information about the pass. This template magic is defined in RenderPassDispatchFuncs.hpp. For each of these parameters, there's a corresponding template function and a non-template function. The non-template function chooses the correct template one with a switch statement on the pass type.
For the given pass type, we can obtain:
— The pass class, using the GetPassClass() function;
— The total number of subresources in the pass, using the GetPassSubresourceCount() function;
— The number of input and output subresources, using the GetPassReadSubresourceCount() and GetPassWriteSubresourceCount() functions;
— The per-pass ids of input and output subresources, using the FillPassReadSubresourceIds() and FillPassWriteSubresourceIds() functions;
— The string representation of the given integer subresource id, using the GetPassSubresourceStringId() function.
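The pairing of a template function with a non-template dispatcher can be sketched as below. The two pass structs here are simplified, hypothetical stand-ins, and the Count enumerator is an assumption made for the sketch; the real passes and the contents of RenderPassDispatchFuncs.hpp may look different.

```cpp
#include <cstdint>

enum class RenderPassType  : uint32_t { GBuffer, CopyImage };
enum class RenderPassClass : uint32_t { Graphics, Compute, Copy };

// Hypothetical, heavily simplified passes
struct GBufferPass
{
    static constexpr RenderPassClass PassClass = RenderPassClass::Graphics;
    enum class PassSubresourceId : uint32_t { ColorBufferImage, Count };
};

struct CopyImagePass
{
    static constexpr RenderPassClass PassClass = RenderPassClass::Copy;
    enum class PassSubresourceId : uint32_t { SrcImage, DstImage, Count };
};

// Template versions: read the static constants of a concrete pass
template<typename Pass>
constexpr RenderPassClass GetPassClass()
{
    return Pass::PassClass;
}

template<typename Pass>
constexpr uint32_t GetPassSubresourceCount()
{
    return static_cast<uint32_t>(Pass::PassSubresourceId::Count);
}

// Non-template versions: pick the right template instance by pass type
constexpr RenderPassClass GetPassClass(RenderPassType passType)
{
    switch(passType)
    {
    case RenderPassType::GBuffer:   return GetPassClass<GBufferPass>();
    case RenderPassType::CopyImage: return GetPassClass<CopyImagePass>();
    }
    return RenderPassClass::Graphics; // Unreachable for valid pass types
}

constexpr uint32_t GetPassSubresourceCount(RenderPassType passType)
{
    switch(passType)
    {
    case RenderPassType::GBuffer:   return GetPassSubresourceCount<GBufferPass>();
    case RenderPassType::CopyImage: return GetPassSubresourceCount<CopyImagePass>();
    }
    return 0; // Unreachable for valid pass types
}
```

With this layout, adding a new pass type only means adding one case per dispatcher; the builder itself never needs to name a concrete pass class.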
The class ModernFrameGraphBuilder is defined in ModernFrameGraphBuilder.hpp. It contains the main bulk of code for initializing the frame graph. The class contains the Build() function for building the frame graph, private helper functions, and virtual functions for API-specific code.
The class stores a pointer to the frame graph being initialized:
ModernFrameGraph* mGraphToBuild;
The core term in the frame graph initialization is a subresource. The subresource is a single usage of a resource in a single pass. Because each resource may be accessed by multiple passes, there are multiple subresources per resource. Likewise, because each pass may access multiple resources, there are multiple subresources per pass.
In some cases, we need to iterate over all subresources of a resource. In other cases we need to iterate over all subresources of a pass. To efficiently handle both scenarios, we store the subresources in a special data structure.
The idea is to put all subresources in a single list. Each pass gets a range of subresources in this list. Each subresource stores the index of the previous and the next subresource of the same resource.
In other words, all subresources of each pass form a subarray, and all subresources of each resource form a linked sublist. This linked sublist is circular for each resource, meaning its first node and the last node are connected.
The struct SubresourceMetadataNode defines a subresource record. Aside from the previous and next subresource indices, each record stores the resource id and the queue type of its pass:
//Describes a particular pass usage of the resource
struct SubresourceMetadataNode
{
    uint32_t PrevPassNodeIndex; //The index of the same resource's metadata used in the previous pass
    uint32_t NextPassNodeIndex; //The index of the same resource's metadata used in the next pass
    uint32_t ResourceMetadataIndex; //The index of ResourceMetadata associated with the subresource
    RenderPassClass PassClass; //The pass class (Graphics/Compute/Copy) that uses the node
};
In debug builds, each subresource also stores the names of its pass and resource.
All subresource records are stored in a single list:
std::vector<SubresourceMetadataNode> mSubresourceMetadataNodesFlat;
The struct ResourceMetadata defines a single resource. It contains the resource name, the enum defining the ownership, and the head index of the subresource linked list:
//Describes a single resource
struct ResourceMetadata
{
ResourceName Name; //The name of the resource
TextureSourceType SourceType; //Defines ownership of the texture (frame graph, swapchain, etc)
uint32_t HeadNodeIndex; //The index of the head SubresourceMetadataNode
};
The resource records are stored in a single list:
std::vector<ResourceMetadata> mResourceMetadatas;
The ownership field is used for backbuffer resources, to mark them as owned by the swapchain. It affects who is responsible for creating and deleting the resource, and also the resource management in multi-frame scenarios.
The struct PassMetadata defines a render pass record. It contains the pass name, queue type, pass type id, dependency level, and subresource span:
//Describes a single render pass
struct PassMetadata
{
    RenderPassName Name; //The name of the pass
    RenderPassClass Class; //The class (graphics/compute/copy) of the pass
    RenderPassType Type; //The type of the pass
    uint32_t DependencyLevel; //The dependency level of the pass
    Span<uint32_t> SubresourceMetadataSpan; //The indices of pass subresource metadatas
};
All render pass records are also stored in a single list:
std::vector<PassMetadata> mTotalPassMetadatas;
In a lot of cases, we handle present passes differently from other passes. Because of that, we split the pass list into two ranges:
Span<uint32_t> mRenderPassMetadataSpan;
Span<uint32_t> mPresentPassMetadataSpan;
To iterate over all subresources in a pass, the programmer iterates over the pass subresource span:
for(uint32_t index = pass.SubresourceMetadataSpan.Begin; index < pass.SubresourceMetadataSpan.End; index++)
{
    SubresourceMetadataNode& subresourceMetadata = mSubresourceMetadataNodesFlat[index];
    //Do subresource-specific actions...
}
To iterate over all subresources in a resource, the programmer iterates over the linked sublist, starting from the head node:
uint32_t index = resource.HeadNodeIndex;
do
{
    SubresourceMetadataNode& subresourceMetadata = mSubresourceMetadataNodesFlat[index];
    //Do subresource-specific actions...
    index = subresourceMetadata.NextPassNodeIndex; //Follow the circular list to the next usage
} while(index != resource.HeadNodeIndex);
When a pass swaps its subresources from frame to frame (ping-ponging), we need to create a separate pass object for each unique per-frame combination of the pass subresources.
Moreover, we also need to duplicate the resources accessed by the pass. To handle this, the builder stores a separate PassMetadata record for each unique frame of a single pass, and a separate ResourceMetadata record for each frame of a single resource.
There are two possible swap behaviors for multi-frame passes, depending on the nature of the pass subresources.
If the pass only uses non-backbuffer resources, we choose the pass frame based on the cumulative frame index.
If the pass uses backbuffer images, we choose the pass frame based on the backbuffer index. This index depends on the present mode.
To determine the swap behavior, we check the resources accessed by the pass. Each resource record stores a SourceType value which describes if it's a regular resource or a backbuffer.
Suppose A is a multi-frame pass with 3 frames and 1 output subresource. B is a single-frame pass with 1 input resource. Pass A writes to resource R and pass B reads from resource R.
To separate the resources handled by pass B, we duplicate pass B three times.
A similar logic would apply in the opposite case, where pass A has 1 frame and pass B has 3 frames. We solve 1 -> N and N -> 1 pass connections by duplicating the passes.
Now, suppose pass A has 3 frames, and pass B has 2 frames. Unfortunately, we cannot connect them via a single resource. I wasn't able to think of a way to solve M -> N connections. The class ModernFrameGraphBuilder explicitly forbids them, throwing an error if it detects a connection between multi-frame passes with different numbers of frames.
Suppose a single-frame resource is used in a multi-frame pass. For example, single-frame pass A writes to resource G, and then multi-frame pass B reads from this resource.
The linked list of the resource G's subresources interleaves between the single-frame usage in A and per-frame usages in B:
... <-> G as A's output <-> G as B#1's input <-> G as A's output <-> G as B#2's input <-> ...
Now there's a problem. The first occurrence of A's output in this linked list is connected to B#1's input. The second occurrence is connected to B#2's input. Because we store the indices of the previous/next nodes in the subresource record, we need two different records for A's output. But pass A only has one subresource, so what do we do?
To solve this problem, we introduce helper subresources. Their only purpose is to have unique nodes for the single-frame subresources used in multi-frame passes. To avoid the confusion, we'll call non-helper subresources primary subresources.
With helper subresources, we only use the primary subresource of A's output for the first frame, and helper subresources for other frames:
... <-> G as A's output <-> G as B#1's input <-> G as helper#1 A's output <-> G as B#2's input <-> ...
For each primary subresource, we store a range of its helper subresources. Each primary subresource has only one such range. We also store the range of all primary subresources in the subresource record list to separate them from helper subresources:
//The sole purpose of the helper subresource span of each subresource is to connect PrevPassNodeIndex and NextPassNodeIndex in multi-frame scenarios
Span<uint32_t> mPrimarySubresourceNodeSpan;
std::vector<Span<uint32_t>> mHelperNodeSpansPerPassSubresource;
The function ModernFrameGraphBuilder::Build() takes the frame graph description and initializes the frame graph. The process consists of nine major steps:
- Registering passes and resources provided with the description;
- Sorting the passes in traversal order;
- Initializing the additional pass data;
- Validating the subresource linked lists;
- Amplifying multi-frame pass and resource objects;
- Initializing the additional resource data;
- Creating the resource objects;
- Creating the render pass objects;
- Creating the barriers between passes.
The function ModernFrameGraphBuilder::RegisterPasses() initializes the lists of passes, resources, and subresources, using the data provided with the frame graph description. The logic is split into two functions, InitPassList() for passes and InitSubresourceList() for resources.
For each render pass in the description, the function InitPassList() creates an entry in the pass record list. It also creates the entry for the present pass. For each created entry, it allocates an empty span in the subresource list.
For each unique resource name in the description, the function InitSubresourceList() creates an entry in the resource record list. It also creates the entry for the backbuffer resource.
For each (subresource id, pass name, resource name) triplet in the description, the function finds the subresource record and the newly created resource record. It assigns the resource id to the subresource.
After finishing this step, the builder has the lists of pass, resource and subresource records. Each pass has its own subresource span, and each subresource knows its resource.
In this section, we'll say that pass B is adjacent to pass A if A writes to the resources read by B.
The frame graph traverses the passes in topological order, which means it traverses pass A before pass B if B is adjacent to A. To create this order, we need to topologically sort the passes. The algorithm described here is the same one as in the amazing article by Pavlo Muratov.
First, we build the adjacency list. For each given pass, the adjacency list stores the indices of all passes adjacent to it. We then use the adjacency list to visit all passes in depth-first order.
For each pass, we maintain two flags: "already visited" and "currently processed". We use the first one to only visit each pass once, and the second one to detect circular dependencies.
After creating these data structures, we iterate through passes and for each pass we perform the steps:
- If the pass is marked as both "already visited" and "currently processed", we have found a circular dependency. Signal a critical error.
- If the pass is only marked as "already visited", don't process it.
- Mark the pass as "already visited" + "currently processed".
- Take the adjacent passes from the adjacency list, and recursively perform the same steps for each of the adjacent passes.
- Unmark the pass as "currently processed".
- Add the pass to the list of sorted passes.
The bottom-level passes make it first to the sorted list, and the top-level passes make it last. After iterating through passes, we reverse the list, so top-level passes become the first and bottom-level passes become the last.
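The steps above amount to a depth-first topological sort with cycle detection. The sketch below is an illustration only: VisitPass() and SortPassesTopologically() are hypothetical names, and the adjacency list is assumed to be a vector of index vectors where adjacency[A] holds every pass adjacent to A.

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Per-pass flags used during the traversal
enum : uint8_t
{
    AlreadyVisited     = 0x01,
    CurrentlyProcessed = 0x02
};

inline void VisitPass(uint32_t passIndex, const std::vector<std::vector<uint32_t>>& adjacency,
                      std::vector<uint8_t>& flags, std::vector<uint32_t>& sortedPasses)
{
    bool visited   = (flags[passIndex] & AlreadyVisited) != 0;
    bool processed = (flags[passIndex] & CurrentlyProcessed) != 0;

    // Reaching a pass that is still being processed means we walked in a circle
    if(visited && processed)
        throw std::runtime_error("Circular dependency between render passes");

    if(visited)
        return;

    flags[passIndex] = AlreadyVisited | CurrentlyProcessed;
    for(uint32_t adjacentPass: adjacency[passIndex])
        VisitPass(adjacentPass, adjacency, flags, sortedPasses);

    flags[passIndex] = AlreadyVisited; // Clear the "currently processed" mark
    sortedPasses.push_back(passIndex);
}

inline std::vector<uint32_t> SortPassesTopologically(const std::vector<std::vector<uint32_t>>& adjacency)
{
    std::vector<uint8_t>  flags(adjacency.size(), 0);
    std::vector<uint32_t> sortedPasses;
    for(uint32_t passIndex = 0; passIndex < adjacency.size(); passIndex++)
        VisitPass(passIndex, adjacency, flags, sortedPasses);

    // Bottom-level passes were added first, so reverse to put top-level passes first
    std::reverse(sortedPasses.begin(), sortedPasses.end());
    return sortedPasses;
}
```

The recursion depth is bounded by the longest dependency chain, which is small for typical frame graphs.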
Sorting the passes topologically lets us assign a dependency level to each pass. These dependency levels further simplify the frame graph traversal and open the door for multi-threading.
The function ModernFrameGraphBuilder::SortPasses() implements the topological sort and assigns a dependency level to each pass. It has five steps.
To create the adjacency list, we need to be able to determine if two passes are adjacent. To do this, we first build the set of input subresources and the set of output subresources for each pass. Then for each pair of passes, we check if these sets intersect.
We implement these sets with sorted lists of resource ids. The function BuildReadWriteSubresourceSpans() creates these sorted lists for each pass.
The function iterates over all passes and asks each pass for per-pass ids of its input and output subresources, using FillPassReadSubresourceIds() and FillPassWriteSubresourceIds(). The function then checks the subresource record for each of the returned subresources, and reads the resource id. It stores the resource id in the corresponding per-pass list.
After iterating over all passes, the function sorts all created lists of resource ids.
Once we have the sets of input and output resources for each pass, we can quickly check if two passes are adjacent. To do this, we simultaneously iterate over the input resources of one pass and the output resources of the other pass. If they have at least one common element, the first pass is adjacent to the other pass. We implement this check in the SpansIntersect() function.
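Because both lists are sorted, the check is a standard two-pointer walk. The following is a minimal sketch of that idea; SortedListsIntersect() is a hypothetical name, and it takes whole vectors rather than the spans the builder works with.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if two sorted lists of resource ids share at least one element
inline bool SortedListsIntersect(const std::vector<uint32_t>& readIds, const std::vector<uint32_t>& writeIds)
{
    std::size_t readIndex  = 0;
    std::size_t writeIndex = 0;
    while(readIndex < readIds.size() && writeIndex < writeIds.size())
    {
        if(readIds[readIndex] == writeIds[writeIndex])
            return true;

        // Advance whichever list currently points at the smaller id
        if(readIds[readIndex] < writeIds[writeIndex])
            readIndex++;
        else
            writeIndex++;
    }

    return false;
}
```

Each call is linear in the combined length of the two lists, which is why the lists are sorted up front.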
To build the adjacency list, we iterate over all possible pairs of passes. For each pair, we check if the input and output resource sets intersect, and if they do, we add the pair of passes to the adjacency list. The function BuildAdjacencyList() implements this method.
The next step is the topological sort itself, implemented in the SortRenderPassesByTopology() function.
First, we allocate the array for "already visited" and "currently processed" flags. We then iterate over all pass records, and for each pass we call the TopologicalSortNode() function. This function calculates the new sorted index for the pass, using the adjacency list and the flags array. We place this new index into the array of sorted pass indices.
Next, we map sorted pass indices to current pass indices, storing this map in another array. We'll need this information to calculate the dependency levels.
Finally, we reorder the pass record list according to the new sorted indices.
Now that the list of render passes is in topological order, we can calculate the dependency level for each pass. We do it in the AssignDependencyLevels() function.
The algorithm initially assigns dependency level 0 to each pass. Then, going through the passes in sorted order, for each pass with dependency level D it iterates over all passes adjacent to it. To each adjacent pass with dependency level A, it assigns the new dependency level max(A, D + 1).
The adjacency list that we have stores the old pass indices, before the topological sort. To obtain the correct pass index on each iteration, we use the map from sorted pass indices to old pass indices that we created earlier.
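A minimal sketch of this step follows. ComputeDependencyLevels() is a hypothetical name; the sketch assumes sortedPasses maps each sorted position to an old pass index, and that both the adjacency list and the resulting levels are indexed by old pass indices.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

inline std::vector<uint32_t> ComputeDependencyLevels(const std::vector<uint32_t>& sortedPasses,
                                                     const std::vector<std::vector<uint32_t>>& adjacency)
{
    // Every pass starts at dependency level 0
    std::vector<uint32_t> levels(adjacency.size(), 0);

    // Visiting passes in topological order guarantees that the level of a pass
    // is final before we push it to the passes adjacent to it
    for(uint32_t passIndex: sortedPasses)
    {
        for(uint32_t adjacentPass: adjacency[passIndex])
            levels[adjacentPass] = std::max(levels[adjacentPass], levels[passIndex] + 1u);
    }

    return levels;
}
```

For a diamond-shaped graph (pass 0 feeds passes 1 and 2, which both feed pass 3), this yields levels 0, 1, 1, 2, so passes 1 and 2 land in the same block.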
The final step in pass sorting is sorting by dependency level. We need this to divide the list of render passes into separate blocks with one block per dependency level. During frame graph traversal, we'll process each block in a separate thread.
We implement this step in the SortRenderPassesByDependency() function. This step is just a call to std::stable_sort() with the dependency level as the sort key.
After finishing this step, the builder knows the final order of render pass objects.
The function ModernFrameGraphBuilder::InitAugmentedPassData() initializes various information both for further frame graph building and final frame graph traversal. It marks the passes to use dedicated compute and copy queues, and creates the blocks of render pass objects with the same dependency level.
Render pass records store the queue type in the PassMetadata::Class member. The function AdjustPassClasses() finds the optimal queue type for each pass and assigns it to each pass object.
Currently SolarTears doesn't support async compute and async copy, so AdjustPassClasses() simply assigns RenderPassClass::Graphics to each pass object.
The frame graph stores a separate span for each dependency level. The function BuildPassSpans() creates these spans.
Since we've sorted the passes by the dependency level, this task is trivial. We iterate over all passes and compare the dependency level of each pass to the dependency level of the next pass. If they differ, we add the new span to the list.
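The span-building loop can be sketched as follows. IndexSpan and BuildLevelSpans() are hypothetical names for this illustration; the input is assumed to be the list of dependency levels of the passes, already sorted in ascending order.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical half-open range [Begin, End) of pass indices
struct IndexSpan
{
    uint32_t Begin;
    uint32_t End;
};

inline std::vector<IndexSpan> BuildLevelSpans(const std::vector<uint32_t>& sortedLevels)
{
    std::vector<IndexSpan> spans;

    uint32_t spanBegin = 0;
    for(uint32_t i = 0; i < sortedLevels.size(); i++)
    {
        // Close the current span at the last pass, or where the next pass has a different level
        bool isLastPass = (i + 1 == sortedLevels.size());
        if(isLastPass || sortedLevels[i] != sortedLevels[i + 1])
        {
            spans.push_back(IndexSpan{spanBegin, i + 1});
            spanBegin = i + 1;
        }
    }

    return spans;
}
```

The last pass needs the explicit check because it has no next pass to compare against.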
After this step, we have most of the data for render pass objects initialized.
Each subresource record corresponds to a single resource usage in a single pass. All subresource records store the indices of two other records: one for the previous usage of its resource, and one for the next usage. The function ValidateSubresourceLinkedLists() initializes the previous and the next subresource indices of all subresources.
Because we have sorted the passes in topological order, having pass A ordered before pass B means we traverse A before B. In turn, it means that for any resource accessed by both A and B, its usage in A should be ordered before its usage in B in its linked list of subresources.
We iterate over all pass records and check all resources accessed by each pass. For each resource, we keep track of its last usage. We check and update this "last usage" record on every iteration, rewriting the previous and next indices.
After iterating, the only thing left is to make each linked list circular. To do this, we need to find the first and the last usages of every resource. The first usages correspond to the subresource records whose previous index is still uninitialized. The last usages are stored in the "last usage" array that we just used. We traverse the subresource records again to initialize the remaining connections.
After this step, we have a linked list of subresources for each resource, and can traverse it both ways.
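The two traversals can be sketched like this. The sketch assumes the subresource records are already laid out in pass traversal order; ExampleNode, NoIndex, and LinkSubresources() are hypothetical names, and the real records carry more fields.

```cpp
#include <cstdint>
#include <vector>

constexpr uint32_t NoIndex = UINT32_MAX; // Marks an uninitialized link

// Hypothetical, reduced subresource record
struct ExampleNode
{
    uint32_t PrevPassNodeIndex     = NoIndex;
    uint32_t NextPassNodeIndex     = NoIndex;
    uint32_t ResourceMetadataIndex = 0;
};

inline void LinkSubresources(std::vector<ExampleNode>& nodes, uint32_t resourceCount)
{
    std::vector<uint32_t> lastUsage(resourceCount, NoIndex);

    // First traversal: connect each usage to the previous usage of the same resource
    for(uint32_t nodeIndex = 0; nodeIndex < nodes.size(); nodeIndex++)
    {
        uint32_t resourceIndex = nodes[nodeIndex].ResourceMetadataIndex;
        uint32_t lastIndex     = lastUsage[resourceIndex];
        if(lastIndex != NoIndex)
        {
            nodes[lastIndex].NextPassNodeIndex = nodeIndex;
            nodes[nodeIndex].PrevPassNodeIndex = lastIndex;
        }

        lastUsage[resourceIndex] = nodeIndex;
    }

    // Second traversal: close each list by connecting the first usage to the last one
    for(uint32_t nodeIndex = 0; nodeIndex < nodes.size(); nodeIndex++)
    {
        if(nodes[nodeIndex].PrevPassNodeIndex == NoIndex)
        {
            uint32_t lastIndex = lastUsage[nodes[nodeIndex].ResourceMetadataIndex];
            nodes[nodeIndex].PrevPassNodeIndex = lastIndex;
            nodes[lastIndex].NextPassNodeIndex = nodeIndex;
        }
    }
}
```

A resource with a single usage ends up with a node that points to itself in both directions, which keeps the circular iteration pattern valid.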
Perhaps the most complex part of the build is resource and pass amplification. Amplification is splitting each multi-frame pass, resource and subresource into multiple per-frame copies, while preserving the relationship between the objects.
In this part, we'll use the term non-amplified to refer to the passes/resources/subresources before this step. We'll use the term amplified to refer to all per-frame copies of a pass/resource/subresource, and per-frame to refer to a single pass/resource/subresource copy created in this step.
The function ModernFrameGraphBuilder::AmplifyResourcesAndPasses() implements the process of amplification. We break it into three major steps:
- Amplifying resources;
- Amplifying render passes;
- Amplifying subresources.
The last step is what makes the function complicated. We've already initialized the links between subresource records in the previous step, and these links must be preserved in the amplified version. Moreover, in multi-frame contexts the links should be cross-resource: the last subresource of a per-frame resource should be connected to the first subresource of the next frame.
The function starts with preparing the resource remap list. For each non-amplified resource, this list stores the span of per-frame resource copies.
To create this list, we iterate over all passes and check all resources accessed by each pass. For each resource, we check the number of frames for the resource usage in the pass and update the span size for the resource. Since we can't connect an M-frame pass to an N-frame pass, we check that no resource has different non-1 frame counts in multiple passes.
Then we iterate over all spans in the remap list and fill each one with the required number of resource copies.
Next, we amplify the render passes. This step is trickier. Since passes can contain several multi-frame resources, we need to iterate over all of them and find the total frame count of the pass. Also, because frame graph distinguishes between swapping the passes linearly and swapping by the backbuffer index, we need to find the pass swap type.
The function FindFrameCountAndSwapType() handles both tasks. It iterates over all resources referenced by the given pass and calculates the total frame count of the pass. The frame count is equal to the least common multiple of the frame counts of all resources.
The function also checks the source type of each resource to see if it's a backbuffer, and if so, sets the pass swap type to swap per backbuffer index. We use these values to initialize the entries in ModernFrameGraph::mFrameSpansPerRenderPass.
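The calculation can be sketched as below. All type and enumerator names here (ExampleSourceType, ExampleSwapType, ExampleResource, FrameCountAndSwapType, FindFrameCountAndSwapTypeSketch) are hypothetical; the real enum values and function signature are not shown on this page. The sketch assumes every resource has a frame count of at least 1.

```cpp
#include <cstdint>
#include <vector>

enum class ExampleSourceType { FrameGraph, Backbuffer };
enum class ExampleSwapType   { PerLinearFrame, PerBackbufferImage };

struct ExampleResource
{
    uint32_t          FrameCount; // Assumed to be >= 1
    ExampleSourceType SourceType;
};

struct FrameCountAndSwapType
{
    uint32_t        FrameCount;
    ExampleSwapType SwapType;
};

// Euclid's algorithm; written out by hand to avoid depending on C++17 std::gcd
inline uint32_t Gcd(uint32_t a, uint32_t b)
{
    while(b != 0)
    {
        uint32_t remainder = a % b;
        a = b;
        b = remainder;
    }
    return a;
}

inline uint32_t Lcm(uint32_t a, uint32_t b)
{
    return a / Gcd(a, b) * b;
}

inline FrameCountAndSwapType FindFrameCountAndSwapTypeSketch(const std::vector<ExampleResource>& passResources)
{
    FrameCountAndSwapType result = {1, ExampleSwapType::PerLinearFrame};
    for(const ExampleResource& resource: passResources)
    {
        // The pass must cycle through every frame combination of its resources
        result.FrameCount = Lcm(result.FrameCount, resource.FrameCount);

        // A single backbuffer resource is enough to switch the swap behavior
        if(resource.SourceType == ExampleSourceType::Backbuffer)
            result.SwapType = ExampleSwapType::PerBackbufferImage;
    }
    return result;
}
```

For example, a pass touching a 2-frame and a 3-frame resource needs 6 per-frame copies before the combination of subresources repeats.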
The final and most complex part is amplifying the subresources. First, we allocate the memory for all new subresource records. We also create a list that stores the non-amplified pass index of each non-amplified subresource. We'll use this list to find the correct per-frame indices of the previous and next passes for each per-frame subresource.
We need to initialize three members of each subresource record: the previous subresource index, the next subresource index, and the resource id. To do it, we iterate over all amplified passes. For each amplified pass, we iterate over all per-frame copies. For each per-frame pass copy, we iterate over all of its subresources.
First we initialize the resource id. To do it, we need to find the correct amplified resource span. Using the resource id of the non-amplified subresource, we obtain this span from the resource remap list. Then we calculate the resource id as <pass frame index> modulo <resource span size>.
Finding the correct indices of the previous and the next subresources is tricky. The previous and the next pass for a subresource can either be in a different frame or in the same frame. The previous and the next pass can have different number of frames from the current pass. In some scenarios, we may even need to create helper subresources.
First thing to note: the amplification does not change the order of subresource nodes within a pass. All per-pass subresource ids in per-frame pass copies are the same as per-pass ids in the original non-amplified pass. This means that if we know per-pass subresource id, we only need to find the correct per-frame pass copy and take its corresponding subresource.
The functions CalculatePrevPassFrameIndex() and CalculateNextPassFrameIndex() serve this purpose for the previous and the next pass, respectively. Given the non-amplified index of the previous/next pass, the non-amplified index of the current pass, and the frame index of the current pass, they return the correct frame index of the previous/next pass.
Both functions return different results depending on whether the previous/next pass is within the same frame (i.e. the connection between subresources does not cross frame bounds), or actually in the previous/next frame.
Because we have sorted the passes in topological order, we can use non-amplified pass indices to check if two passes belong to the same frame or different frames:
— If the previous pass index is LESS than the current pass index, they belong to the same frame.
— If the previous pass index is GREATER than the current pass index, they belong to different frames.
— If the next pass index is GREATER than the current pass index, they belong to the same frame.
— If the next pass index is LESS than the current pass index, they belong to different frames.
If two passes belong to the same frame, the current pass frame index is the same as the previous/next pass frame index. Otherwise, the frame index of previous/next pass is calculated with a formula:
<Previous pass frame index> = (<Current pass frame index> - 1) modulo <Common frame count>
<Next pass frame index> = (<Current pass frame index> + 1) modulo <Common frame count>
where <Common frame count> is the least common multiple of the frame counts of the two passes.
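The rules and formulas above can be sketched as two small functions. The names and parameter lists are hypothetical: the common frame count is passed in explicitly here, and the case where the previous/next pass is the current pass itself (a resource used by one pass only) is treated as crossing the frame boundary, which is an assumption of this sketch. With unsigned indices, the "- 1" is computed as "+ count - 1" to avoid wrapping below zero.

```cpp
#include <cstdint>

// All pass indices are non-amplified indices in topological order
inline uint32_t PrevPassFrameIndexSketch(uint32_t currPassIndex, uint32_t prevPassIndex,
                                         uint32_t currFrameIndex, uint32_t commonFrameCount)
{
    // The previous pass comes earlier in the same frame
    if(prevPassIndex < currPassIndex)
        return currFrameIndex;

    // The previous pass belongs to the previous frame; add the frame count
    // before subtracting so the unsigned value never underflows
    return (currFrameIndex + commonFrameCount - 1) % commonFrameCount;
}

inline uint32_t NextPassFrameIndexSketch(uint32_t currPassIndex, uint32_t nextPassIndex,
                                         uint32_t currFrameIndex, uint32_t commonFrameCount)
{
    // The next pass comes later in the same frame
    if(nextPassIndex > currPassIndex)
        return currFrameIndex;

    // The next pass belongs to the next frame
    return (currFrameIndex + 1) % commonFrameCount;
}
```

The result may be greater than or equal to the frame count of the previous/next pass itself, since the modulo is taken over the common frame count.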
Moving on. For the frame index we just obtained, there are two possible cases:
— There are enough frames in the previous/next pass (frame index < frame count);
— There are not enough frames in the previous/next pass (frame index >= frame count).
In the first case, we only need to take the corresponding per-frame copy of the pass, then its subresource span, and then the necessary subresource. That's it, that's our previous/next subresource record!
If the previous/next pass does not have enough frames, we need to create a helper subresource for every extra frame. The function AllocateHelperSubresourceSpan() allocates the necessary number of helper subresources, using the given subresource as a template. The helper subresource with the extra frame index is the per-frame copy of the previous/next subresource that we need.
After finding the previous/next subresource, we connect it to the current one, initializing the previous and the next indices of the two subresource records. If the frame graph description is complete, no uninitialized indices should be left after amplification.
After this step, we have all per-frame passes, resources and subresources initialized.
The function ModernFrameGraphBuilder::InitAugmentedResourceData()
initializes various information for resources and subresources:
— Head subresource nodes for resources;
— API-specific data for subresources.
To be able to iterate over all subresources of a resource, we need some subresource to start from. For this purpose, each resource record stores a value called head node index. We initialize it in the InitializeHeadNodes() function.
We iterate over all passes, and for each pass we iterate over all its subresources. For each subresource, we check its resource. If the head node index is uninitialized, we initialize it with the current subresource.
Because we've sorted the passes in topological order, each head node index refers to the earliest usage of the resource. We'll use this fact many times later.
The function InitMetadataPayloads() initializes API-specific subresource information. The function has different implementations in D3D12 and Vulkan. In both cases, it iterates over all pass objects and creates an additional API-specific record for each subresource in the pass. We call this record subresource payload. The payload contains things like subresource format, pipeline usage flags, miscellaneous flags, etc.
API-specific passes define additional template functions. The implementations of InitMetadataPayloads() use these functions to initialize subresource payloads.
Some passes in the previous step don't have enough information to completely initialize subresource payloads. For example, the copy pass doesn't know anything about the formats of the input and output resources, besides the fact the formats should be equal. Fortunately, we can propagate this data from other subresources.
Two types of propagation are possible. Vertical propagation happens between two subresources in a single resource. Horizontal propagation happens between two subresources in a single pass. The function PropagateSubresourcePayloadData() propagates the data in both ways, repeating until no further propagation happens.
For an example of vertical propagation, consider a resource accessed by two passes. The first pass specifies the format for the resource usage, but the second pass does not. We vertically propagate the format from the first subresource to the second one. The API-specific PropagateSubresourcePayloadDataVertically() function implements the vertical propagation.
For an example of horizontal propagation, consider a copy pass. Suppose we know the format of the input subresource, but we don't know the format of the output subresource. Since we know the formats should be equal, we horizontally propagate the input subresource format to the output subresource. The API-specific PropagateSubresourcePayloadDataHorizontally() function implements the horizontal propagation.
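The propagation loop is a fixed-point iteration: keep applying both kinds of propagation until neither changes anything. The sketch below reduces the payload to a single format value, where 0 means "unknown", and models copy passes as (source node, destination node) pairs. These simplifications and all names in the sketch are hypothetical; the real payloads and functions are API-specific.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, reduced payload: Format == 0 means "not known yet"
struct ExamplePayloadNode
{
    uint32_t Format;
    uint32_t NextPassNodeIndex; // Next usage of the same resource (circular)
};

// Vertical: push a known format to the next usage of the same resource
inline bool PropagateVerticallySketch(std::vector<ExamplePayloadNode>& nodes)
{
    bool changed = false;
    for(ExamplePayloadNode& node: nodes)
    {
        ExamplePayloadNode& nextNode = nodes[node.NextPassNodeIndex];
        if(node.Format != 0 && nextNode.Format == 0)
        {
            nextNode.Format = node.Format;
            changed = true;
        }
    }
    return changed;
}

// Horizontal: the source and the destination of a copy pass must share a format
inline bool PropagateHorizontallySketch(std::vector<ExamplePayloadNode>& nodes,
                                        const std::vector<std::pair<uint32_t, uint32_t>>& copyPairs)
{
    bool changed = false;
    for(const std::pair<uint32_t, uint32_t>& copyPair: copyPairs)
    {
        ExamplePayloadNode& src = nodes[copyPair.first];
        ExamplePayloadNode& dst = nodes[copyPair.second];
        if(src.Format != 0 && dst.Format == 0)
        {
            dst.Format = src.Format;
            changed = true;
        }
        else if(dst.Format != 0 && src.Format == 0)
        {
            src.Format = dst.Format;
            changed = true;
        }
    }
    return changed;
}

inline void PropagateUntilStable(std::vector<ExamplePayloadNode>& nodes,
                                 const std::vector<std::pair<uint32_t, uint32_t>>& copyPairs)
{
    bool changed = true;
    while(changed)
    {
        bool changedVertically   = PropagateVerticallySketch(nodes);
        bool changedHorizontally = PropagateHorizontallySketch(nodes, copyPairs);
        changed = changedVertically || changedHorizontally;
    }
}
```

The loop terminates because each iteration that reports a change fills in at least one previously unknown format, and known formats are never overwritten.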
After this step, we have everything ready for the actual initialization of the real frame graph data.
The meaning of a resource we've been using up to this point is close to D3D12's ID3D12Resource* and Vulkan's VkImage. It is an object describing general texture properties.
The meaning of a subresource is close to D3D12's texture descriptor and Vulkan's VkImageView. It is an object describing a single usage of a texture.
We create API-specific texture objects using our resource records. Similarly, we create API-specific descriptor/image view objects using our subresource records. The API-specific functions CreateTextures() and CreateTextureViews() implement these two processes, respectively. The function ModernFrameGraphBuilder::BuildResources() encapsulates the two calls.
After this step, all API-specific resource data is initialized.
The implementations of the ModernFrameGraphBuilder::BuildPassObjects()
function create API-specific render pass objects. In both D3D12 and Vulkan, the implementation iterates over the render pass records. For each record, it creates a render pass object of the corresponding type.
The implementations also initialize the command list buffer for the traversal.
After this step, all API-specific pass data is initialized.
The frame graph stores two sets of resource barriers per pass: the barriers to execute before the pass and the barriers to execute after it. The API-specific functions CreateBeforePassBarriers() and CreateAfterPassBarriers() create the corresponding barriers for a given pass. Each API has its own rules for creating the barriers.
The function ModernFrameGraphBuilder::BuildBarriers()
iterates over all render pass records and calls these functions for each pass.
After this step, the frame graph is ready for traversal.
The classes D3D12::FrameGraphBuilder
and Vulkan::FrameGraphBuilder
implement API-specific frame graph builder needs: storing API-specific subresource data, and creating the objects for render passes, resources, subresources, and barriers.
Both API-specific builders have a Build()
function that takes a FrameGraphDescription
and an additional FrameGraphBuildInfo
parameter. This additional parameter contains references to high-level manager objects, such as the memory manager or the shader loader. The function saves these references and passes the work to ModernFrameGraphBuilder::Build().
Each render pass defines a set of standard parameters. These parameters are further extended for each API with more template magic. This API-specific template magic is defined in D3D12RenderPassDispatchFuncs.hpp and VulkanRenderPassDispatchFuncs.hpp. Each of the two files contains:
— The function MakeUniquePass()
to create a unique_ptr
of the pass of given type;
— The function RegisterPassSubresources()
to initialize the payloads of the pass subresources;
— The function PropagateSubresourceInfos()
to horizontally propagate the subresource metadata within a pass.
To create render pass objects, we often need additional information about the pass subresources. In particular, a render pass might need actual resource objects or subresource formats. This data is managed by the frame graph builder.
To provide the necessary data to each pass object, each API-specific builder defines a set of functions to obtain the data for the given subresource in a given pass:
— GetRegisteredResource()
returns the texture object associated with the subresource;
— GetRegisteredSubresource()
(Vulkan) and GetRegisteredSubresource(SrvUav|Rtv|Dsv)()
(D3D12) return the view/descriptor associated with the subresource;
— Other GetRegisteredSubresource*()
functions return the subresource-specific data (format, state, etc.);
— GetPreviousPassSubresource*()
functions return the subresource-specific data for the previous pass of the subresource;
— GetNextPassSubresource*()
functions return the subresource-specific data for the next pass of the subresource.
Each API-specific builder defines its own set of functions like this. All functions are implemented in a similar way. They find the flat index of the subresource for the given combination of pass and per-pass subresource id, and take the necessary data from the subresource record or the subresource payload.
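A minimal sketch of this lookup pattern is shown below. PayloadLookup, Register() and GetSubresourceFormat() are hypothetical names used for illustration only; the real builders keep the index mapping in their own structures and expose one accessor per payload field.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-subresource payload; the real one is API-specific.
struct Payload
{
    int Format;
};

class PayloadLookup
{
public:
    // Registers a subresource and returns its flat index in the payload list.
    std::uint32_t Register(const std::string& passName, const std::string& subresourceId, Payload payload)
    {
        const std::uint32_t flatIndex = static_cast<std::uint32_t>(mPayloads.size());
        mPayloads.push_back(payload);
        mFlatIndices[MakeKey(passName, subresourceId)] = flatIndex;
        return flatIndex;
    }

    // Same shape as a GetRegisteredSubresource*() accessor:
    // find the flat index for (pass, per-pass subresource id), then read the payload.
    int GetSubresourceFormat(const std::string& passName, const std::string& subresourceId) const
    {
        auto it = mFlatIndices.find(MakeKey(passName, subresourceId));
        assert(it != mFlatIndices.end()); // the subresource must have been registered
        return mPayloads[it->second].Format;
    }

private:
    // Assumption of this sketch: names never contain '\n', so the joined key is unambiguous.
    static std::string MakeKey(const std::string& passName, const std::string& subresourceId)
    {
        return passName + '\n' + subresourceId;
    }

    std::unordered_map<std::string, std::uint32_t> mFlatIndices;
    std::vector<Payload>                           mPayloads;
};
```

Accessors for the previous or the next pass would follow the same steps, with one extra hop from the flat index to the neighboring subresource of the same resource.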
Both API-specific builders store additional data records for each subresource. We call these records subresource payloads. Each API-specific frame graph builder stores the list of subresource payloads:
//D3D12FrameGraphBuilder.hpp, VulkanFrameGraphBuilder.hpp
std::vector<SubresourceMetadataPayload> mSubresourceMetadataPayloads;
The ith element of the subresource payload list corresponds to the ith element of the subresource record list.
Both builders define an InitMetadataPayloads() function that initializes all subresource payloads within the frame graph builder. The initialization happens in three steps:
- Initializing the payloads for all passes except the present pass. We iterate over all render passes and ask each one to initialize the provided subresource payload span. To do it, we use the per-pass RegisterPassSubresources() function.
- Initializing the subresource payload for the backbuffer in the present pass. We obtain the necessary values from the swapchain.
- Initializing the payloads of helper subresources. We copy the necessary values from the primary subresources.
Both builders also define the functions for horizontal and vertical propagation of the subresource data. The vertical propagation is heavily API-specific; the horizontal propagation is implemented in per-pass PropagateSubresourceInfos()
functions.
The class D3D12::FrameGraphBuilder
is defined in D3D12FrameGraphBuilder.hpp. It implements the specific methods for building D3D12::FrameGraph
.
The class stores the reference to the frame graph being initialized:
FrameGraph* mD3d12GraphToBuild;
The function D3D12::FrameGraphBuilder::Build()
takes a D3D12::FrameGraphBuildInfo
parameter. We use this parameter to pass ID3D12Device*
for the D3D12-specific object initialization, ShaderManager*
for shader loading and root signature creation, and MemoryManager*
for allocating the memory for frame graph resources. We store these parameters in the frame graph builder and pass the work to ModernFrameGraphBuilder::Build()
.
Each subresource payload in the D3D12 builder stores the additional data describing the resource usage in the pass. This data is the resource format, the resource state, the index of the descriptor in the corresponding heap, and additional flags:
struct SubresourceMetadataPayload
{
DXGI_FORMAT Format;
D3D12_RESOURCE_STATES State;
UINT32 DescriptorHeapIndex;
UINT32 Flags;
};
The only currently possible flag is TextureFlagBarrierCommonPromoted
, which indicates that the resource does not require a transition barrier, and instead relies on common state promotion.
Each pass can access the current format of any of its subresources, as well as its previous, current, and next states, using the corresponding functions.
The horizontal subresource metadata propagation is specific for each pass. Each D3D12 render pass defines a PropagateSubresourceInfos()
function, which translates the metadata from pass subresources to other pass subresources.
The vertical propagation is implemented in the function D3D12::FrameGraphBuilder::PropagateSubresourcePayloadDataVertically()
. The function iterates over all subresources of a given resource, and translates the metadata from each subresource to the next subresource. The translation follows the set of rules:
- If the next subresource has unspecified format, we translate the format from the current subresource to the next subresource.
- If the current subresource has COMMON state, and the next subresource state allows common state promotion, we mark the next subresource as "promoted from common".
- If the current subresource is marked as "promoted from common", we don't call ExecuteCommandLists() before using the next subresource, and the two subresources have the same state, the resource should keep the common state promotion flag. We mark the next subresource as "promoted from common".
- If the current subresource is marked as "promoted from common", we call ExecuteCommandLists() between the subresource usages, and the current resource state is read-only, the resource state implicitly decays to COMMON on the ExecuteCommandLists() call. After the decay, the state may be promoted again. If the next subresource state allows the promotion, we mark the next subresource as "promoted from common".
We check for an ExecuteCommandLists() call between subresource usages by comparing the pass queue types of the two subresources. If they are different, there is definitely an ExecuteCommandLists() call between the usages; a frame graph without one would be ill-formed.
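The four rules can be sketched for a single (current, next) pair as below. The enums and helpers are hypothetical stand-ins for DXGI_FORMAT, D3D12_RESOURCE_STATES and the pass queue type, and the sketch assumes that the only promotable and read-only state is a shader-read state; the real checks cover more states.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the D3D12 types used by the real builder.
enum class Format { Unknown, Rgba8 };
enum class State  { Common, PixelShaderResource, RenderTarget };
enum class Queue  { Graphics, Compute, Copy };

constexpr std::uint32_t FlagPromotedFromCommon = 0x1; // stands in for TextureFlagBarrierCommonPromoted

struct Payload
{
    Format        format;
    State         state;
    Queue         queue;
    std::uint32_t flags;
};

// Assumption of this sketch: only the shader-read state is promotable and read-only.
inline bool IsPromotable(State state) { return state == State::PixelShaderResource; }
inline bool IsReadOnly(State state)   { return state == State::PixelShaderResource; }

// Applies the rules to one (current, next) pair. Returns true if `next` changed.
inline bool PropagateVertically(const Payload& current, Payload& next)
{
    bool changed = false;

    // Rule 1: fill in an unspecified format
    if(next.format == Format::Unknown && current.format != Format::Unknown)
    {
        next.format = current.format;
        changed     = true;
    }

    const bool currentPromoted = (current.flags & FlagPromotedFromCommon) != 0;
    const bool eclBetween      = current.queue != next.queue; // different queue types imply an ExecuteCommandLists() call

    bool promoteNext = false;
    if(current.state == State::Common && IsPromotable(next.state))                             promoteNext = true; // Rule 2
    if(currentPromoted && !eclBetween && current.state == next.state)                          promoteNext = true; // Rule 3
    if(currentPromoted && eclBetween && IsReadOnly(current.state) && IsPromotable(next.state)) promoteNext = true; // Rule 4

    if(promoteNext && (next.flags & FlagPromotedFromCommon) == 0)
    {
        next.flags |= FlagPromotedFromCommon;
        changed     = true;
    }

    return changed;
}
```

Returning whether anything changed lets the caller plug this function into the repeat-until-stable propagation loop.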
The implementation of CreateTextures()
in the D3D12 builder creates texture objects in three major steps:
- Initializing D3D12_RESOURCE_DESC1 for each resource to create;
- Allocating a single heap for all resources;
- Creating each resource using ID3D12Device8::CreatePlacedResource1().
To initialize the resource descs, we iterate over all resource records in the frame graph. We skip the backbuffer resources, because the swapchain manages their creation. For each resource, we iterate over all its subresources to fill a few gaps:
— The resource flags should include all possible states of the resource;
— The resource format should be TYPELESS
if at least two subresources have different formats;
— For resources used as RTV or DSV, we want to know the format for the pOptimizedClearValue
parameter in CreatePlacedResource()
.
For non-TYPELESS
resources, the format for pOptimizedClearValue
is the same as the format in D3D12_RESOURCE_DESC
. For TYPELESS
DSV resources, we can calculate the depth-stencil format from the typeless one. However, it's not the case for RTV resources. For example, if the resource format is R16_TYPELESS, we don't know if pOptimizedClearValue
should contain R16_UNORM
or R16_FLOAT
. To solve this problem, we keep track of RTV TYPELESS
resources, and for each such resource we store the format to use in pOptimizedClearValue
.
In addition, we mark each DSV resource with D3D12_RESOURCE_FLAG_DENY_SHADER_RESOURCE
, if we never use it as a shader resource.
After creating the list of resource descs, we allocate the memory for all resources, using D3D12::MemoryManager
.
In the final step of creating the resources, we iterate over all ResourceMetadata
records again. We check again if the resource is a backbuffer; if it is, we grab an image from the swapchain. Otherwise, we take D3D12_RESOURCE_DESC
we filled earlier and create an ID3D12Resource*
object. In both cases, we add the resource to the frame graph resource list.
For each resource we create, we set the InitialState
parameter of CreatePlacedResource()
to the state of the last subresource. It effectively puts each resource in "just before the next frame" state.
The implementation of CreateTextureViews()
in D3D12 frame graph builder creates SRV/UAV/RTV/DSV descriptors for frame graph subresources. Keep in mind that there's no 1:1 correspondence between subresources and descriptors. If two subresources of a single resource have the same format and the same state, they share the same descriptor. Some states of subresources, such as COPY_DEST, do not correspond to descriptors at all.
The frame graph stores the descriptor heaps of all three types: RTV, DSV and CBV/SRV/UAV. The last one is non-shader-visible and only used as backup storage. The descriptor heap manager copies the descriptors from this heap to the shader visible heap after the frame graph initialization. This is convenient, because we can destroy and recreate the descriptor heap without worrying about other descriptors used in the engine.
We create the descriptors in the corresponding heap based on the subresource state:
— If the state is NON_PIXEL_SHADER_RESOURCE
or PIXEL_SHADER_RESOURCE
, we create an SRV descriptor in the SRV/UAV/CBV heap;
— If the state is UNORDERED_ACCESS
, we create a UAV descriptor in the SRV/UAV/CBV heap;
— If the state is RENDER_TARGET
, we create an RTV descriptor in the RTV heap;
— If the state is DEPTH_READ
or DEPTH_WRITE
, we create a DSV descriptor in the DSV heap.
Helper subresources do not have descriptors.
First, we need to find how many descriptors we need to create. For each descriptor heap type, we create a list of subresources. We'll use the ith subresource in each list to create the ith descriptor in the corresponding heap.
We iterate over all resources in the frame graph builder, and for each resource we iterate over all its subresources. For each unique (subresource format, descriptor type) pair, we add the subresource to the corresponding per-heap list.
Now that we know the number of descriptors of each type, we create the descriptor heap objects. After that, we iterate over each per-heap list of subresources. For each subresource, we create the descriptor object of the corresponding type in the corresponding heap.
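The collection step can be sketched as follows. The types here are hypothetical stand-ins (the real builder works with DXGI_FORMAT and D3D12_RESOURCE_STATES), and CollectDescriptorSubresources() is an illustrative name. The sketch keeps the first subresource for each unique (format, descriptor type) pair of every resource.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical stand-ins for the D3D12 types used by the real builder.
enum class State          { ShaderResource, UnorderedAccess, RenderTarget, DepthWrite, CopyDest };
enum class DescriptorType { Srv, Uav, Rtv, Dsv, None };

struct Subresource
{
    std::size_t ResourceIndex;
    int         Format;
    State       UsageState;
};

inline DescriptorType DescriptorTypeForState(State state)
{
    switch(state)
    {
    case State::ShaderResource:  return DescriptorType::Srv;
    case State::UnorderedAccess: return DescriptorType::Uav;
    case State::RenderTarget:    return DescriptorType::Rtv;
    case State::DepthWrite:      return DescriptorType::Dsv;
    default:                     return DescriptorType::None; // e.g. copy states need no descriptor
    }
}

// Per-heap lists of indices into the subresource list.
// The ith entry of each list becomes the ith descriptor of that heap.
struct PerHeapLists
{
    std::vector<std::size_t> SrvUavCbv;
    std::vector<std::size_t> Rtv;
    std::vector<std::size_t> Dsv;
};

inline PerHeapLists CollectDescriptorSubresources(const std::vector<Subresource>& subresources)
{
    PerHeapLists lists;
    std::set<std::tuple<std::size_t, int, DescriptorType>> seen;

    for(std::size_t i = 0; i < subresources.size(); i++)
    {
        const Subresource& subresource = subresources[i];

        const DescriptorType type = DescriptorTypeForState(subresource.UsageState);
        if(type == DescriptorType::None) continue;

        // Skip the subresource if an equivalent descriptor was already requested for this resource
        if(!seen.insert(std::make_tuple(subresource.ResourceIndex, subresource.Format, type)).second) continue;

        switch(type)
        {
        case DescriptorType::Srv:
        case DescriptorType::Uav: lists.SrvUavCbv.push_back(i); break;
        case DescriptorType::Rtv: lists.Rtv.push_back(i);       break;
        case DescriptorType::Dsv: lists.Dsv.push_back(i);       break;
        default: break;
        }
    }

    return lists;
}
```

The sizes of the three lists give the descriptor counts needed to create the heaps.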
The functions CreateBeforePassBarriers()
and CreateAfterPassBarriers()
create the resource barriers to execute before the given pass and after the given pass, respectively. In both cases, we have a set of rules indicating which barrier we need, and whether we need a barrier at all. These rules depend on the previous/next resource state, the previous/next pass queue, and common state promotion.
There are several principles these rules are based on:
— If the state of a resource doesn't change between two passes, no barrier is needed;
— If the state of a resource changes between two passes, and the rules for common state promotion/decay are followed, no barrier is needed;
— If the state of a resource changes between two passes, and the subresource does not fall into the category of common state promotion/decay, we need a resource transition barrier;
— If the state of a resource changes between two passes, and two passes have different queue types, the barrier should belong to the pass that recognizes both states.
In case of ambiguity, where both after-pass barrier in the previous pass and before-pass barrier in the current pass are valid choices, we choose the first option.
For before-pass barriers, we compare each subresource in the pass with the subresource in the previous pass. The rules for creating a barrier are:
- If both passes have the same queue type, and both subresources have the same resource state, we don't need a barrier.
- If both passes have the same queue type, and the previous state was PRESENT, and the current subresource is marked as "promoted from common", the state is promoted from PRESENT. We don't need a barrier.
- If both passes have the same queue type, and the current state is PRESENT, we have already created the barrier for this pair of subresources when processing the previous pass.
- If both passes have the same queue type in other cases, we create a new transition barrier.
- If the command queue switches from Graphics to Compute between passes, and the current subresource is marked as "promoted from common", we don't need a barrier.
- If the queue switches from Graphics to Compute, the previous subresource is read-only and marked as "promoted from common", and the current subresource isn't marked as "promoted from common", the subresource state decays to COMMON on the ExecuteCommandLists() call, but is not promoted again on the new queue. We create a new transition barrier.
- If the queue switches from Graphics to Compute, and both subresources have the same resource state, we don't need a barrier.
- If the queue switches from Graphics to Compute in other cases, we have already created the barrier for this pair of subresources when processing the previous pass.
- If the queue switches from Compute to Graphics, and the current subresource is marked as "promoted from common", we don't need a barrier.
- If the queue switches from Compute to Graphics, the previous subresource is read-only and marked as "promoted from common", and the current subresource isn't marked as "promoted from common", the subresource state decays to COMMON on the ExecuteCommandLists() call, but is not promoted again on the new queue. We create a new transition barrier.
- If the queue switches from Compute to Graphics, and both subresource states are recognized by the compute queue, we have already created the barrier for this pair of subresources when processing the previous pass.
- If the queue switches from Compute to Graphics, and the new state is not recognized by the compute queue, we create a new transition barrier.
- If the queue switches from Compute to Graphics, and both subresources have the same resource state, we don't need a barrier.
- If the queue switches from Compute or Graphics to Copy, and the previous subresource is read-only and marked as "promoted from common", the state decays to COMMON on the ExecuteCommandLists() call. Because COMMON is the only valid state on the Copy queue, we don't need a barrier.
- If the queue switches from Compute or Graphics to Copy in other cases, we have already created the barrier for this pair of subresources when processing the previous pass.
- If the queue switches from Copy to Compute or Graphics, and the current subresource is marked as "promoted from common", we don't need a barrier.
- If the queue switches from Copy to Compute or Graphics, and the current subresource is not marked as "promoted from common", we create a new transition barrier.
For after-pass barriers, we compare each subresource in the pass with the subresource in the next pass. The rules for creating a barrier are:
- If both passes have the same queue type, and both subresources have the same resource state, we don't need a barrier.
- If both passes have the same queue type, the current subresource is read-only and marked as "promoted from common", and the next state is PRESENT, the state decays to PRESENT. We don't need a barrier.
- If both passes have the same queue type, and the next state is PRESENT in other cases, we create a new transition barrier to PRESENT.
- If both passes have the same queue type in other cases, we will create the barrier for this pair of subresources when processing the next pass.
- If the command queue switches from Graphics to Compute between passes, and the next subresource is marked as "promoted from common", we don't need a barrier.
- If the queue switches from Graphics to Compute, the current subresource is read-only and marked as "promoted from common", and the next subresource isn't marked as "promoted from common", the subresource state decays to COMMON on the ExecuteCommandLists() call, but is not promoted again on the new queue. We will create the barrier for this pair of subresources when processing the next pass.
- If the queue switches from Graphics to Compute, and both subresources have the same resource state, we don't need a barrier.
- If the queue switches from Graphics to Compute in other cases, we create a new transition barrier.
- If the queue switches from Compute to Graphics, and the next subresource is marked as "promoted from common", we don't need a barrier.
- If the queue switches from Compute to Graphics, the current subresource is read-only and marked as "promoted from common", and the next subresource isn't marked as "promoted from common", the subresource state decays to COMMON on the ExecuteCommandLists() call, but is not promoted again on the new queue. We will create the barrier for this pair of subresources when processing the next pass.
- If the queue switches from Compute to Graphics, the current subresource is read-only and marked as "promoted from common", and the next subresource state is PRESENT, the state decays to PRESENT. We don't need a barrier.
- If the queue switches from Compute to Graphics, and both subresource states are recognized by the compute queue, we create a new transition barrier.
- If the queue switches from Compute to Graphics, and the new state is not recognized by the compute queue, we need a barrier, but we can't execute it on this queue. We will create the barrier for this pair of subresources when processing the next pass.
- If the queue switches from Compute to Graphics, and both subresources have the same resource state, we don't need a barrier.
- If the queue switches from Compute or Graphics to Copy, and the current subresource is read-only and marked as "promoted from common", the state decays to COMMON on the ExecuteCommandLists() call. Because COMMON is the only valid state on the Copy queue, we don't need a barrier.
- If the queue switches from Compute or Graphics to Copy in other cases, we create a new transition barrier.
- If the queue switches from Copy to Compute or Graphics, and the next subresource is marked as "promoted from common", we don't need a barrier.
- If the queue switches from Copy to Compute or Graphics, and the next subresource is not marked as "promoted from common", we need a barrier, but we can't execute it on this queue. We will create the barrier for this pair of subresources when processing the next pass.
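To show how the two rule lists complement each other, here is a sketch limited to the same-queue rules (the first four rules of each list). State, Usage and both function names are hypothetical, and the sketch assumes a single read-only state. The point is that for every state change exactly one of the two functions asks for a barrier, or neither does when promotion or decay covers the transition.

```cpp
#include <cassert>

// Hypothetical stand-in for D3D12_RESOURCE_STATES.
enum class State { Present, RenderTarget, PixelShaderResource };

struct Usage
{
    State state;
    bool  promotedFromCommon;
};

// Assumption of this sketch: the shader-read state is the only read-only state.
inline bool IsReadOnly(State state) { return state == State::PixelShaderResource; }

// Same-queue rules for before-pass barriers.
inline bool NeedsBeforePassBarrier(const Usage& previous, const Usage& current)
{
    if(previous.state == current.state)                                     return false; // no state change
    if(previous.state == State::Present && current.promotedFromCommon)      return false; // promoted from PRESENT
    if(current.state == State::Present)                                     return false; // created after the previous pass
    return true;
}

// Same-queue rules for after-pass barriers.
inline bool NeedsAfterPassBarrier(const Usage& current, const Usage& next)
{
    if(current.state == next.state) return false; // no state change

    if(next.state == State::Present)
    {
        // A read-only promoted state decays to PRESENT, otherwise we transition explicitly
        return !(IsReadOnly(current.state) && current.promotedFromCommon);
    }

    return false; // will be created before the next pass
}
```

For example, a render target that becomes a shader resource gets its barrier before the reading pass, while a render target that gets presented receives its barrier after the writing pass.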
The class Vulkan::FrameGraphBuilder
is defined in VulkanFrameGraphBuilder.hpp. It implements the methods to initialize Vulkan::FrameGraph
.
The builder stores a reference to the frame graph being initialized:
FrameGraph* mVulkanGraphToBuild;
The Build() function in the Vulkan frame graph builder takes an additional Vulkan::FrameGraphBuildInfo parameter. We use this parameter to pass InstanceParameters* and DeviceParameters* for extension- and GPU-specific limits, MemoryManager* to initialize the resources, DeviceQueues* to obtain the queue family indices, and WorkerCommandBuffers* to set up the initial resource states. We save these references and pass the work to ModernFrameGraphBuilder::Build().
Unlike the D3D12 builder, the Vulkan builder has an additional step of validating the descriptors, implemented in the ValidateDescriptors() function.
The frame graph in Vulkan stores all per-pass descriptors in a single large list. Each render pass refers to a range in this list. For several reasons, we can only create this list of descriptors after initializing all passes. The step of validating the descriptors iterates over all passes and initializes each per-pass range.
The whole process of Vulkan descriptor management will be described in another article.
Subresource payloads in Vulkan::FrameGraphBuilder
store several Vulkan-specific values for each particular subresource, related to its usage in its pass. These values are image format, aspect flags, layout, usage flags, pipeline stage flags, access flags, frame graph-specific image view index, and miscellaneous flags:
//Additional Vulkan-specific data for each subresource metadata node
struct SubresourceMetadataPayload
{
VkFormat Format;
VkImageAspectFlags Aspect;
VkImageLayout Layout;
VkImageUsageFlags Usage;
VkPipelineStageFlags Stage;
VkAccessFlags Access;
uint32_t ImageViewIndex;
uint32_t Flags;
};
Two possible miscellaneous flags are TextureFlagAutoBeforeBarrier
and TextureFlagAutoAfterBarrier
, indicating that we don't create a before-pass barrier or an after-pass barrier, respectively, for the subresource. Instead, the resource state transition is managed by VkRenderPass
.
The builder provides the methods to access these values for each particular subresource, as well as the values in the previous and the next passes, using the corresponding functions. We use these values to initialize VkSubpassDependency
s when creating render pass objects.
As with the D3D12 builder, the horizontal subresource metadata propagation is pass-specific and implemented in per-pass PropagateSubresourceInfos()
functions. These functions translate the metadata from subresources to other subresources within a pass.
The vertical propagation is implemented in Vulkan::FrameGraphBuilder::PropagateSubresourcePayloadDataVertically()
. For the given resource, it iterates over its subresources, and for each subresource it translates the metadata to the next one, based on the set of rules:
- If the next subresource has its image Aspect flags uninitialized, we translate the flags from the current one;
- If the next subresource has its Format uninitialized, we translate the format from the current one;
- If two subresources have the same queue family index and the same resource layout, but different Access flags, we translate the access flags from the current subresource to the next one and the other way around.
The last rule ensures that the access mask for each subresource covers all possible states between two barriers.
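The three rules can be sketched for one (current, next) pair as below. Plain integers stand in for VkFormat, VkImageAspectFlags, VkImageLayout and VkAccessFlags, 0 is treated as "uninitialized", and the two-way translation of access flags is assumed to mean that both subresources end up with the union of their masks. The Payload struct here is a reduced, hypothetical version of the real one.

```cpp
#include <cassert>
#include <cstdint>

// Reduced hypothetical payload; plain integers stand in for the Vulkan types.
struct Payload
{
    std::uint32_t Format;      // 0 is treated as "uninitialized"
    std::uint32_t Aspect;      // 0 is treated as "uninitialized"
    std::uint32_t Layout;
    std::uint32_t Access;
    std::uint32_t QueueFamily;
};

// Applies the rules to one (current, next) pair. Returns true if anything changed.
inline bool PropagateVertically(Payload& current, Payload& next)
{
    bool changed = false;

    if(next.Aspect == 0 && current.Aspect != 0)
    {
        next.Aspect = current.Aspect;
        changed     = true;
    }

    if(next.Format == 0 && current.Format != 0)
    {
        next.Format = current.Format;
        changed     = true;
    }

    // No barrier separates these two usages, so both access masks must cover both usages.
    // Assumption: "translate both ways" means taking the union of the masks.
    if(current.QueueFamily == next.QueueFamily && current.Layout == next.Layout && current.Access != next.Access)
    {
        const std::uint32_t mergedAccess = current.Access | next.Access;
        current.Access = mergedAccess;
        next.Access    = mergedAccess;
        changed        = true;
    }

    return changed;
}
```

A second call on the same pair reports no changes, which is what lets the outer propagation loop stop.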
The implementation of CreateTextures()
in the Vulkan builder creates all VkImage
objects in the frame graph. It happens in four steps:
- Creating a VkImage object for each non-backbuffer image;
- Initializing a VkImageMemoryBarrier for each image to set the initial state;
- Allocating the memory for all non-backbuffer images;
- Executing the resource state initialization barriers.
To create VkImage
objects, we need to fill VkImageCreateInfo
entries for each image. There are two things to consider. First, the usage flags should cover the usages of all subresources. Second, if at least two subresources have different formats, we need to set the VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT
flag. We iterate over all subresources of each resource to initialize these values.
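A sketch of this merge is shown below. SubresourceUsage, ImageCreateParams and MergeSubresources() are hypothetical names, and plain integers stand in for VkFormat and VkImageUsageFlags. In the real code the resulting values would go into VkImageCreateInfo, with MutableFormat translated into VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-subresource data; integers stand in for VkFormat and VkImageUsageFlags.
struct SubresourceUsage
{
    std::uint32_t Format;
    std::uint32_t Usage;
};

// Hypothetical subset of the values that end up in VkImageCreateInfo.
struct ImageCreateParams
{
    std::uint32_t Format;
    std::uint32_t Usage;
    bool          MutableFormat; // true if the mutable format flag is needed
};

inline ImageCreateParams MergeSubresources(const std::vector<SubresourceUsage>& subresources)
{
    assert(!subresources.empty()); // each resource has at least one subresource

    ImageCreateParams params{subresources.front().Format, 0, false};
    for(const SubresourceUsage& subresource: subresources)
    {
        // The image usage must cover the usage of every subresource
        params.Usage |= subresource.Usage;

        // Two different formats on the same image require a mutable format
        if(subresource.Format != params.Format)
        {
            params.MutableFormat = true;
        }
    }

    return params;
}
```

Comparing every format against the first one is enough: if any two formats differ, at least one of them differs from the first.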
After initializing VkImageCreateInfo
, we create a VkImage
object and add it to the frame graph resource list. For each non-backbuffer image, we also fill a VkBindImageMemoryInfo
structure, and initialize an entry in FrameGraph::mOwnedImageSpans
. This entry in owned image spans marks the image as "managed by frame graph" instead of "managed by swapchain".
We add the backbuffer images directly to the frame graph resource list, without any extra steps.
After that, we initialize the initial state barrier for each image, including backbuffer ones. The barrier sets the resource layout, access mask, and queue family index. We take these values from the last subresource of each resource, putting each image in "right before the next frame" state.
Next, we initialize the image memory using Vulkan::MemoryManager
. Finally, we record the image barrier calls, and wait for the graphics queue to execute them.
The implementation of CreateTextureViews()
in Vulkan::FrameGraphBuilder
creates all VkImageView
objects corresponding to frame graph subresources. We only create an image view if it's necessary: some subresource usages don't need an image view object, and two subresources of the same resource can share a VkImageView
if they have the same format, the same aspect mask, and the same subresource slice.
We never create image views for helper subresources.
There is only a handful of cases when we need an image view in Vulkan functions. We check for these cases by checking the subresource usage flags. These cases are:
— Accessing the image in shaders (USAGE_SAMPLED_BIT
, USAGE_STORAGE_BIT
, USAGE_INPUT_ATTACHMENT_BIT
, TRANSIENT_ATTACHMENT_BIT
);
— Rendering to the image (COLOR_ATTACHMENT_BIT
, DEPTH_STENCIL_ATTACHMENT_BIT
, TRANSIENT_ATTACHMENT_BIT
);
— Using the image as a fragment density map (FRAGMENT_DENSITY_MAP_BIT_EXT
);
— Using the image as a shading rate mask (SHADING_RATE_ATTACHMENT_BIT_KHR
).
For each resource in the frame graph builder, we iterate over its subresources and check if any of the bits above is set in the usage flags. If it's the case, we check if we already have an image view for this (resource, format, aspect mask) triplet. If we do, we assign the index of the image view to the image view index of the subresource payload. If we don't, we create a new one, using the CreateImageView()
helper function, and assign the index of the new one.
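The reuse logic can be sketched as follows. Subresource, NoImageView and AssignImageViewIndices() are hypothetical names, integers stand in for the Vulkan types, and the place where the real builder would call CreateImageView() is only marked with a comment.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

constexpr std::uint32_t NoImageView = 0xFFFFFFFFu; // hypothetical "no view needed" marker

// Hypothetical subresource data; integers stand in for the Vulkan types.
struct Subresource
{
    std::uint32_t ResourceIndex;
    std::uint32_t Format;
    std::uint32_t Aspect;
    std::uint32_t Usage;
    std::uint32_t ImageViewIndex;
};

// Assigns ImageViewIndex to every subresource and returns the number of views needed.
// viewUsageMask is the set of usage bits that require an image view.
inline std::uint32_t AssignImageViewIndices(std::vector<Subresource>& subresources, std::uint32_t viewUsageMask)
{
    std::map<std::tuple<std::uint32_t, std::uint32_t, std::uint32_t>, std::uint32_t> viewIndices;
    std::uint32_t viewCount = 0;

    for(Subresource& subresource: subresources)
    {
        if((subresource.Usage & viewUsageMask) == 0)
        {
            // This usage (e.g. a transfer) does not need an image view
            subresource.ImageViewIndex = NoImageView;
            continue;
        }

        const auto key = std::make_tuple(subresource.ResourceIndex, subresource.Format, subresource.Aspect);

        auto it = viewIndices.find(key);
        if(it == viewIndices.end())
        {
            // The real builder would create the VkImageView here
            it = viewIndices.emplace(key, viewCount++).first;
        }

        subresource.ImageViewIndex = it->second;
    }

    return viewCount;
}
```

Keying the map by the whole triplet means that two subresources share a view only when they belong to the same resource and agree on both format and aspect mask.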
The functions CreateBeforePassBarriers()
and CreateAfterPassBarriers()
create the before-pass and after-pass image memory barriers for the given frame graph pass. Both functions create the barriers based on the set of rules. These rules depend on the image layout change and queue family change.
The general principles for these rules are:
— If neither queue family nor image layout of the subresource changes between two passes, no barrier is needed;
— If queue family stays the same, but image layout changes, we need a layout change barrier;
— If queue family changes, we need a pair of acquire + release barriers.
In case of ambiguity, where both after-pass barrier in the previous pass and before-pass barrier in the current pass are valid choices, we choose the second option.
The aspect mask for each barrier covers the aspect masks of both subresources.
For before-pass barriers, we compare each subresource in the pass with the subresource in the previous pass. We don't need a barrier if the current subresource is marked with TextureFlagAutoBeforeBarrier
, or the previous subresource is marked with TextureFlagAutoAfterBarrier
. Otherwise, the rules for creating a barrier are:
- If both subresources have the same image layout and the same queue family index, we don't need a barrier.
- If both subresources have the same queue family index, and the current subresource has PRESENT layout, we have already created the barrier for this pair of subresources when processing the previous pass.
- If both subresources have the same queue family index, but different image layouts in other cases, we create a new layout change barrier.
- If two subresources have different queue family indices and the same image layout, we create a new image acquire barrier.
- If two subresources have different queue family indices and different image layouts, we create a new image acquire barrier with the layout change.
- If two subresources have different queue family indices and the previous layout was PRESENT, we're transitioning the resource from the dedicated present queue. We create a new image acquire barrier.
- If two subresources have different queue family indices and the current layout is PRESENT, we're transitioning the resource to the dedicated present queue. We create a new image acquire barrier.
For after-pass barriers, we compare each subresource in the pass with the subresource in the next pass. We don't need a barrier if the current subresource is marked with TextureFlagAutoAfterBarrier
, or the next subresource is marked with TextureFlagAutoBeforeBarrier
. Otherwise, the rules for creating a barrier are:
- If both subresources have the same image layout and the same queue family index, we don't need a barrier.
- If both subresources have the same queue family index, and the next subresource has PRESENT layout, we create a new layout change barrier.
- If both subresources have the same queue family index, but different image layouts in other cases, we will create the barrier for this pair of subresources when processing the next pass.
- If two subresources have different queue family indices and the same image layout, we create a new image release barrier.
- If two subresources have different queue family indices and different image layouts, we create a new image release barrier with the layout change.
- If two subresources have different queue family indices and the current layout is PRESENT, we're transitioning the resource from the dedicated present queue. We create a new image release barrier.
- If two subresources have different queue family indices and the next layout is PRESENT, we're transitioning the resource to the dedicated present queue. We create a new image release barrier.
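Both rule lists can be condensed into a pair of small decision functions. The sketch below uses hypothetical types (Layout, BarrierKind, Usage) instead of the Vulkan ones, and it folds the rules with different queue family indices into a single acquire or release result, since all of them produce the same kind of barrier.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for VkImageLayout and for the kind of barrier to record.
enum class Layout      { Present, ColorAttachment, ShaderReadOnly };
enum class BarrierKind { None, LayoutChange, Acquire, Release };

struct Usage
{
    Layout        layout;
    std::uint32_t queueFamily;
    bool          autoBeforeBarrier; // stands in for TextureFlagAutoBeforeBarrier
    bool          autoAfterBarrier;  // stands in for TextureFlagAutoAfterBarrier
};

inline BarrierKind BeforePassBarrier(const Usage& previous, const Usage& current)
{
    if(current.autoBeforeBarrier || previous.autoAfterBarrier) return BarrierKind::None;    // VkRenderPass handles the transition
    if(previous.queueFamily != current.queueFamily)            return BarrierKind::Acquire; // with or without a layout change
    if(previous.layout == current.layout)                      return BarrierKind::None;
    if(current.layout == Layout::Present)                      return BarrierKind::None;    // created after the previous pass
    return BarrierKind::LayoutChange;
}

inline BarrierKind AfterPassBarrier(const Usage& current, const Usage& next)
{
    if(current.autoAfterBarrier || next.autoBeforeBarrier) return BarrierKind::None;         // VkRenderPass handles the transition
    if(current.queueFamily != next.queueFamily)            return BarrierKind::Release;      // with or without a layout change
    if(current.layout == next.layout)                      return BarrierKind::None;
    if(next.layout == Layout::Present)                     return BarrierKind::LayoutChange;
    return BarrierKind::None;                                                                // will be created before the next pass
}
```

On the same queue family, each layout change produces exactly one barrier; across queue families, the release after one pass is matched by the acquire before the next.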
After covering the internal details, we'll talk about how to use them.
We recreate the frame graph each time the window resolution changes. The window resize callback calls the Engine::CreateFrameGraph()
function. In this function we initialize the frame graph config with the new resolution, fill the frame graph description, and call Renderer::InitFrameGraph()
.
Both D3D12 and Vulkan renderers in InitFrameGraph()
recreate their internal FrameGraph
object, initialize the frame graph builder, and call Build().
After building the frame graph, the renderers reinitialize their descriptor heaps/descriptor pools. This process will be covered in separate articles.