Introducing Timeline Semaphores API to Kompute #238
Note that the page I linked mentions an implementation of the timeline semaphore API as a Vulkan 1.1 layer, as part of the Vulkan-ExtensionLayer project, so it should be possible to make it work on Vulkan 1.1.
Thanks @ChenKuo, also following up on the question you asked in #52.
Ok, that sounds good. In that case, does OpMemoryBarrier solve your current use case? If so, do you have a relevant example where you would need to use the semaphores? We would need a concrete example, as implementing the semaphore functionality for interdependencies would need to consider DAG-like dependencies between operations, which may require more thought to ensure that it indeed works as expected, as opposed to being implemented as a workaround that just exposes the functionality.
@axsaucedo Let's say I have a simple rendering pipeline implementation, so for N geometries we need to do rasterization -> color blending N times. If I can synchronize 2 queues, I can let queue1 focus on the rasterization algorithm (which only needs to update the vertex-position index each pass) and queue2 focus on the blending algorithm (which needs to use the results of queue1 as they become available), with both working in parallel. If one queue falls behind, I could even use a third queue to balance the workload, which would require even more advanced synchronization. Timeline semaphores make this very easy because I just need to match the values of rasterization_timeline to blending_timeline.
I am not sure if this code is syntactically correct, but this is the general idea for my scenario above. rasterization_timeline, blending_timeline and the timeline arguments to eval_async are the proposed interface, not an existing Kompute API.
# In the rasterization thread
for rasterization_pass_number in range(N):
    (sq1
        # In the shader this index points to the vertex positions
        .record(kp.OpAlgoDispatch(rasterization_algo, [rasterization_pass_number]))
        .eval_async(
            # No wait on rasterization_timeline because rasterization is independent
            rasterization_timeline(signal=rasterization_pass_number + 1),
            # Do not go more than 10 passes ahead of blending_algo; memory for results is limited.
            # Timeline values are unsigned, so clamp the wait value at 0 for the first passes.
            blending_timeline(wait=max(0, rasterization_pass_number - 10))))
# In the blending thread
for blending_pass_number in range(N):
    (sq2
        # In the shader this index is used to find the rasterization result
        .record(kp.OpAlgoDispatch(blending_algo, [blending_pass_number]))
        .eval_async(
            # rasterization_algo needs to be at least 1 pass ahead of blending_algo
            rasterization_timeline(wait=blending_pass_number + 1),
            blending_timeline(wait=blending_pass_number, signal=blending_pass_number + 1)))
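To check that the wait/signal values above really give the intended ordering, here is a minimal host-side sketch of the same two-stage pipeline. It assumes a hypothetical TimelineSemaphore class (not part of Kompute or the Vulkan API) built on threading.Condition, with Python threads standing in for the two queues. The counter rules mimic Vulkan timeline semaphores: a wait on value v returns once the counter is >= v, and signalled values must strictly increase.

```python
import threading


class TimelineSemaphore:
    """Hypothetical host-side stand-in for a Vulkan timeline semaphore.

    Holds a monotonically increasing counter: wait(v) blocks until the
    counter is >= v, and signal(v) raises the counter to v.
    """

    def __init__(self, initial=0):
        self._value = initial
        self._cond = threading.Condition()

    def signal(self, value):
        with self._cond:
            # Like Vulkan, reject signals that do not increase the counter
            if value <= self._value:
                raise ValueError("timeline values must strictly increase")
            self._value = value
            self._cond.notify_all()

    def wait(self, value):
        with self._cond:
            # Returns immediately if the counter has already reached `value`
            self._cond.wait_for(lambda: self._value >= value)


def run_pipeline(n_passes, max_ahead=10):
    """Run the rasterization -> blending scenario on two threads (stand-ins for queues)."""
    raster_tl = TimelineSemaphore()
    blend_tl = TimelineSemaphore()
    log = []
    log_lock = threading.Lock()

    def record(stage, i):
        with log_lock:
            log.append((stage, i))

    def rasterization():
        for i in range(n_passes):
            # Same bound as the proposal: wait until blending has finished i - max_ahead passes
            blend_tl.wait(max(0, i - max_ahead))
            record("raster", i)       # placeholder for dispatching rasterization_algo
            raster_tl.signal(i + 1)   # rasterization pass i finished

    def blending():
        for i in range(n_passes):
            # Blending pass i needs rasterization pass i to be finished
            raster_tl.wait(i + 1)
            record("blend", i)        # placeholder for dispatching blending_algo
            blend_tl.signal(i + 1)    # blending pass i finished

    threads = [threading.Thread(target=rasterization), threading.Thread(target=blending)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Calling run_pipeline(25) returns 50 (stage, pass) events. Their interleaving varies between runs, but every ("blend", i) comes after ("raster", i), and ("raster", i) never starts before ("blend", i - 11) has finished, which is the bounded look-ahead the comments above describe. On a GPU the waits and signals would be attached to queue submissions rather than host threads.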
As suggested by @ChenKuo in #52, this would encompass adding support for the Vulkan Timeline Semaphores introduced in 1.2 (https://www.khronos.org/blog/vulkan-timeline-semaphores). This would mean that we would either have to drop support for pre-1.2 Vulkan, or put this feature behind a feature flag / compile-time macro and test it on both Vulkan 1.1.x and 1.2.x.
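A feature gate like this would typically look at the device's packed API version and its extension list. The sketch below re-implements the VK_MAKE_API_VERSION / VK_API_VERSION_MAJOR / VK_API_VERSION_MINOR bit layout from the Vulkan headers in Python; timeline_semaphore_mode is a hypothetical helper, not an existing Kompute function. Timeline semaphores are core in Vulkan 1.2 and available on 1.1 through the VK_KHR_timeline_semaphore extension (which the Vulkan-ExtensionLayer can provide).

```python
def vk_make_api_version(variant, major, minor, patch):
    # Same bit layout as the VK_MAKE_API_VERSION macro in vulkan_core.h
    return (variant << 29) | (major << 22) | (minor << 12) | patch


def vk_api_version_major(version):
    return (version >> 22) & 0x7F


def vk_api_version_minor(version):
    return (version >> 12) & 0x3FF


VK_API_VERSION_1_2 = vk_make_api_version(0, 1, 2, 0)


def timeline_semaphore_mode(device_api_version, device_extensions):
    """Hypothetical helper: decide how a build could expose timeline semaphores."""
    if device_api_version >= VK_API_VERSION_1_2:
        # Core in 1.2; the timelineSemaphore feature must still be enabled at device creation
        return "core"
    if "VK_KHR_timeline_semaphore" in device_extensions:
        # A 1.1 device exposing the KHR extension (natively or via a layer)
        return "extension"
    # Fall back to the existing eval()/eval_async() paths
    return "unsupported"
```

Keeping the check at run time (in addition to any compile-time macro) would let one binary be tested against both 1.1.x and 1.2.x drivers.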
What we'll first need to explore is an interface that provides a higher-level abstraction than the one provided below, as well as the corner-case behaviours that could arise from objects being removed, failing, etc.
The interface provided below is the currently proposed structure for the abstraction of the timeline API. However, this interface may not be possible, given that a sequence can only hold a single set of operations, and whenever a new one is recorded, it either appends to or clears the previous ones.
The original proposal is below:
Right now the only synchronization options (that I can see) are running eval() synchronously or using eval_await() asynchronously. Both cause the thread to stop, which translates to lost time that could be spent sending the next batch to the queue. Vulkan 1.2 has the Timeline Semaphores API, which seems to be a good solution if we can integrate it into the Kompute API.
For example, suppose I have algorithm A using tensor a, algorithm B using tensor b, and algorithm C using tensors a, b, and c. A and B are independent, but C depends on the results of A and B. We only need the result from C, not the intermediate results from A and B. This is how I wish the code would look in Python:
(I am not sure if my understanding of Timeline Semaphore is correct. It is kind of confusing.)
There is a (partial) workaround: create multiple threads and Sequence objects, so that one thread/Sequence pair can move data around while the other is waiting. However, I think this still does not solve the dependency issue. I am not an expert in Vulkan or C++, so what I wrote may be wrong, and maybe there is a better way I do not know of. If you know one, please let me know. Thanks.
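The A/B -> C dependency described above can be illustrated on the host with concurrent.futures. This is a swapped-in CPU analogy, not Vulkan timeline semaphores or a Kompute API: run_dag and the algo_* callables are hypothetical stand-ins for GPU work. A and B are submitted independently, and C is submitted as a task that waits on both, so the submitting thread does not block until it actually asks for C's result.

```python
from concurrent.futures import ThreadPoolExecutor


def run_dag(algo_a, algo_b, algo_c, a, b, c):
    """Hypothetical sketch: run A and B independently, then C once both are done."""
    pool = ThreadPoolExecutor(max_workers=3)
    fut_a = pool.submit(algo_a, a)   # independent of B
    fut_b = pool.submit(algo_b, b)   # independent of A
    # C blocks inside its own worker until A and B finish, not on the caller's thread
    fut_c = pool.submit(lambda: algo_c(fut_a.result(), fut_b.result(), c))
    # Already-submitted tasks keep running; the pool is released once they finish
    pool.shutdown(wait=False)
    return fut_c


fut = run_dag(
    lambda x: [v * 2 for v in x],                       # stand-in for algorithm A
    lambda y: [v + 1 for v in y],                       # stand-in for algorithm B
    lambda ra, rb, cc: [p + q + r for p, q, r in zip(ra, rb, cc)],  # stand-in for C
    [1, 2, 3], [10, 20, 30], [100, 200, 300],
)
# The caller is free to queue other work here; only fut.result() blocks.
print(fut.result())  # [113, 225, 337]
```

Only C's result is ever requested by the caller, matching the requirement that the intermediate results of A and B are not needed.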