[ET-VK][LlaMa] Split SDPA + KV cache operator into SDPA operator and KV cache update operator #8064
Stack from ghstack (oldest at bottom):
Context
#7413 and #7412 split the `sdpa_with_kv_cache` operator into two separate operators, `update_cache` and `custom_sdpa`, to decouple the cache update step from the actual SDPA computation.

As a result, SDPA is no longer being delegated on Vulkan because of this interface change. To rectify this, Vulkan must also split `sdpa_with_kv_cache` into two operators.

Note that during this diff the new operators are not partitioned yet because of complications caused by assertion ops in the graph. The next diff adds a pass to remove such assertion ops, which allows the new operators to be partitioned.
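
For reviewers unfamiliar with the split, the decoupling described above can be illustrated with a rough pure-Python sketch. This is not the ExecuTorch or Vulkan implementation: the function names `update_cache_ref`, `sdpa_ref`, and `fused_ref` are hypothetical, the signatures are simplified assumptions (single head, nested lists instead of tensors), and causal masking is assumed.

```python
import math


def update_cache_ref(cache, new_rows, start_pos):
    """Hypothetical stand-in for the cache update step.

    Writes new_rows into cache in place, starting at row start_pos.
    """
    for i, row in enumerate(new_rows):
        cache[start_pos + i] = list(row)
    return cache


def sdpa_ref(q, k_cache, v_cache, start_pos):
    """Hypothetical stand-in for the SDPA step (one head, causal).

    Assumes the caches already contain the keys/values for the
    current tokens, i.e. the cache update has already run.
    """
    head_dim = len(q[0])
    scale = 1.0 / math.sqrt(head_dim)
    out = []
    for i, q_row in enumerate(q):
        # Causal: token i attends to cache rows [0, start_pos + i].
        ctx_len = start_pos + i + 1
        scores = [
            scale * sum(a * b for a, b in zip(q_row, k_cache[j]))
            for j in range(ctx_len)
        ]
        # Numerically stable softmax over the attended positions.
        max_score = max(scores)
        weights = [math.exp(s - max_score) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v_cache[j][d] for j, w in enumerate(weights)) / total
            for d in range(head_dim)
        ])
    return out


def fused_ref(q, k, v, k_cache, v_cache, start_pos):
    """Fused behaviour expressed as the two separate steps."""
    update_cache_ref(k_cache, k, start_pos)
    update_cache_ref(v_cache, v, start_pos)
    return sdpa_ref(q, k_cache, v_cache, start_pos)
```

Under these assumptions, calling the two cache updates followed by `sdpa_ref` gives the same result as `fused_ref`, which is the property that lets the fused operator be replaced by two operators.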
Differential Revision: D68916952