Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[vulkan] Reduce the number of descriptor sets used by compiling one entry-point per SPIR-V module #8452

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

derek-gerstmann
Copy link
Contributor

@derek-gerstmann derek-gerstmann commented Oct 31, 2024

In order to avoid using up all descriptor sets for complicated pipelines, this PR changes the Vulkan CodeGen such that we encode each Kernel entry-point into it's own SPIR-V module and bind them as separate shaders to avoid running out of resources on constrained devices.

Fixes #8297 , #8296 , #8295 , and #8294. Re-enable performance wrap test #7559 .

@derek-gerstmann
Copy link
Contributor Author

@steven-johnson Okay, testing this PR on the configs I have locally has everything passing. Can we re-enable testing on the Buildbots to see if there's any device or driver differences I need to catch?

@steven-johnson
Copy link
Contributor

@steven-johnson Okay, testing this PR on the configs I have locally has everything passing. Can we re-enable testing on the Buildbots to see if there's any device or driver differences I need to catch?

halide/build_bot#294

@steven-johnson
Copy link
Contributor

@steven-johnson Okay, testing this PR on the configs I have locally has everything passing. Can we re-enable testing on the Buildbots to see if there's any device or driver differences I need to catch?

halide/build_bot#294

Done, please start testing

@derek-gerstmann
Copy link
Contributor Author

I’ll investigate the three failures:

The following tests FAILED:
correctness_bounds 
correctness_debug_to_file
correctness_device_buffer_copies_with_profile

@derek-gerstmann
Copy link
Contributor Author

Fixes added for #8466

@derek-gerstmann derek-gerstmann added bug gpu documentation Missing, incorrect, or unclear. Spelling & grammar mistakes. release_notes For changes that may warrant a note in README for official releases. labels Nov 6, 2024
@derek-gerstmann
Copy link
Contributor Author

derek-gerstmann commented Nov 7, 2024

@steven-johnson I believe the GPU lifetime and device memory leak errors are actually caused by the Validation Layer shared lib itself when invoked with ctest --parallel. It doesn't seem very robust at all. It's useful for diagnosing issues when they come up, but it doesn't seem production worthy. I'd like to suggest we remove the VK_INSTANCE_LAYERS env var from the buildbot config, and keep the Vulkan tests enabled.

@steven-johnson
Copy link
Contributor

@steven-johnson I believe the GPU lifetime and device memory leak errors are actually caused by the Validation Layer shared lib itself when invoked with ctest --parallel. It doesn't seem very robust at all. It's useful for diagnosing issues when they come up, but it doesn't seem production worthy. I'd like to suggest we remove the VK_INSTANCE_LAYERS env var from the buildbot config, and keep the Vulkan tests enabled.

Done, change made and buildbot master restarted -- please try again :-)

@derek-gerstmann
Copy link
Contributor Author

So, other than the build failures from LLVM20 interface changes, all the Vulkan tests and apps are actually passing now and reporting "SUCCESS". However, there's exception's being thrown at shutdown that I can't reproduce.

@derek-gerstmann
Copy link
Contributor Author

Ah, wait. I was able to get a build that seems to reproduce the shutdown issues. Investigating now.

Derek Gerstmann added 2 commits November 8, 2024 11:29
…(from 32MB!).

Ensure allocator.collect() is called when context is destroyed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug documentation Missing, incorrect, or unclear. Spelling & grammar mistakes. gpu release_notes For changes that may warrant a note in README for official releases.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[vulkan] stencil_chain_filter generates too many descriptor sets
2 participants