Feature: Runtime Kernel Fusion #3

Open
ctcyang opened this issue May 2, 2019 · 2 comments
@ctcyang
Collaborator

ctcyang commented May 2, 2019

If you have a DAG of binary operations, you can traverse it in topological order and generate the corresponding bitcode for your GPU kernel. OmniSci does this for parsed SQL queries (more specifically, for different filter combinations). TensorFlow does it for the equation that is being minimized by gradient descent. Can we do the same for GraphBLAS?
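To make the idea concrete, here is a minimal sketch of walking a DAG of binary operations in topological order and emitting one fused elementwise kernel. It is only an illustration: the DAG encoding, the node names (`t0`, `t1`, `t2`), the `fuse` helper, and the generated kernel name `fused` are all hypothetical, and it emits CUDA-like source text rather than LLVM bitcode to keep the example short.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical DAG: node -> (binary operator, left operand, right operand).
# Operands that are not nodes themselves ("a", "b", "c") are kernel inputs.
dag = {
    "t0": ("+", "a", "b"),
    "t1": ("*", "t0", "c"),
    "t2": ("-", "t1", "a"),
}

def fuse(dag, output):
    """Emit a single CUDA-like kernel that evaluates the whole DAG per element."""
    # Only edges between intermediate nodes matter for ordering.
    deps = {node: {x for x in (lhs, rhs) if x in dag}
            for node, (_, lhs, rhs) in dag.items()}
    order = list(TopologicalSorter(deps).static_order())
    inputs = sorted({x for _, lhs, rhs in dag.values()
                     for x in (lhs, rhs) if x not in dag})

    def ref(name):
        # Intermediates live in registers; inputs are read from global memory.
        return name if name in dag else f"{name}[i]"

    lines = [f"    float {node} = {ref(dag[node][1])} {dag[node][0]} {ref(dag[node][2])};"
             for node in order]
    params = ", ".join(f"const float* {x}" for x in inputs)
    return "\n".join([
        f"__global__ void fused({params}, float* result, int n) {{",
        "  int i = blockIdx.x * blockDim.x + threadIdx.x;",
        "  if (i < n) {",
        *lines,
        f"    result[i] = {output};",
        "  }",
        "}",
    ])

source = fuse(dag, "t2")
print(source)
```

The point of the sketch is that the three operations end up in one kernel with no intermediate arrays written to memory, which is what fusion buys you. A real implementation would hand the generated code to a JIT compiler instead of printing it.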

If you want to keep your kernel code as simple as possible, with minimal branching, there is no way to do this at compile time. Unless you know in advance what you are going to solve, the exact details of the SQL queries or TensorFlow equations are not available at compile time.

Both TensorFlow and OmniSci do runtime code generation using LLVM.

@aydinbuluc
Collaborator

aydinbuluc commented May 2, 2019 via email

@ctcyang
Collaborator Author

ctcyang commented May 2, 2019

Two things are unknown at compile time:

  1. If the user runs a few iterations of a for-loop, the sparsity of the data after iteration X is unknown at compile time, and that sparsity determines which kernel is optimal and should be chosen. So the kernel fusion could vary from one iteration of the loop to the next in a data-dependent way.
  2. I'm inclined towards a shared-library approach rather than header-only, in order to support a Python frontend that can be used from a Jupyter notebook. The consequence is that the computation graph, which depends on which ops the user wants, is unknown at compile time (i.e. when the shared library is compiled).
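A rough sketch of both points, under stated assumptions: the op sequence arrives at runtime (as it would from a Python frontend), and the kernel variant is re-chosen every iteration from the measured sparsity. The names `choose_variant`, `get_kernel`, `run`, the `0.1` density threshold, and the op labels are all hypothetical, and the "compiled kernel" is just a placeholder string standing in for JIT output.

```python
def choose_variant(nnz, n, threshold=0.1):
    """Pick a kernel variant from measured density; the threshold is an assumed tuning knob."""
    return "dense" if nnz / n > threshold else "sparse"

# (ops, variant) -> compiled kernel. Stands in for a runtime JIT cache.
kernel_cache = {}

def get_kernel(ops, variant):
    key = (ops, variant)
    if key not in kernel_cache:
        # Placeholder for runtime code generation and compilation.
        kernel_cache[key] = f"kernel<{'+'.join(ops)}|{variant}>"
    return kernel_cache[key]

def run(adj, start, ops=("vxm", "mask")):
    """BFS-style loop: frontier sparsity changes per iteration, so the kernel choice does too."""
    n = len(adj)
    visited = {start}
    frontier = {start}
    trace = []
    while frontier:
        variant = choose_variant(len(frontier), n)
        trace.append((len(frontier), variant, get_kernel(ops, variant)))
        # Expand the frontier and drop already-visited vertices.
        frontier = {v for u in frontier for v in adj[u]} - visited
        visited |= frontier
    return trace

# Small 13-vertex tree: vertex 0 has three children, each of which has three leaves.
adj = {0: [1, 2, 3]}
for i in range(1, 4):
    adj[i] = [3 * i + 1, 3 * i + 2, 3 * i + 3]
for i in range(4, 13):
    adj[i] = []

trace = run(adj, 0)
for size, variant, kernel in trace:
    print(size, variant, kernel)
```

On this input the frontier grows from 1 to 3 to 9 vertices, so the first iteration picks the sparse variant and the later ones pick the dense variant. Neither the op tuple nor that switch point could have been baked into a precompiled shared library.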
