basic prototype Linear algebra compiler #330
base: main
Conversation
module = torch.jit.trace(model, t1)
graph = module.graph.copy()
This is how we can extract the computational graph of a language model. Here we extract the computational graph of the BLOOM-560M language model.
From the computational graph we can then turn it into an intermediate language and start optimizing it and lowering it to CUDA :)
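The graph-to-intermediate-language step mentioned above can be sketched without PyTorch itself. This is not the PR's actual code: the node format `(output, op, inputs)` is a hypothetical simplification of what a traced graph's nodes look like, just to show how a flat node list becomes a nested expression AST.

```python
# Hypothetical traced-graph nodes, loosely modeled on torch.jit trace
# output: each entry is (output_name, op_name, input_names).
nodes = [
    ("t0", "aten::matmul", ["x", "w"]),
    ("t1", "aten::add",    ["t0", "b"]),
]

def build_ast(nodes, out):
    """Recursively expand a node's inputs into a nested-tuple AST."""
    defs = {o: (op, ins) for o, op, ins in nodes}
    if out not in defs:
        return out                      # leaf: a model input or weight
    op, ins = defs[out]
    return (op, [build_ast(nodes, i) for i in ins])

ast = build_ast(nodes, "t1")
print(ast)  # ('aten::add', [('aten::matmul', ['x', 'w']), 'b'])
```

With a real `torch.jit` graph, the same walk would iterate over `graph.nodes()` instead of a hand-written list.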
Would it be possible to compile the graph directly to CUDA without an intermediate language?
Alternatively, could we compile to a hardware-agnostic framework like OpenCL?
Would it be possible to compile the graph directly to CUDA without an intermediate language?

@VictorOdede We can lower from the initial AST, but the papers I’ve read use an intermediate language for machine-independent optimizations.
Alternatively, could we compile to a hardware-agnostic framework like OpenCL?

I think we can compile to OpenCL; Python has bindings for it. So it would just be a question of turning the graph into an AST, identifying the linear algebra operations in the given language model, and lowering them through an intermediate language to either OpenCL or CUDA. Would you prefer OpenCL? Please share your thoughts. Thanks
My opinion is that we should try compiling to CUDA first, because we want to support the devices most people are using at the moment. After we have our CUDA optimizations, maybe we can also do ROCm, since that's the second most used platform. After that we can consider more hardware-agnostic optimizations.
@VictorOdede I'm thinking of lowering the graph to CUDA C code.
Makes sense. What does the code do so far?
How’s it going @VictorOdede - so far it lowers a simple linear algebra language, vector addition and matrix addition, to C.
It contains a parser, an intermediate language, and a C code generator.
Hello all,
In response to some comments made by @mmirman, I started working on a basic linear algebra compiler prototype.
Please comment on whether this is in the right direction.
The implementation is a simple prototype, but I think it captures the structure of a linear algebra compiler.
Instead of the grammar that I defined, we could use Einstein notation.
We need to talk more about the grammar: what operations are we supporting?
The structure of this compiler is as follows. The initial AST is converted to an intermediate language, lalg, which makes loops explicit. Making loops explicit will help as we expand the operations, because we can apply loop optimizations such as loop unrolling and loop fusion.
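To make the loop-fusion benefit concrete, here is a minimal sketch (not the PR's actual lalg code; the tuple encoding is hypothetical) of a loop-explicit IR in which two elementwise loops over the same bound collapse into one:

```python
# Hypothetical loop-explicit IR: a loop is ("loop", index, bound, body),
# where body is a list of ("assign", target, expr) statements.
def fuse(l1, l2):
    """Fuse two loops when they share the same index and bound."""
    _, i1, n1, b1 = l1
    _, i2, n2, b2 = l2
    if i1 != i2 or n1 != n2:
        return [l1, l2]                 # not fusable: keep both loops
    return [("loop", i1, n1, b1 + b2)]  # one loop, concatenated bodies

a = ("loop", "i", "n", [("assign", "t[i]", "x[i] + y[i]")])
b = ("loop", "i", "n", [("assign", "u[i]", "t[i] + z[i]")])
print(fuse(a, b))   # one loop whose body holds both assignments
```

A real pass would also have to check that fusion preserves data dependences; this sketch only checks the loop headers.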
I did not implement a sparse intermediate language, but I can work on this next as I expand the operations.
Finally, the intermediate representation, lalg, gets lowered to C code. The reason I decided to do this is as follows.
We can lower the C code, with loop optimizations, to vectorized code, i.e. SIMD; moreover, we could lower the linear algebra operations to CUDA, as @mmirman had suggested.
If we decide not to go this way, we can lower the linear algebra to assembly, using a low-level intermediate language that looks like C.
Please let me know your comments and how I can improve it.
The final version would look something like this.
Using PyTorch’s torch.jit.trace we can get access to the computational graph of a given open-source language model. Once we have the computational graph, we convert it into a graph-based intermediate language.
We then apply machine-independent optimizations to this high-level IR, and lower it to a low-level IR to which we apply machine-dependent optimizations.
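As one example of a machine-independent optimization on the high-level IR, here is a sketch of constant folding over a nested-tuple expression tree. The IR encoding is hypothetical, chosen only to illustrate the kind of pass that runs before lowering:

```python
import operator

# Map IR op names to Python operators; names are illustrative.
OPS = {"add": operator.add, "mul": operator.mul}

def fold(expr):
    """Evaluate any subtree whose operands are both compile-time constants."""
    if not isinstance(expr, tuple):
        return expr                     # leaf: variable name or constant
    op, lhs, rhs = expr
    lhs, rhs = fold(lhs), fold(rhs)
    if isinstance(lhs, (int, float)) and isinstance(rhs, (int, float)):
        return OPS[op](lhs, rhs)        # both operands known: fold now
    return (op, lhs, rhs)

print(fold(("mul", "x", ("add", 2, 3))))  # ('mul', 'x', 5)
```

Passes like this are machine-independent because they never mention registers, vector widths, or memory layout, so they apply identically whether we later target C, CUDA, or OpenCL.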
I just read a paper whose authors generated C code that was on par with the TVM LLVM backend, so we can get good performance if we apply the right optimizations.
We can separate the linear algebra computations from the schedules.
Schedules include things like tiling, vectorization, and loop unrolling.
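The compute/schedule separation can be shown with a tiny sketch: the same vector-add computation is emitted under two different schedules (unroll factors). The function and parameter names are illustrative, not the PR's API, and the emitter assumes the bound divides the unroll factor evenly:

```python
def emit_vector_add(n, unroll=1):
    """Emit a C loop for c = a + b over n elements; unroll is the schedule."""
    body = "\n".join(
        f"  c[i + {k}] = a[i + {k}] + b[i + {k}];" for k in range(unroll)
    )
    return f"for (int i = 0; i < {n}; i += {unroll}) {{\n{body}\n}}"

print(emit_vector_add(1024, unroll=1))  # baseline schedule
print(emit_vector_add(1024, unroll=4))  # same compute, unrolled schedule
```

The computation (elementwise add) never changes; only the emitted loop structure does, which is exactly the knob that tiling and vectorization would also turn.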
Thanks
Here is an example of how it can be used so far. I have only added matrix-plus-matrix and vector-plus-vector for now, but I am going to add more operations and will add the corresponding C code.
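Since the PR's actual invocation isn't reproduced in this thread, here is a hypothetical end-to-end sketch of what the described pipeline produces for vector addition: a single C function, with the loop made explicit. The function name `compile_vector_add` and its signature are illustrative only:

```python
def compile_vector_add(name, n):
    """Lower 'c = a + b' over length-n float vectors to a C function."""
    return "\n".join([
        f"void {name}(float *a, float *b, float *c) {{",
        f"  for (int i = 0; i < {n}; i++) {{",
        "    c[i] = a[i] + b[i];",
        "  }",
        "}",
    ])

print(compile_vector_add("vec_add", 256))
```

Matrix-plus-matrix would emit the same pattern with a second nested loop over the columns.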