basic prototype Linear algebra compiler #330

Open
wants to merge 3 commits into
base: main

Collaborator

@Jobhdez Jobhdez commented Oct 14, 2023

Hello you all,

In response to some comments made by @mmirman, I started working on a basic linear algebra compiler prototype.

Please comment on whether this is heading in the right direction.

The implementation is a simple prototype, but I think it captures the structure of a linear algebra compiler.

Instead of the grammar that I defined, we could use Einstein notation.

We need to talk more about the grammar: which operations are we supporting?

The structure of this compiler is as follows. The initial AST is converted to an intermediate language, lalg, which makes loops explicit.

Making loops explicit will help as we expand the set of operations, because we can then apply loop optimizations such as loop unrolling and loop fusion.
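
To illustrate why explicit loops help, here is a minimal sketch of loop fusion in Python. The `Loop` class, `fuse`, `run`, and `fresh_env` are hypothetical names made up for this example and are not part of lalg. The sketch assumes every statement is element-wise (iteration i only touches index i), which is what makes fusing the two loops safe.

```python
from dataclasses import dataclass

@dataclass
class Loop:
    n: int      # trip count
    body: list  # statements (dest, lhs, op, rhs), all indexed by the loop variable i

OPS = {"+": lambda x, y: x + y, "*": lambda x, y: x * y}

def fuse(first, second):
    """Merge two element-wise loops into one when their trip counts match."""
    if first.n != second.n:
        return [first, second]
    return [Loop(first.n, first.body + second.body)]

def run(loops, env):
    """Interpret the loops over named arrays stored in env."""
    for loop in loops:
        for i in range(loop.n):
            for dest, lhs, op, rhs in loop.body:
                env[dest][i] = OPS[op](env[lhs][i], env[rhs][i])
    return env

def fresh_env():
    return {"a": [1, 2, 3], "b": [4, 5, 6], "c": [2, 2, 2],
            "t": [0, 0, 0], "u": [0, 0, 0]}

add = Loop(3, [("t", "a", "+", "b")])  # t = a + b
mul = Loop(3, [("u", "t", "*", "c")])  # u = t * c
fused = fuse(add, mul)                 # one loop computing both statements
assert run(fused, fresh_env()) == run([add, mul], fresh_env())
```

The fused version walks the arrays once instead of twice, which is the kind of rewrite that is hard to express on the initial AST but straightforward once loops are first-class nodes.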

I did not implement a sparse intermediate language, but I can work on this next as I expand the operations.

Finally, the intermediate representation lalg gets lowered to C code.
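
As a rough idea of what lowering to C with explicit loops could look like, here is a small Python sketch. `emit_matrix_add` is a hypothetical helper written for this example; it is not the `lalg_to_c` function in this PR, and it emits inline loops rather than calls to runtime helpers.

```python
def emit_matrix_add(a, b):
    """Return C source that adds two integer matrices with explicit nested loops."""
    rows, cols = len(a), len(a[0])
    if len(b) != rows or any(len(row) != cols for row in a + b):
        raise ValueError("matrices must have the same shape")

    def init(m):
        # Render a nested Python list as a C initializer such as {{1, 2}, {3, 4}}.
        return "{" + ", ".join("{" + ", ".join(str(x) for x in row) + "}" for row in m) + "}"

    return "\n".join([
        f"int a[{rows}][{cols}] = {init(a)};",
        f"int b[{rows}][{cols}] = {init(b)};",
        f"int c[{rows}][{cols}];",
        f"for (int i = 0; i < {rows}; i++) {{",
        f"    for (int j = 0; j < {cols}; j++) {{",
        "        c[i][j] = a[i][j] + b[i][j];",
        "    }",
        "}",
    ])

print(emit_matrix_add([[3, 4, 5], [4, 5, 6]], [[4, 5, 6], [5, 6, 7]]))
```

Emitting the loops directly keeps them visible to later passes, so unrolling or vectorization can be applied before the C text is produced.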

The reason I decided to do this is as follows.

From the C code, with loop optimizations applied, we can lower to vectorized code, i.e. SIMD; we could also lower the linear algebra operations to CUDA, as @mmirman suggested.

If we decide not to go this way, we can lower the linear algebra to assembly and use a low-level intermediate language that looks like C.

Please let me know your comments and how I can improve it.

The final version would look something like this.

Using PyTorch’s torch.jit.trace, we can get access to the computational graph of a given open-source language model.

Once we have the computational graph, we convert it into a graph-based intermediate language.

We then apply machine-independent optimizations to this high-level IR and lower it to a low-level IR, to which we apply machine-dependent optimizations.
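
As one concrete example of a machine-independent optimization, here is a minimal constant-folding sketch in Python. The node classes (`Const`, `Var`, `Add`, `Mul`) and the `fold` function are hypothetical and use a small expression tree as a stand-in for the graph-based IR; nothing here is taken from the PR.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Const:
    value: float

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Add:
    left: object
    right: object

@dataclass(frozen=True)
class Mul:
    left: object
    right: object

def fold(node):
    """Evaluate subexpressions whose operands are all constants."""
    if isinstance(node, (Const, Var)):
        return node
    left, right = fold(node.left), fold(node.right)
    if isinstance(left, Const) and isinstance(right, Const):
        if isinstance(node, Add):
            return Const(left.value + right.value)
        return Const(left.value * right.value)
    # At least one operand is unknown, so rebuild the node with folded children.
    return type(node)(left, right)

expr = Mul(Var("x"), Add(Const(2), Const(3)))
assert fold(expr) == Mul(Var("x"), Const(5))
```

A pass like this does not depend on the target at all, which is why it belongs on the high-level IR rather than after lowering.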

I just read a paper in which the authors generated C code that performed on par with the TVM LLVM backend.

So we can get good performance if we apply the right optimizations.

We can separate the linear algebra computations from the schedules.

Schedules include things like tiling, vectorization, and loop unrolling.
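
To make the computation/schedule split concrete, here is a small Python sketch. The names `row_major`, `tiled`, and `run` are hypothetical and only meant to show the idea: the computation says what each output element is, and the schedule only decides the order in which elements are visited.

```python
def row_major(rows, cols):
    """Default schedule: visit elements row by row."""
    for i in range(rows):
        for j in range(cols):
            yield i, j

def tiled(tile_rows, tile_cols):
    """Build a schedule that visits the iteration space one tile at a time."""
    def schedule(rows, cols):
        for i0 in range(0, rows, tile_rows):
            for j0 in range(0, cols, tile_cols):
                for i in range(i0, min(i0 + tile_rows, rows)):
                    for j in range(j0, min(j0 + tile_cols, cols)):
                        yield i, j
    return schedule

def run(compute, rows, cols, schedule=row_major):
    """Evaluate compute(i, j) for every element in the order the schedule gives."""
    out = [[0] * cols for _ in range(rows)]
    for i, j in schedule(rows, cols):
        out[i][j] = compute(i, j)
    return out

a = [[3, 4, 5], [4, 5, 6]]
b = [[4, 5, 6], [5, 6, 7]]
add = lambda i, j: a[i][j] + b[i][j]  # the computation, independent of any schedule

assert run(add, 2, 3) == run(add, 2, 3, schedule=tiled(1, 2))
```

Changing the schedule never changes the result here, only the traversal order, so schedules can be tuned per target without touching the definition of the operation.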

Thanks

Here is an example of how it can be used so far. I have only added matrix plus matrix and vector plus vector for now, but I am going to add more operations along with the corresponding C code.

>>> from ast_to_lalg import ast_to_lalg
>>> from parser import parser 
>>> from lalg_to_c import lalg_to_c
>>> ast = parser.parse("[[3 4 5][4 5 6]] + [[4 5 6] [5 6 7]]")

>>> ast2 = ast_to_lalg(ast)
>>> ast2
(LalgForLoop2D (exps: (LalgExps [(LalgMatrix [[3, 4, 5], [4, 5, 6]]), (LalgMatrix [[4, 5, 6], [5, 6, 7]])])) (n: (LalgInt 2)) (inner_n: (LalgInt 3)) (i: (LalgInt 0)) (j: (LalgInt 0)) (op:(LalgOp +)))

>>> lalg_to_c(ast2)
'int matrix1[2,3] = {{3, 4, 5}, {4, 5, 6}};\n    int matrix2[2, 3] = {{4, 5, 6}, {5, 6, 7}};\n    \n    matrix *mat = initialize_matrix(matrix1, 2, 3);\n    \n    matrix *mat2 = initialize_matrix(matrix2, 2, 3);\n\n    add_matrices(mat, mat2);'


>>> ast5 = parser.parse("[3 4 5] + [4 5 6]")
>>> ast6 = ast_to_lalg(ast5)
>>> lalg_to_c(ast6)
'int vec[] = {3, 4, 5};\n    int vec2[] = {3, 4, 5};\n\n    vector *v = initialize_vector(vec, 3);\n\n    vector *v2 = initialize_vector(vec2, 3);\n\n    add_vectors(v, v2);'

Comment on lines +16 to +17
module = torch.jit.trace(model, t1)
graph = module.graph.copy()

Collaborator Author

This is how we can extract the computational graph of a language model. Here we extract the computational graph of the bloom-model560m language model.

From the computational graph, we can then build an intermediate language and start optimizing it and lowering it to CUDA :)

Collaborator

Would it be possible to compile the graph directly to CUDA without an intermediate language?

Collaborator

Alternatively, could we compile to a hardware-agnostic framework like OpenCL?

Collaborator Author

> Would it be possible to compile the graph directly to cuda without intermediate language?

@VictorOdede we can lower directly from the initial AST, but the papers I’ve read use an intermediate language so that machine-independent optimizations can be applied first.

Collaborator Author

> alternatively, could we compile to a hardware-agnostic framework like openCL?

I think we can compile to OpenCL. Python has bindings for it. So it would just be a question of turning the graph into an AST, identifying the linear algebra operations in the given language model, and lowering them through an intermediate language to either OpenCL or CUDA. Would you prefer OpenCL? Please share your thoughts. Thanks

Collaborator

My opinion is that we should try compiling to CUDA first because we want to support the devices that most people are using atm. Once we have our CUDA optimizations, maybe we can also do ROCm since that's the 2nd most used platform. After that, we can consider more hardware-agnostic optimizations.

Collaborator Author

@Jobhdez Jobhdez Oct 27, 2023

@VictorOdede I'm thinking of lowering the graph to CUDA C code.
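
A rough sketch of what emitting CUDA C from Python might look like for a single element-wise operation. `emit_cuda_vector_add` and `launch_config` are hypothetical helpers invented for this example, not code from the PR, and the 256 threads per block is just an assumed default.

```python
def emit_cuda_vector_add(name="vector_add"):
    """Return CUDA C source for an element-wise float vector addition kernel."""
    return f"""__global__ void {name}(const float *a, const float *b, float *out, int n) {{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {{
        out[i] = a[i] + b[i];
    }}
}}"""

def launch_config(n, threads_per_block=256):
    """Return (blocks, threads) so that blocks * threads covers all n elements."""
    blocks = (n + threads_per_block - 1) // threads_per_block
    return blocks, threads_per_block

print(emit_cuda_vector_add())
print(launch_config(1000))
```

The bounds check `if (i < n)` is needed because the grid is rounded up to a whole number of blocks, so the last block may contain threads past the end of the vectors.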

Collaborator

Makes sense. What does the code do so far?

Collaborator Author

@Jobhdez Jobhdez Oct 31, 2023

How’s it going @VictorOdede. So far it lowers a simple linear algebra language (vector addition and matrix addition) to C.

It contains a parser, an intermediate language, and a C code generator.


2 participants