basic prototype Linear algebra compiler #330
base: main
Conversation
module = torch.jit.trace(model, t1)
graph = module.graph.copy()
This is how we can extract the computational graph of a language model. Here we extract the computational graph of the BLOOM-560M language model.
From the computational graph we can then turn it into an intermediate language and start optimizing it and lowering it to CUDA :)
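The graph-to-intermediate-language step mentioned above can be sketched without PyTorch itself. This is not the PR's actual code: the node format `(output, op, inputs)` is a hypothetical simplification of what a traced graph's nodes look like, just to show how a flat node list becomes a nested expression AST.

```python
# Hypothetical traced-graph nodes, loosely modeled on torch.jit trace
# output: each entry is (output_name, op_name, input_names).
nodes = [
    ("t0", "aten::matmul", ["x", "w"]),
    ("t1", "aten::add",    ["t0", "b"]),
]

def build_ast(nodes, out):
    """Recursively expand a node's inputs into a nested-tuple AST."""
    defs = {o: (op, ins) for o, op, ins in nodes}
    if out not in defs:
        return out                      # leaf: a model input or weight
    op, ins = defs[out]
    return (op, [build_ast(nodes, i) for i in ins])

ast = build_ast(nodes, "t1")
print(ast)  # ('aten::add', [('aten::matmul', ['x', 'w']), 'b'])
```

With a real `torch.jit` graph, the same walk would iterate over `graph.nodes()` instead of a hand-written list.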
Would it be possible to compile the graph directly to CUDA without an intermediate language?
Alternatively, could we compile to a hardware-agnostic framework like OpenCL?
Would it be possible to compile the graph directly to CUDA without an intermediate language?

@VictorOdede We can lower from the initial AST, but the papers I’ve read use an intermediate language for machine-independent optimizations.
Alternatively, could we compile to a hardware-agnostic framework like OpenCL?

I think we can compile to OpenCL; Python has bindings for it. So it would just be a question of turning the graph into an AST, identifying the linear algebra operations in the given language model, and lowering them through an intermediate language to either OpenCL or CUDA. Would you prefer OpenCL? Please share your thoughts. Thanks
My opinion is that we should try compiling to CUDA first, because we want to support the devices most people are using at the moment. After we have our CUDA optimizations, maybe we can also do ROCm, since that's the second most used platform. After that we can consider more hardware-agnostic optimizations.
@VictorOdede I'm thinking of lowering the graph to CUDA C code.
Makes sense. What does the code do so far?
How’s it going @VictorOdede - so far it lowers a simple linear algebra language, vector addition and matrix addition, to C.
It contains a parser, an intermediate language, and a C code generator.
Hello all,
In response to some comments made by @mmirman, I started working on a basic linear algebra compiler prototype.
Please comment on whether this is in the right direction.
The implementation is a simple prototype, but I think it captures the structure of a linear algebra compiler.
Instead of the grammar that I defined, we could use Einstein notation.
We need to talk more about the grammar: what operations are we supporting?
The structure of this compiler is as follows. The initial AST is converted to an intermediate language, lalg, which makes loops explicit. Making loops explicit will help as we expand the operations, because we can apply loop optimizations such as loop unrolling and loop fusion.
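To make the loop-fusion benefit concrete, here is a minimal sketch (not the PR's actual lalg code; the tuple encoding is hypothetical) of a loop-explicit IR in which two elementwise loops over the same bound collapse into one:

```python
# Hypothetical loop-explicit IR: a loop is ("loop", index, bound, body),
# where body is a list of ("assign", target, expr) statements.
def fuse(l1, l2):
    """Fuse two loops when they share the same index and bound."""
    _, i1, n1, b1 = l1
    _, i2, n2, b2 = l2
    if i1 != i2 or n1 != n2:
        return [l1, l2]                 # not fusable: keep both loops
    return [("loop", i1, n1, b1 + b2)]  # one loop, concatenated bodies

a = ("loop", "i", "n", [("assign", "t[i]", "x[i] + y[i]")])
b = ("loop", "i", "n", [("assign", "u[i]", "t[i] + z[i]")])
print(fuse(a, b))   # one loop whose body holds both assignments
```

A real pass would also have to check that fusion preserves data dependences; this sketch only checks the loop headers.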
I did not implement a sparse intermediate language, but I can work on this next as I expand the operations.
Finally, the intermediate representation, lalg, gets lowered to C code. The reason I decided to do this is as follows.
We can lower the C code, with loop optimizations, to vectorized code, i.e. SIMD; moreover, we could lower the linear algebra operations to CUDA, as @mmirman had suggested.
If we decide not to go this way, we can lower the linear algebra to assembly, using a low-level intermediate language that looks like C.
Please let me know your comments and how I can improve it.
The final version would look something like this.
Using PyTorch’s torch.jit.trace we can get access to the computational graph of a given open-source language model. Once we have the computational graph, we convert it into a graph-based intermediate language.
We then apply machine-independent optimizations to this high-level IR, and lower it to a low-level IR to which we apply machine-dependent optimizations.
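As one example of a machine-independent optimization on the high-level IR, here is a sketch of constant folding over a nested-tuple expression tree. The IR encoding is hypothetical, chosen only to illustrate the kind of pass that runs before lowering:

```python
import operator

# Map IR op names to Python operators; names are illustrative.
OPS = {"add": operator.add, "mul": operator.mul}

def fold(expr):
    """Evaluate any subtree whose operands are both compile-time constants."""
    if not isinstance(expr, tuple):
        return expr                     # leaf: variable name or constant
    op, lhs, rhs = expr
    lhs, rhs = fold(lhs), fold(rhs)
    if isinstance(lhs, (int, float)) and isinstance(rhs, (int, float)):
        return OPS[op](lhs, rhs)        # both operands known: fold now
    return (op, lhs, rhs)

print(fold(("mul", "x", ("add", 2, 3))))  # ('mul', 'x', 5)
```

Passes like this are machine-independent because they never mention registers, vector widths, or memory layout, so they apply identically whether we later target C, CUDA, or OpenCL.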
I just read a paper whose authors generated C code that was on par with the TVM LLVM backend, so we can get good performance if we apply the right optimizations.
We can separate the linear algebra computations from the schedules.
Schedules include things like tiling, vectorization, and loop unrolling.
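The compute/schedule separation can be shown with a tiny sketch: the same vector-add computation is emitted under two different schedules (unroll factors). The function and parameter names are illustrative, not the PR's API, and the emitter assumes the bound divides the unroll factor evenly:

```python
def emit_vector_add(n, unroll=1):
    """Emit a C loop for c = a + b over n elements; unroll is the schedule."""
    body = "\n".join(
        f"  c[i + {k}] = a[i + {k}] + b[i + {k}];" for k in range(unroll)
    )
    return f"for (int i = 0; i < {n}; i += {unroll}) {{\n{body}\n}}"

print(emit_vector_add(1024, unroll=1))  # baseline schedule
print(emit_vector_add(1024, unroll=4))  # same compute, unrolled schedule
```

The computation (elementwise add) never changes; only the emitted loop structure does, which is exactly the knob that tiling and vectorization would also turn.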
Thanks
Here is an example of how it can be used so far. I have only added matrix-plus-matrix and vector-plus-vector for now, but I am going to add more operations and will add the corresponding C code.
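Since the PR's actual invocation isn't reproduced in this thread, here is a hypothetical end-to-end sketch of what the described pipeline produces for vector addition: a single C function, with the loop made explicit. The function name `compile_vector_add` and its signature are illustrative only:

```python
def compile_vector_add(name, n):
    """Lower 'c = a + b' over length-n float vectors to a C function."""
    return "\n".join([
        f"void {name}(float *a, float *b, float *c) {{",
        f"  for (int i = 0; i < {n}; i++) {{",
        "    c[i] = a[i] + b[i];",
        "  }",
        "}",
    ])

print(compile_vector_add("vec_add", 256))
```

Matrix-plus-matrix would emit the same pattern with a second nested loop over the columns.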