Update XeTile.md
Jianhui-Li authored Dec 14, 2023
1 parent 594d134 commit 4648e71
Showing 1 changed file with 10 additions and 6 deletions.
16 changes: 10 additions & 6 deletions docs/rfcs/XeTile.md
@@ -105,15 +105,19 @@ Below is an example.
Within the `xetile.wg_map`, `sg_layout` specifies the subgroup layout, and `sg_data` specifies the tile size owned by each subgroup. In the example above, sg_layout [2,2] means that each workgroup has 4 subgroups with 2 rows and 2 columns. sg_data [32,128] means that each subgroup works on a submatrix [32, 128]. The data elements assigned to each subgroup thread must be contiguous.

For each dimension, the tile size must be divisible by `sg_data`, and the product of `sg_layout` and `sg_data` must either divide the tile size or be divisible by it. The tile is distributed to subgroups in `sg_data`-sized blocks in a round-robin fashion. When the tile size is smaller than `sg_layout` multiplied by `sg_data`, the distribution wraps around to the beginning of the tile and continues assigning data to the remaining subgroups. For example, with tile size [128, 128], `sg_layout` [2, 2], and `sg_data` [32, 128], along the second dimension no data is left to assign after the first subgroup, so the distribution wraps around to the beginning of the tile and continues the assignment. Along the first dimension, data remains after the first round of distribution, so the distribution moves to the next subtile and continues the assignment. As a result, the tile is sliced into four subtiles of size [32, 128], with the following mapping:

[ 0:31, 0:127] : [0, 0] [0, 1]

[ 32:63, 0:127] : [1, 0] [1, 1]

[ 64:95, 0:127] : [0, 0] [0, 1]

[96:127, 0:127] : [1, 0] [1, 1]
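
The mapping above can be reproduced with a short Python sketch. This is one reading of the round-robin rule, not code from XeTile: the helper names `owners_along_dim` and `distribute` are hypothetical, and the check assumes that the tile size and `sg_layout` x `sg_data` only need to divide one another, which is what the [128, 128] example requires.

```python
from itertools import product


def owners_along_dim(tile, sg_data, sg_layout):
    """For one dimension, list the subgroup indices that own each block."""
    if tile % sg_data != 0:
        raise ValueError("tile size must be divisible by sg_data")
    num_blocks = tile // sg_data
    # Assumption: sg_layout * sg_data and the tile size must divide one another.
    if num_blocks % sg_layout != 0 and sg_layout % num_blocks != 0:
        raise ValueError("sg_layout * sg_data and the tile size must divide one another")
    # Block b goes to subgroup b % sg_layout (round-robin). With fewer blocks
    # than subgroups, subgroup s wraps around to block s % num_blocks.
    return [
        [s for s in range(sg_layout) if s % num_blocks == b % sg_layout]
        for b in range(num_blocks)
    ]


def distribute(tile_shape, sg_layout, sg_data):
    """Map each subtile, as inclusive (rows, cols) ranges, to its owning subgroups."""
    rows = owners_along_dim(tile_shape[0], sg_data[0], sg_layout[0])
    cols = owners_along_dim(tile_shape[1], sg_data[1], sg_layout[1])
    mapping = {}
    for (i, row_owners), (j, col_owners) in product(enumerate(rows), enumerate(cols)):
        row_range = (i * sg_data[0], (i + 1) * sg_data[0] - 1)
        col_range = (j * sg_data[1], (j + 1) * sg_data[1] - 1)
        mapping[(row_range, col_range)] = list(product(row_owners, col_owners))
    return mapping


for subtile, owners in distribute((128, 128), (2, 2), (32, 128)).items():
    print(subtile, ":", owners)
```

Each key is a subtile given as inclusive row and column ranges, and each value lists the `[row, col]` subgroup coordinates that receive it.
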

Within the `xetile.sg_map`, `wi_layout` specifies the layout in which WI threads correspond to the memory, and `wi_data` describes the data block accessed by each WI thread. In the example above, `wi_layout` = [2, 8] means that each subgroup has 16 WI threads in 2 rows and 8 columns, and `wi_data` = [1, 2] means that each WI thread owns a [1, 2] data fragment. The data elements within each data fragment assigned to a WI thread must be contiguous. The `sg_map` therefore describes a [2, 16] submatrix in total at the subgroup level.
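
As a small illustration of the arithmetic, the sketch below computes the `sg_map` size and the fragment owned by a given WI thread. The function names `sg_map_shape` and `fragment_of` are hypothetical, and the sketch assumes WI thread `[r, c]` owns the fragment at block position `[r, c]` within the `sg_map` submatrix.

```python
def sg_map_shape(wi_layout, wi_data):
    """Size of the submatrix covered by one sg_map: wi_layout x wi_data per dimension."""
    return [layout * data for layout, data in zip(wi_layout, wi_data)]


def fragment_of(wi, wi_data):
    """Inclusive (row range, col range) owned by WI thread wi = (r, c).

    Assumption: fragments are laid out in WI-thread order along each dimension.
    """
    return tuple((i * d, i * d + d - 1) for i, d in zip(wi, wi_data))


print(sg_map_shape([2, 8], [1, 2]))   # size of the sg_map submatrix
print(fragment_of((1, 3), [1, 2]))    # fragment owned by WI thread [1, 3]
```

Under that assumption, WI thread [1, 3] owns row 1 and columns 6 through 7 of the [2, 16] submatrix.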

The size of `sg_data` within `xetile.wg_map` must be divisible by the `sg_map` size, which equals `wi_layout` multiplied by `wi_data` within `xetile.sg_map`. More specifically, for each dimension, the `sg_data` size must be divisible by `wi_layout` x `wi_data`. The `sg_data` size must be larger than or equal to the `sg_map` size. When the 2D subtensor is larger than the `sg_map` size, it is distributed to WI threads in a round-robin fashion.
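
The constraint and the round-robin distribution can be sketched as follows. This is a hedged illustration, not XeTile's implementation: `rounds` and `wi_owner` are hypothetical names, and the sketch assumes that "round-robin" means the `sg_map` block repeats across the `sg_data` tile in both dimensions.

```python
def rounds(sg_data, wi_layout, wi_data):
    """Per dimension, how many times the sg_map block repeats across sg_data."""
    counts = []
    for size, layout, data in zip(sg_data, wi_layout, wi_data):
        span = layout * data  # sg_map size along this dimension
        if size % span != 0:
            raise ValueError("sg_data must be divisible by wi_layout x wi_data")
        counts.append(size // span)
    return counts


def wi_owner(elem, sg_data, wi_layout, wi_data):
    """WI thread coordinates that own element elem = (row, col) of the sg_data tile.

    Assumption: the sg_map block tiles sg_data, so ownership repeats every
    wi_layout x wi_data elements along each dimension.
    """
    rounds(sg_data, wi_layout, wi_data)  # validates the divisibility constraint
    return tuple(
        (e % (layout * data)) // data
        for e, layout, data in zip(elem, wi_layout, wi_data)
    )


print(rounds([32, 128], [2, 8], [1, 2]))
print(wi_owner((3, 21), [32, 128], [2, 8], [1, 2]))
```

With `sg_data` [32, 128] and an `sg_map` size of [2, 16], the block repeats 16 times along the first dimension and 8 times along the second.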


## Alternative design considerations