About rid and cid in the kernel #12

Open
KeePLeArNinggG opened this issue May 24, 2022 · 2 comments

Comments

@KeePLeArNinggG

I've been thinking about these two lines for a long time, but I still can't understand why they are written this way:
rid = blockDim.y*blockIdx.x+threadIdx.y; cid = (blockIdx.y<<5)+threadIdx.x;
As far as I have learned, the usual form is:
rid = blockDim.x*blockIdx.x+threadIdx.x; cid = blockIdx.y*blockDim.y+threadIdx.y;
I hope you can explain why the kernel does it this way.
Thank you very much.

@hgyhungry
Owner

Hi,
The following figure explains how this kernel arranges the threadIdx and blockIdx dimensions.
image

If (blockIdx.y<<5) is what confuses you, it computes the horizontal offset of a threadblock: shifting left by 5 is the same as multiplying by 32. This works because we always set blockDim.x=32, and in the code you are looking at, the coarsening factor is 1.
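To make the shift concrete, here is a minimal host-side C++ sketch (not the kernel itself) that evaluates the column index both ways. The `Idx` struct and the helper names `cid_shift` and `cid_mul` are hypothetical stand-ins introduced only for illustration; the sketch assumes blockDim.x = 32 and a coarsening factor of 1, as stated above.

```cpp
#include <cassert>

// Hypothetical host-side stand-in for CUDA's built-in blockIdx / threadIdx.
struct Idx { unsigned x, y; };

// Assumption taken from the reply above: blockDim.x is always 32.
constexpr unsigned kBlockDimX = 32;

// Column index exactly as written in the kernel line under discussion.
constexpr unsigned cid_shift(Idx blockIdx, Idx threadIdx) {
    return (blockIdx.y << 5) + threadIdx.x;
}

// The same column index written as a multiplication by the block width.
constexpr unsigned cid_mul(Idx blockIdx, Idx threadIdx) {
    return blockIdx.y * kBlockDimX + threadIdx.x;
}

// Block 3 along grid dim-y, lane 7 along thread dim-x: 3 * 32 + 7.
static_assert(cid_shift(Idx{0, 3}, Idx{7, 0}) == 3 * 32 + 7,
              "shift by 5 equals multiply by 32");
```

Both helpers return the same value for any block and lane, so the shift is just a compact way of writing blockIdx.y * blockDim.x when blockDim.x is fixed at 32.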

If the mixture of x and y in the same offset confuses you, it is because within a threadblock I bind the horizontal axis to dim-x and the vertical axis to dim-y, but among threadblocks (i.e. in a grid) I bind the horizontal axis to dim-y and the vertical axis to dim-x. The intra-threadblock mapping cannot be changed because we want coalesced access to the dense matrix row. The inter-threadblock mapping, however, may be changed without hurting correctness, but in my experiments it affects performance a little bit.
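The claim that the inter-threadblock mapping can be exchanged without hurting correctness can be checked with a small host-side C++ sketch. It only emulates the index arithmetic, not the kernel. The names `Dim`, `kernel_mapping`, `swapped_mapping` and `covered` are hypothetical, and the block shape 32x4 in the usage note is an assumed example, not a value taken from the repository.

```cpp
#include <set>
#include <utility>

// Hypothetical host-side stand-in for CUDA's dim3-style built-ins.
struct Dim { unsigned x, y; };

using Coord = std::pair<unsigned, unsigned>;  // (rid, cid)

// Mapping from the kernel: blockIdx.x selects the row tile,
// blockIdx.y selects the column tile.
Coord kernel_mapping(Dim blockDim, Dim blockIdx, Dim threadIdx) {
    unsigned rid = blockDim.y * blockIdx.x + threadIdx.y;
    unsigned cid = (blockIdx.y << 5) + threadIdx.x;
    return {rid, cid};
}

// Alternative with the roles of blockIdx.x and blockIdx.y exchanged.
Coord swapped_mapping(Dim blockDim, Dim blockIdx, Dim threadIdx) {
    unsigned rid = blockDim.y * blockIdx.y + threadIdx.y;
    unsigned cid = (blockIdx.x << 5) + threadIdx.x;
    return {rid, cid};
}

// Collect every (rid, cid) pair that a whole grid of threads would touch.
template <typename Mapping>
std::set<Coord> covered(Dim gridDim, Dim blockDim, Mapping mapping) {
    std::set<Coord> out;
    for (unsigned bx = 0; bx < gridDim.x; ++bx)
        for (unsigned by = 0; by < gridDim.y; ++by)
            for (unsigned ty = 0; ty < blockDim.y; ++ty)
                for (unsigned tx = 0; tx < blockDim.x; ++tx)
                    out.insert(mapping(blockDim, Dim{bx, by}, Dim{tx, ty}));
    return out;
}
```

For example, `covered(Dim{3, 2}, Dim{32, 4}, kernel_mapping)` and `covered(Dim{2, 3}, Dim{32, 4}, swapped_mapping)` yield the same set of output elements, provided the grid dimensions are exchanged along with the indices. In both variants threadIdx.x still walks consecutive columns, which is the part that keeps the access to a dense matrix row coalesced.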

@samkitjain

samkitjain commented Sep 19, 2024

@hgyhungry: Are you assuming B is stored in transposed form?

If yes, that would explain why the X axis is mapped to vertical for you and the Y axis is mapped to horizontal. I bumped into the same discrepancy as @KeePLeArNinggG. We are both visualizing A (sparse matrix) and B (dense matrix) in their non-transposed state, which is why we both expect:

line 105 : int cid = (blockIdx.x<<5)+threadIdx.x;
line 106 : int rid = blockDim.y*blockIdx.y+threadIdx.y;

But if B is stored in transposed form, then the warp will be placed along the vertical axis and the thread block will move horizontally in B. Is this how you are visualizing it?
