question about merge-path SpMM implementation #10
Comments
Hi shadow150519,
This is not mentioned in the paper, but generally the look-up is done as follows i.e. This is how that part works. For `row_ind`, you can think of it as `threadIdx.x`. As for `end`, it only matters for the last Block, because that is the only Block limited by the total number of nonzeros (nnz). In this example, the last Block needs to check that no entry exceeds 2312.
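As a rough illustration of this kind of look-up, here is a minimal CPU sketch, not the repository's kernel. It assumes the nonzeros are split evenly across Blocks and that the starting row is found by a binary search over the CSR row offsets. The names `BlockRange`, `block_range` and `nnz_per_block` are hypothetical and only serve the example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical result type: the nonzeros [begin, end) owned by one Block
// and the row of A that contains nonzero `begin`.
struct BlockRange {
  int begin;
  int end;
  int first_row;
};

// Assumption: row_ptr is a standard CSR offset array with num_rows + 1
// entries, so row_ptr.back() is the total number of nonzeros.
BlockRange block_range(const std::vector<int>& row_ptr, int block,
                       int nnz_per_block) {
  assert(!row_ptr.empty() && block >= 0 && nnz_per_block > 0);
  const int nnz = row_ptr.back();

  // Every Block gets nnz_per_block entries; only the last Block can run
  // past the end of the matrix, so both bounds are clamped to nnz.
  const int begin = std::min(block * nnz_per_block, nnz);
  const int end = std::min(begin + nnz_per_block, nnz);

  // Row owning nonzero `begin`: the last r with row_ptr[r] <= begin.
  // upper_bound finds the first offset greater than `begin`, so this
  // also skips over empty rows.
  const auto it = std::upper_bound(row_ptr.begin(), row_ptr.end(), begin);
  const int first_row = static_cast<int>(it - row_ptr.begin()) - 1;

  return {begin, end, first_row};
}
```

For a toy matrix with `row_ptr = {0, 2, 2, 5, 9}` and 4 nonzeros per Block, the third Block starts at nonzero 8 in row 3, and its `end` is clamped from 12 down to 9. That clamp plays the same role as the check against 2312 in the example above.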
On page 12, I mention that, due to register limitations, in practice I had to keep T very low (i.e. T = 1) to keep the register count reasonable and avoid overflowing.
Each thread being responsible for T work is okay, because that gets resolved within each Block by doing a segmented reduce at the Block level using shared memory. However, your point does become an issue across Blocks. That's why on line 22,
[1] MergePath paper: https://web.cs.ucdavis.edu/~amenta/f15/GPUmp.pdf
Hope that answers your questions!
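To make the within-Block versus across-Block distinction concrete, here is a small CPU stand-in. It is a sketch under stated assumptions, not the code from the repository. Each chunk of `chunk` nonzeros plays the role of one Block. Rows that finish inside a chunk are written directly, and the unfinished tail of each chunk is handed to a separate fix-up pass as a carry-out. The carry-out scheme is one common approach and is assumed here. The names `Carry`, `chunked_segmented_reduce`, `products` and `chunk` are hypothetical, and `products[k]` is assumed to already hold the partial product for nonzero k of A.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical record: the unfinished partial sum a chunk passes on.
struct Carry {
  int row;
  float sum;
};

// Sequential stand-in for a chunked segmented reduce over CSR rows.
// On a GPU the inner loop would be a parallel segmented reduce in
// shared memory; here it is a plain loop for clarity.
std::vector<float> chunked_segmented_reduce(const std::vector<int>& row_ptr,
                                            const std::vector<float>& products,
                                            int chunk) {
  assert(!row_ptr.empty() && chunk > 0);
  const int num_rows = static_cast<int>(row_ptr.size()) - 1;
  const int nnz = row_ptr.back();
  assert(static_cast<int>(products.size()) == nnz);

  std::vector<float> out(num_rows, 0.0f);
  std::vector<Carry> carries;

  // Pass 1: every chunk reduces only its own nonzeros.
  for (int begin = 0; begin < nnz; begin += chunk) {
    const int end = std::min(begin + chunk, nnz);
    // Row owning nonzero `begin`: the last r with row_ptr[r] <= begin.
    const auto it = std::upper_bound(row_ptr.begin(), row_ptr.end(), begin);
    int row = static_cast<int>(it - row_ptr.begin()) - 1;
    float sum = 0.0f;
    for (int k = begin; k < end; ++k) {
      while (k >= row_ptr[row + 1]) {  // a row boundary lies before k
        out[row] = sum;                // row ended inside this chunk
        sum = 0.0f;                    // (empty rows simply get 0)
        ++row;
      }
      sum += products[k];
    }
    // The last row touched may continue in the next chunk.
    carries.push_back({row, sum});
  }

  // Pass 2: fix-up across chunks, run after all direct writes.
  for (const Carry& c : carries) {
    out[c.row] += c.sum;
  }
  return out;
}
```

With `row_ptr = {0, 2, 2, 5, 9}`, products 1 through 9 and `chunk = 4`, row 2 is split across the first two chunks and row 3 across the last two. The fix-up pass is what makes the row sums come out the same regardless of the chunk size.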
Thanks for your answer! I still have some questions.
I can't find the offset array in the function `PartitionCsrSegReducePrealloc`, so how do I get the offset array? Also, what does `tb` mean? `("tb", po::value<int>()->default_value(32), "B slab width")`
Hello, I am reading your paper and code, and I have some questions about the merge-path SpMM.
(1) In my understanding, `limit[i]` means that partition i should start at row `limit[i]` of A and end at row `limit[i+1]`. But how do I find which element it should start from? I can't understand what `end` and `row_ind` mean in your algorithm.
(2) What is the granularity of merge-path SpMM? In row-split, a warp is responsible for a row of A. In your paper you say a thread is responsible for T work. Does T work mean T nnz in A?
(3) How are the partial sums finally reduced, given that each thread is responsible for T work, which might come from different rows of A?