Questions about matrix A's layout in shared memory. #20

HandH1998 · 2024-04-08T12:26:40Z

Thanks for your wonderful work!
I am trying to understand matrix A's layout in shared memory. I assumed that, for each thread block, A is originally stored in shared memory with shape (16 * thread_m_blocks) by (16 * thread_k_blocks), but the following code (line 287 in marlin_cuda_kernel.cu) confuses me. Why is a_sh_rd_delta_o calculated like this? I look forward to your reply.
