Thanks for your wonderful work!
I am trying to understand the layout of matrix A in shared memory. As I understand it, each thread block originally holds a tile of A with shape (16 * thread_m_blocks) * (16 * thread_k_blocks) in shared memory, but the following code (line 287 in marlin_cuda_kernel.cu) confuses me. Why is a_sh_rd_delta_o calculated like this? I'm looking forward to your reply.
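To make the assumed tile shape concrete, here is a minimal sketch that only spells out the arithmetic of the shape stated above. The names `ATileShape`, `a_tile_shape`, and `a_tile_bytes` are hypothetical and do not come from marlin_cuda_kernel.cu, the 2-byte element size assumes A is stored as fp16, and the sketch does not attempt to reproduce how a_sh_rd_delta_o is actually computed.

```cpp
// Hypothetical helpers for illustration only; none of these names appear in
// marlin_cuda_kernel.cu.

// Shape of the A tile per thread block, as assumed in the question:
// (16 * thread_m_blocks) rows by (16 * thread_k_blocks) columns.
struct ATileShape {
    int rows;
    int cols;
};

constexpr ATileShape a_tile_shape(int thread_m_blocks, int thread_k_blocks) {
    return ATileShape{16 * thread_m_blocks, 16 * thread_k_blocks};
}

// Size of the tile in bytes, assuming 2-byte (fp16) elements.
constexpr int a_tile_bytes(ATileShape s) {
    return s.rows * s.cols * 2;
}

// Example values chosen arbitrarily: thread_m_blocks = 4, thread_k_blocks = 4.
static_assert(a_tile_shape(4, 4).rows == 64, "16 * 4 rows");
static_assert(a_tile_shape(4, 4).cols == 64, "16 * 4 columns");
static_assert(a_tile_bytes(a_tile_shape(4, 4)) == 8192, "64 * 64 * 2 bytes");
```

Under these assumptions, a 4 by 4 block configuration would give a 64 x 64 tile occupying 8192 bytes of shared memory.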