g2s copy of the K tensor: when handling padding in seq_k, clear the out-of-bounds entries rather than keeping the default SMEM values. #1395

NVIDIA-JerryChen · 2024-12-18T06:15:04Z

https://github.com/Dao-AILab/flash-attention/blob/main/csrc/flash_attn/src/flash_fwd_kernel.h#L267-L269
should be modified to
flash::copy<Is_even_MN, Is_even_K, /*Clear_OOB_MN=*/true>(
    gmem_tiled_copy_QKV, tKgK(_, _, _, n_block), tKsK, tKVcKV, tKVpKV,
    binfo.actual_seqlen_k - n_block * kBlockN);

Clearing only V is unsafe because NaN * 0 = NaN: under the IEEE 754 standard, NaN propagates through arithmetic.
If the default SMEM contents happen to be NaN, the output O will also contain NaN values.
I ran into this issue during a saturation test.
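
The NaN propagation argument above can be checked on the host with a minimal sketch. This is not the kernel code: `dot`, `weighted_sum`, and `check_nan_propagation` are hypothetical helpers, softmax and masking are left out, and the "garbage" K row simply stands in for uninitialized SMEM that happens to hold NaN.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: one attention score, S = sum_d q[d] * k[d].
float dot(const std::vector<float>& q, const std::vector<float>& k) {
    float acc = 0.0f;
    for (std::size_t d = 0; d < q.size(); ++d) acc += q[d] * k[d];
    return acc;
}

// Hypothetical helper: one output element, O = sum_j p[j] * v[j].
// Softmax and masking are deliberately omitted (simplifying assumption).
float weighted_sum(const std::vector<float>& p, const std::vector<float>& v) {
    float acc = 0.0f;
    for (std::size_t j = 0; j < p.size(); ++j) acc += p[j] * v[j];
    return acc;
}

// Compares a padded K row left as garbage against one cleared to zero.
// In both cases the padded V entry is already cleared to zero.
void check_nan_propagation() {
    const float nan = std::numeric_limits<float>::quiet_NaN();
    const std::vector<float> q = {1.0f, 2.0f};
    const std::vector<float> k_valid = {0.5f, 0.25f};
    const std::vector<float> k_pad_garbage = {nan, 1.0f};  // stand-in for stale SMEM
    const std::vector<float> k_pad_cleared = {0.0f, 0.0f};
    const std::vector<float> v = {3.0f, 0.0f};             // padded V entry is zero

    // Garbage K: the padded score is NaN, and NaN * 0 is still NaN.
    const float o_garbage =
        weighted_sum({dot(q, k_valid), dot(q, k_pad_garbage)}, v);
    assert(std::isnan(o_garbage));

    // Cleared K: the padded score is 0, so the output stays finite.
    const float o_cleared =
        weighted_sum({dot(q, k_valid), dot(q, k_pad_cleared)}, v);
    assert(!std::isnan(o_cleared));
}
```

Calling `check_nan_propagation()` from a `main` shows that zeroing V alone does not remove a NaN that enters through K, while zeroing the padded K row does. Build it without `-ffast-math`, which may let the compiler assume NaN never occurs.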
