SD_FLASH_ATTN for Mac, Linux, and non-CUBLAS Windows builds? #407

Closed
hammer-ai opened this issue Sep 11, 2024 · 1 comment

@hammer-ai

Hi there, I was reading the documentation and saw:

Enabling flash attention reduces memory usage by at least 400 MB. At the moment, it is not supported when CUBLAS is enabled because the kernel implementation is missing.

But I'm curious, would it make sense to set -DSD_FLASH_ATTN=ON for the Mac, Linux, and other non-CUBLAS builds:

          - build: "noavx"
            defines: "-DGGML_AVX=OFF -DGGML_AVX2=OFF -DGGML_FMA=OFF -DSD_BUILD_SHARED_LIBS=ON"
          - build: "avx2"
            defines: "-DGGML_AVX2=ON -DSD_BUILD_SHARED_LIBS=ON"
          - build: "avx"
            defines: "-DGGML_AVX2=OFF -DSD_BUILD_SHARED_LIBS=ON"
          - build: "avx512"
            defines: "-DGGML_AVX512=ON -DSD_BUILD_SHARED_LIBS=ON"
          - build: "cuda12"

Thanks!

@Green-Sky
Contributor

That option enables code that no longer works/exists. You can check #386 for more info.
