
error for running accuracy_test.py in MoA_kernel and validation code in MoA #1

Open · youyu-2024 opened this issue on Jan 24, 2025 · 3 comments
Labels: help wanted (Extra attention is needed)

youyu-2024 commented on Jan 24, 2025:

Environment:

python 3.10.16
torch 2.2.0 (torch.__version__ gives 2.2.0+cu121)
nvcc --version: CUDA 12.4
torchvision 0.17.0
flashinfer 0.1.5
transformers 4.44.2
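
(For reference, these versions can be checked from Python; a minimal sketch, assuming flashinfer 0.1.5 exposes a `__version__` attribute:)

```python
import torch, torchvision, transformers, flashinfer

print(torch.__version__)         # 2.2.0+cu121
print(torch.version.cuda)        # CUDA version torch was built against, e.g. 12.1
print(torchvision.__version__)   # 0.17.0
print(transformers.__version__)  # 4.44.2
print(flashinfer.__version__)    # 0.1.5 (assumed attribute)
```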

I followed the instructions in MoA and MoA-kernel to configure the environment, but I ran into the issues described below:

  1. When running accuracy_test.py in MoA-kernel, I encounter the following error (see the note after this list):

     ImportError: cannot import name '_prepare_4d_causal_attention_mask_for_sdpa' from 'transformers.models.llama.modeling_llama'

  2. When running the code in MoA, the "Calibration Dataset Generation", "Profile", and "Optimization" steps run successfully, but "Validation" does not. Running scripts/pipeline/perplexity_evaluate.py reported "ModuleNotFoundError: No module named 'flash_attn'". After installing that package (flash_attn 2.5.8) and running again, I got:

     MoA/models/llama/modeling_llama.py, line 221, in forward
         self.num_key_value_groups == 1
     AssertionError: only support one key value group now, but got 4
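
(Note on item 1: in transformers 4.44 this helper no longer appears to be re-exported by modeling_llama, but it should still be importable from the shared utility module; a sketch, to be confirmed against the installed version:)

```python
# Sketch: in recent transformers releases the mask helper lives in the
# shared attention-mask utilities rather than in modeling_llama.
from transformers.modeling_attn_mask_utils import (
    _prepare_4d_causal_attention_mask_for_sdpa,
)
```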

Was the MoA-kernel not installed successfully? And how can I solve these problems? :)

fuvty added the help wanted label on Jan 24, 2025
fuvty (Member) commented on Jan 24, 2025:

Hi @youyu-2024.
For issue 1, it seems to be a problem with the transformers installation. Can you check whether you have installed the correct version of transformers? If not, you can run pip install transformers==4.44.2. We suggest installing the requirements.txt from the MoA repo before installing the kernel.
For issue 2, it seems that you are using a GQA model, not an MHA model. Please follow MoA's instructions to convert it to the MHA version before compression. Alternatively, you can use our one-step compression pipeline: python scripts/pipeline/main.py --model_path X --model_name Y --is_gqa (remember to add --is_gqa). If this is not the case, please let us know.
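
For context, whether a checkpoint uses GQA can be read off config.num_key_value_heads versus config.num_attention_heads. Below is a minimal sketch of the KV-head duplication such a GQA-to-MHA conversion performs, assuming a Llama-style projection layout (illustrative only, not MoA's actual conversion code):

```python
import torch

def kv_proj_gqa_to_mha(weight: torch.Tensor, num_kv_heads: int,
                       num_q_heads: int, head_dim: int) -> torch.Tensor:
    """Expand a [num_kv_heads * head_dim, hidden] k/v projection to
    [num_q_heads * head_dim, hidden] by duplicating each KV head across
    its query group, so every query head gets its own KV head."""
    group = num_q_heads // num_kv_heads           # query heads per KV head
    w = weight.view(num_kv_heads, head_dim, -1)   # split rows into KV heads
    w = w.repeat_interleave(group, dim=0)         # copy each head `group` times
    return w.reshape(num_q_heads * head_dim, -1)
```

After such a conversion, config.num_key_value_heads must be set equal to config.num_attention_heads so that num_key_value_groups becomes 1.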

youyu-2024 (Author) commented:

@fuvty For issue 1, I did install the requirements.txt from the MoA repo before installing the kernel. The version of transformers is definitely 4.44.2 (the other package versions are as listed in my first comment).

For issue 2, I have now run your example plan:

CUDA_VISIBLE_DEVICES=0 python scripts/evaluate/retrieval_evaluate.py --model_name lmsys/vicuna-7b-v1.5-16k --moa_config examples/lmsys--vicuna-7b-v1.5-16k/moa_alpha_beta.json --output_dir output/lmsys--vicuna-7b-v1.5-16k/evaluate/retrieval --length_level 8

but the error is as follows:

python: MoA_Kernel/python/include/flashinfer/attention/prefill.cuh:2381: cudaError_t flashinfer::PrefillMoADispatched(DTypeQ*, DTypeKV*, DTypeKV*, long int*, long int*, long int*, uint8_t*, DTypeOut*, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, int32_t, float, cudaStream_t) [with unsigned int HEAD_DIM = 128; bool ALLOW_FP16_QK_REDUCTION = false; flashinfer::MaskMode MASK_MODE = flashinfer::MaskMode::kCausal; DTypeQ = __half; DTypeKV = __half; DTypeOut = __half; cudaError_t = cudaError; uint8_t = unsigned char; uint32_t = unsigned int; int32_t = int; cudaStream_t = CUstream_st*]: Assertion `min(max_num_frags_z_smem, max_num_frags_z_reg) == 4' failed.
Aborted (core dumped)

fuvty (Member) commented on Jan 27, 2025:

@youyu-2024 Hi. Sorry that it took us a little time to identify the issue.
For issue 1, could you try installing transformers==4.36.2, just for the kernel test (accuracy_test.py)? If it passes, the kernel installation should be fine, and you can then use 4.44.2 for MoA. We will update the test script to adapt to 4.44.2.

It would be best to proceed to issue 2 after we verify the successful installation. Also, please provide the complete error message with the full call stack; see the note below for one way to capture it.
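
(One way to get Python-level frames even when the process aborts inside native code, as in the core dump above, is Python's built-in faulthandler, which dumps the Python traceback on fatal signals such as SIGABRT:)

```python
# Enable Python's faulthandler so a fatal signal (e.g. the SIGABRT from the
# failed kernel-side assertion) also prints the Python call stack instead of
# just "Aborted (core dumped)". Add near the top of the evaluation script:
import faulthandler
faulthandler.enable()

# Equivalently, without editing the script:
#   PYTHONFAULTHANDLER=1 python scripts/evaluate/retrieval_evaluate.py ...
```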
