
error for running accuracy_test.py in MoA_kernel and validation code in MoA #1

Open · youyu-2024 opened this issue on Jan 24, 2025 · 3 comments
Labels: help wanted (Extra attention is needed)

youyu-2024 commented on Jan 24, 2025:

Environment:

python 3.10.16
torch 2.2.0 (torch.__version__ gives 2.2.0+cu121)
nvcc --version: CUDA 12.4
torchvision 0.17.0
flashinfer 0.1.5
transformers 4.44.2
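
(For reference, these versions can be checked from Python; a minimal sketch, assuming flashinfer 0.1.5 exposes a `__version__` attribute:)

```python
import torch, torchvision, transformers, flashinfer

print(torch.__version__)         # 2.2.0+cu121
print(torch.version.cuda)        # CUDA version torch was built against, e.g. 12.1
print(torchvision.__version__)   # 0.17.0
print(transformers.__version__)  # 4.44.2
print(flashinfer.__version__)    # 0.1.5 (assumed attribute)
```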

I followed the instructions in MoA and MoA-kernel to configure the environment, but I ran into the issues described below:

  1. When running accuracy_test.py in MoA-kernel, I encounter the following error (see the note after this list):

     ImportError: cannot import name '_prepare_4d_causal_attention_mask_for_sdpa' from 'transformers.models.llama.modeling_llama'

  2. When running the code in MoA, the "Calibration Dataset Generation", "Profile", and "Optimization" steps run successfully, but "Validation" does not. Running scripts/pipeline/perplexity_evaluate.py reported "ModuleNotFoundError: No module named 'flash_attn'". After installing that package (flash_attn 2.5.8) and running again, I got:

     MoA/models/llama/modeling_llama.py, line 221, in forward
         self.num_key_value_groups == 1
     AssertionError: only support one key value group now, but got 4
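
(Note on item 1: in transformers 4.44 this helper no longer appears to be re-exported by modeling_llama, but it should still be importable from the shared utility module; a sketch, to be confirmed against the installed version:)

```python
# Sketch: in recent transformers releases the mask helper lives in the
# shared attention-mask utilities rather than in modeling_llama.
from transformers.modeling_attn_mask_utils import (
    _prepare_4d_causal_attention_mask_for_sdpa,
)
```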

Was the MoA-kernel not installed successfully? And how can I solve these problems? :)

fuvty added the help wanted label on Jan 24, 2025
fuvty (Member) commented on Jan 24, 2025:

Hi @youyu-2024.
For issue 1, it seems to be a problem with the transformers installation. Can you check whether you have installed the correct version of transformers? If not, you can run pip install transformers==4.44.2. We suggest installing the requirements.txt from the MoA repo before installing the kernel.
For issue 2, it seems that you are using a GQA model, not an MHA model. Please follow MoA's instructions to convert it to the MHA version before compression. Alternatively, you can use our one-step compression pipeline: python scripts/pipeline/main.py --model_path X --model_name Y --is_gqa (remember to add --is_gqa). If this is not the case, please let us know.
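
For context, whether a checkpoint uses GQA can be read off config.num_key_value_heads versus config.num_attention_heads. Below is a minimal sketch of the KV-head duplication such a GQA-to-MHA conversion performs, assuming a Llama-style projection layout (illustrative only, not MoA's actual conversion code):

```python
import torch

def kv_proj_gqa_to_mha(weight: torch.Tensor, num_kv_heads: int,
                       num_q_heads: int, head_dim: int) -> torch.Tensor:
    """Expand a [num_kv_heads * head_dim, hidden] k/v projection to
    [num_q_heads * head_dim, hidden] by duplicating each KV head across
    its query group, so every query head gets its own KV head."""
    group = num_q_heads // num_kv_heads           # query heads per KV head
    w = weight.view(num_kv_heads, head_dim, -1)   # split rows into KV heads
    w = w.repeat_interleave(group, dim=0)         # copy each head `group` times
    return w.reshape(num_q_heads * head_dim, -1)
```

After such a conversion, config.num_key_value_heads must be set equal to config.num_attention_heads so that num_key_value_groups becomes 1.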

youyu-2024 (Author) commented:

@fuvty For issue 1, I did install the requirements.txt from the MoA repo before installing the kernel. The version of transformers is definitely 4.44.2 (the other package versions are as listed in my first comment).

For issue 2, I have now run your example plan:

CUDA_VISIBLE_DEVICES=0 python scripts/evaluate/retrieval_evaluate.py --model_name lmsys/vicuna-7b-v1.5-16k --moa_config examples/lmsys--vicuna-7b-v1.5-16k/moa_alpha_beta.json --output_dir output/lmsys--vicuna-7b-v1.5-16k/evaluate/retrieval --length_level 8

but the error is as follows:

python: MoA_Kernel/python/include/flashinfer/attention/prefill.cuh:2381: cudaError_t flashinfer::PrefillMoADispatched(DTypeQ*, DTypeKV*, DTypeKV*, long int*, long int*, long int*, uint8_t*, DTypeOut*, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, uint32_t, int32_t, float, cudaStream_t) [with unsigned int HEAD_DIM = 128; bool ALLOW_FP16_QK_REDUCTION = false; flashinfer::MaskMode MASK_MODE = flashinfer::MaskMode::kCausal; DTypeQ = __half; DTypeKV = __half; DTypeOut = __half; cudaError_t = cudaError; uint8_t = unsigned char; uint32_t = unsigned int; int32_t = int; cudaStream_t = CUstream_st*]: Assertion `min(max_num_frags_z_smem, max_num_frags_z_reg) == 4' failed.
Aborted (core dumped)

fuvty (Member) commented on Jan 27, 2025:

@youyu-2024 Hi. Sorry that it took us a little time to identify the issue.
For issue 1, could you try installing transformers==4.36.2, just for the kernel test (accuracy_test.py)? If it passes, the kernel installation should be fine, and you can then use 4.44.2 for MoA. We will update the test script to adapt to 4.44.2.

It would be best to proceed to issue 2 after we verify the successful installation. Also, please provide the complete error message with the full call stack; see the note below for one way to capture it.
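
(One way to get Python-level frames even when the process aborts inside native code, as in the core dump above, is Python's built-in faulthandler, which dumps the Python traceback on fatal signals such as SIGABRT:)

```python
# Enable Python's faulthandler so a fatal signal (e.g. the SIGABRT from the
# failed kernel-side assertion) also prints the Python call stack instead of
# just "Aborted (core dumped)". Add near the top of the evaluation script:
import faulthandler
faulthandler.enable()

# Equivalently, without editing the script:
#   PYTHONFAULTHANDLER=1 python scripts/evaluate/retrieval_evaluate.py ...
```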
