This code is adapted from here. If you use it, please consider citing the original paper.

## How to Run the Needle Test

```bash
export MODEL_NAME=JunxiongWang/Llama3.2-Mamba2-3B-distill
export RESULT_SAVE_PATH=Llama3.1-Mamba-distill
python -u needle_in_haystack.py --s_len 0 --e_len 65536 \
    --model_provider Mamba \
    --model_path ${MODEL_NAME} \
    --test_name ${RESULT_SAVE_PATH}
```

Note that during distillation, the model is trained with only a 2k context length.
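
The test sweeps context lengths from `--s_len` to `--e_len`, burying a short "needle" sentence at varying depths inside filler text and checking whether the model can retrieve it. Below is a minimal sketch of that loop, assuming a Hugging Face-style model interface; the filler text, needle, helper names, and scoring are illustrative, not the repo's actual implementation (`needle_in_haystack.py` uses its own prompt construction and evaluation).

```python
# Minimal needle-in-a-haystack sketch (hypothetical helpers; for illustration only).
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "JunxiongWang/Llama3.2-Mamba2-3B-distill"
NEEDLE = "The best thing to do in San Francisco is eat a sandwich in Dolores Park."
QUESTION = "What is the best thing to do in San Francisco?"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")

def build_prompt(context_len: int, depth: float) -> str:
    """Bury NEEDLE at a relative `depth` inside roughly `context_len` tokens of filler."""
    filler = "The grass is green. The sky is blue. " * (context_len // 8)
    tokens = tokenizer.encode(filler)[:context_len]
    insert_at = int(len(tokens) * depth)
    haystack = (tokenizer.decode(tokens[:insert_at]) + " " + NEEDLE + " "
                + tokenizer.decode(tokens[insert_at:]))
    return haystack + "\n\nQuestion: " + QUESTION + "\nAnswer:"

# Probe a grid of context lengths and needle depths, scoring retrieval by substring match.
for context_len in (2048, 8192, 32768, 65536):
    for depth in (0.0, 0.5, 1.0):
        inputs = tokenizer(build_prompt(context_len, depth), return_tensors="pt").to(model.device)
        out = model.generate(**inputs, max_new_tokens=32)
        answer = tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
        print(context_len, depth, "hit" if "Dolores Park" in answer else "miss")
```

Because distillation only used 2k-token sequences, sweeping well past that (here up to 65536 tokens) shows how retrieval holds up beyond the training context.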

Here are the results:

*(Figure: needle-in-a-haystack test results)*