Rectified Rotary Position Embeddings (ReRoPE)

Using ReRoPE, we can extend the context length of LLMs more effectively, without any fine-tuning.

Blog

Idea

Results

Loss calculated on llama2-13b with samples_15k.jsonl:

| Method | loss |
| --- | --- |
| RoPE-4k (original llama2-13b) | 1.4967 |
| RoPE-8k (original llama2-13b) | 8.8615 |
| NTK-RoPE-4k (not dynamic) | 1.6081 |
| NTK-RoPE-8k (not dynamic) | 1.5417 |
| NTK-RoPE-16k (not dynamic) | 1.5163 |
| ReRoPE-w1024-4k | 1.4996 |
| ReRoPE-w1024-8k | 1.4267 |
| ReRoPE-w1024-16k | 1.4001 |

At the training length (4k), ReRoPE's loss is almost unchanged from the original RoPE (1.4996 vs. 1.4967), and it has the ideal property of "longer context, lower loss": the loss keeps falling at 8k (1.4267) and 16k (1.4001).
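To make the comparison concrete, the short sketch below computes each method's loss relative to the RoPE-4k baseline. The numbers are copied from the results table above; the dictionary layout and the helper name `relative_change` are illustrative choices, not part of the repository.

```python
# Loss values copied from the results table above (llama2-13b, samples_15k.jsonl).
losses = {
    "RoPE-4k": 1.4967,
    "RoPE-8k": 8.8615,
    "NTK-RoPE-4k": 1.6081,
    "NTK-RoPE-8k": 1.5417,
    "NTK-RoPE-16k": 1.5163,
    "ReRoPE-w1024-4k": 1.4996,
    "ReRoPE-w1024-8k": 1.4267,
    "ReRoPE-w1024-16k": 1.4001,
}

# The original model at its training length serves as the baseline.
baseline = losses["RoPE-4k"]


def relative_change(name):
    """Percentage change in loss relative to the RoPE-4k baseline (hypothetical helper)."""
    return (losses[name] - baseline) / baseline * 100


for name, loss in losses.items():
    print(f"{name:<18} {loss:.4f}  {relative_change(name):+.2f}%")
```

Under this reading, ReRoPE-w1024-4k is within about 0.2% of the baseline, while the 8k and 16k ReRoPE runs fall below it.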

Usage

Dependency: transformers 4.31.0

Run `python test.py` to test chatting, or run `python eval_loss.py` to calculate the loss with llama2.

From here and here, we can see the modifications that ReRoPE/Leaky ReRoPE make to the original llama implementation.
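As a rough illustration of what the modification changes, the sketch below builds the relative-position matrix that ReRoPE and Leaky ReRoPE are generally described as using: ordinary offsets inside a window `w`, then a clamped offset (ReRoPE) or a slower-growing offset with step `1/k` (Leaky ReRoPE) beyond it. The function name and its parameters `window` and `leak` are hypothetical, and the sketch covers only the position offsets; it is not the attention code from this repository, which modifies the llama attention layers directly.

```python
def relative_positions(seq_len, window, leak=None):
    """Return a seq_len x seq_len matrix of rectified relative positions.

    Entry [i][j] is the offset used when query i attends to key j.
    Attention is causal, so entries with j > i are unused and set to 0.
    leak=None gives ReRoPE (clamp at `window`);
    leak=k gives Leaky ReRoPE (step 1/k beyond the window).
    """
    matrix = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            distance = i - j
            if distance < 0:
                row.append(0)  # future token, masked out by causal attention
            elif distance < window:
                row.append(distance)  # inside the window: ordinary RoPE offset
            elif leak is None:
                row.append(window)  # ReRoPE: every distant token shares one offset
            else:
                # Leaky ReRoPE: offsets keep growing, but with step 1/leak
                row.append(window + (distance - window) / leak)
        matrix.append(row)
    return matrix


# Offsets seen by the last query of a 6-token sequence with window 3.
print(relative_positions(6, 3)[-1])          # [3, 3, 3, 2, 1, 0]
print(relative_positions(6, 3, leak=2)[-1])  # [4.0, 3.5, 3.0, 2, 1, 0]
```

In this sketch no offset ever exceeds `window` for ReRoPE, which is one way to see why positions beyond the training length do not require fine-tuning; `w1024` in the results table would correspond to `window=1024`.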

Other

Triton Implementation of ReRoPE: https://gist.github.com/chu-tianxiang/4307937fd94b49c75b61a6967716bae9

Cite

@misc{rerope2023,
  title={Rectified Rotary Position Embeddings},
  author={Jianlin Su},
  year={2023},
  howpublished={\url{https://github.com/bojone/rerope}},
}

Communication

QQ discussion group: 67729435. For the WeChat group, please add the robot WeChat ID spaces_ac_cn.