Will BetterTransformer be considered for Chinese-LLaMA or LLaMA-2 in the future? #791
alkaideemo started this conversation in Ideas
Replies: 1 comment
-
We have just released Chinese-LLaMA-2, which was trained with FlashAttention-2.
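(For context, a minimal sketch of one common way to enable FlashAttention-2 when loading a LLaMA-style model in a recent transformers release; this is not the Chinese-LLaMA-2 training code, and the checkpoint name is illustrative. It assumes the flash-attn package is installed and a GPU that supports it.)

```python
# Sketch: load a LLaMA-style model with the FlashAttention-2 attention implementation.
# Assumes a recent transformers release, the flash-attn package, and a supported GPU.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",               # illustrative checkpoint
    torch_dtype=torch.bfloat16,               # FlashAttention-2 requires fp16/bf16
    attn_implementation="flash_attention_2",  # route attention through flash-attn kernels
)
```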
-
optimum supports llama as of version 1.9.0: https://github.com/huggingface/optimum/pull/998. With this, the transformers implementation of llama can also use the new scaled_dot_product_attention operator, which on supported hardware can dispatch to a flash attention kernel.
In a quick test on 8x A100, training throughput nearly doubled and memory usage dropped as well.
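As a reference, a minimal sketch (assuming optimum>=1.9.0, PyTorch 2.x, and a recent transformers; the checkpoint name is illustrative) of enabling the BetterTransformer path so that LLaMA attention goes through scaled_dot_product_attention:

```python
# Sketch: convert a Hugging Face LLaMA model to the optimum BetterTransformer path.
# Assumes optimum>=1.9.0, PyTorch 2.x, and a recent transformers; names are illustrative.
import torch
from transformers import AutoModelForCausalLM
from optimum.bettertransformer import BetterTransformer

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # illustrative checkpoint; any LLaMA-style model works
    torch_dtype=torch.float16,
)

# Swap attention to torch.nn.functional.scaled_dot_product_attention; on supported
# hardware (e.g. A100) PyTorch can dispatch this to a FlashAttention kernel.
model = BetterTransformer.transform(model)
```

Before saving checkpoints, the model can be converted back with `BetterTransformer.reverse(model)`.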