Where is operator fusion mainly implemented? #964

Open
KindredSpirithub opened this issue Aug 14, 2024 · 0 comments

Comments

@KindredSpirithub

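I understand that OpenPPL-LLM performs four key operator fusions:

1. All operations between the residual connection and the normalization layer are fused, which saves several global-memory accesses.
2. The Q, K, and V matrix multiplications are merged horizontally, so that one larger matrix multiplication makes fuller use of the available compute.
3. The Rotary Embedding operations are fused, which saves several global-memory accesses.
4. Flash attention, a higher-performance attention implementation, is used.

Where in the code are these fusions implemented?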
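To make the question concrete, here is a minimal pure-Python sketch of what fusions 1 and 2 compute. It is not OpenPPL-LLM's code (locating that is the point of this issue). It assumes RMSNorm as the normalization layer, as in LLaMA-style models, and every function name is hypothetical. On a GPU the gain comes from a single kernel reading its inputs once and keeping intermediates on-chip; plain Python can only mirror the structure, not the memory savings.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major: w[out_index][in_index]


# ---- Fusion 1 (sketch): residual add + RMSNorm ----------------------------

def residual_add(x: Vector, residual: Vector) -> Vector:
    # Stand-alone "kernel" 1: writes the summed hidden state out.
    return [a + b for a, b in zip(x, residual)]


def rmsnorm(h: Vector, gamma: Vector, eps: float = 1e-6) -> Vector:
    # Stand-alone "kernel" 2: has to read the summed hidden state back in.
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in h) / len(h) + eps)
    return [v * inv_rms * g for v, g in zip(h, gamma)]


def add_rmsnorm_unfused(x: Vector, residual: Vector, gamma: Vector,
                        eps: float = 1e-6) -> Tuple[Vector, Vector]:
    summed = residual_add(x, residual)
    return summed, rmsnorm(summed, gamma, eps)


def add_rmsnorm_fused(x: Vector, residual: Vector, gamma: Vector,
                      eps: float = 1e-6) -> Tuple[Vector, Vector]:
    # One "kernel": x and residual are read once; the add and the
    # sum-of-squares reduction share a loop, and the sums are reused
    # for scaling instead of being re-read.
    summed: Vector = []
    sq_acc = 0.0
    for a, b in zip(x, residual):
        v = a + b
        summed.append(v)
        sq_acc += v * v
    inv_rms = 1.0 / math.sqrt(sq_acc / len(summed) + eps)
    normed = [v * inv_rms * g for v, g in zip(summed, gamma)]
    # Both outputs are returned: the sum is the next layer's residual input.
    return summed, normed


# ---- Fusion 2 (sketch): horizontally merged Q/K/V projection --------------

def matvec(w: Matrix, x: Vector) -> Vector:
    # y[i] = sum_j w[i][j] * x[j]
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def qkv_separate(wq: Matrix, wk: Matrix, wv: Matrix,
                 x: Vector) -> Tuple[Vector, Vector, Vector]:
    # Three small projections (three GEMM launches on a GPU).
    return matvec(wq, x), matvec(wk, x), matvec(wv, x)


def merge_qkv_weights(wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    # Done once at load time: stack the output rows of Wq, Wk, Wv.
    return wq + wk + wv


def qkv_merged(wqkv: Matrix, x: Vector, q_dim: int,
               kv_dim: int) -> Tuple[Vector, Vector, Vector]:
    # One larger projection, then split the output back into Q, K, V.
    y = matvec(wqkv, x)
    return y[:q_dim], y[q_dim:q_dim + kv_dim], y[q_dim + kv_dim:]


if __name__ == "__main__":
    x, res, gamma = [1.0, -2.0, 0.5], [0.5, 0.5, -1.0], [1.0, 1.0, 1.0]
    print(add_rmsnorm_unfused(x, res, gamma))
    print(add_rmsnorm_fused(x, res, gamma))
    wq, wk, wv = [[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]], [[2.0, -1.0]]
    print(qkv_separate(wq, wk, wv, [3.0, 4.0]))
    print(qkv_merged(merge_qkv_weights(wq, wk, wv), [3.0, 4.0], 2, 1))
```

In both cases the fused and unfused versions produce the same numbers; only the number of passes over memory (and, on a GPU, the number of kernel launches) changes.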
