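I understand that OpenPPL-LLM performs four key operator fusions: 1) all operations between the residual connection and the normalization layer are fused into one, which removes several global memory accesses; 2) the Q, K, and V matrix multiplications are merged horizontally, so that one larger matrix multiplication makes fuller use of the available compute; 3) the operations that make up Rotary Embedding are merged, which removes several global memory accesses; 4) flash attention is used as a higher-performance attention implementation. Where in the code are these fusions implemented?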
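To make item 2 concrete, here is a minimal pure-Python sketch of what "horizontally merging" the Q, K, and V matrix multiplications means as I understand it. It is not OpenPPL-LLM code: the helper names (`matmul`, `hconcat`, `fused_qkv`) and the tiny weights are hypothetical and chosen only for illustration. The three weight matrices are concatenated along the column axis, multiplied by the input once, and the result is split back into Q, K, and V.

```python
# Illustrative sketch only; NOT taken from OpenPPL-LLM. All names are hypothetical.

def matmul(a, b):
    """Plain row-by-column matrix product for lists of lists."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]

def hconcat(*mats):
    """Concatenate matrices with equal row counts along the column axis."""
    return [sum((list(m[i]) for m in mats), []) for i in range(len(mats[0]))]

def fused_qkv(x, w_q, w_k, w_v):
    """Compute Q, K, V with one matrix product instead of three."""
    # A real engine would presumably build the merged weight once at load time.
    w_qkv = hconcat(w_q, w_k, w_v)
    out = matmul(x, w_qkv)
    dq, dk = len(w_q[0]), len(w_k[0])
    q = [r[:dq] for r in out]
    k = [r[dq:dq + dk] for r in out]
    v = [r[dq + dk:] for r in out]
    return q, k, v

# Tiny example: 2 tokens, hidden size 2, projection size 2.
x = [[1.0, 2.0], [3.0, 4.0]]
w_q = [[1.0, 0.0], [0.0, 1.0]]
w_k = [[2.0, 0.0], [0.0, 2.0]]
w_v = [[0.0, 1.0], [1.0, 0.0]]

q, k, v = fused_qkv(x, w_q, w_k, w_v)
```

The fused path returns the same values as three separate products; on a GPU the expected benefit is one larger GEMM launch instead of three smaller ones.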