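1. Examine the convolution operator's computation in depth and look for performance gains in the following areas:
1.1 Parallelism
1.2 Efficient I/O
1.3 Efficient computation
2. Consider the following features:
2.1 Post-fusion: fuse the following activation or the next operator into the convolution
2.2 Pre-fusion: fuse the preceding operator into the convolution
3. Provide a choice of compute kernels, for example CUDA cores / Tensor Cores on the CUDA platform, and tensor cores / convolution cores on the BANG platform.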
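
To make the request concrete, here is a minimal pure-Python reference sketch of what such an interface might look like. It is only an illustration under stated assumptions, not this project's API: the names `conv2d`, `conv2d_direct`, `conv2d_im2col`, `KERNELS`, and the `backend`, `pre`, and `post` parameters are all hypothetical. The two backends stand in for the kernel choice in item 3 (they are not CUDA core or Tensor Core implementations), `pre` models pre-fusion as an element-wise op applied while loading inputs, and `post` models post-fusion as an epilogue applied before the result is stored. Stride 1, no padding, and a single channel are assumed.

```python
from typing import Callable, Dict, List, Optional

Matrix = List[List[float]]
Elementwise = Optional[Callable[[float], float]]


def relu(x: float) -> float:
    """Example activation that can be fused as an epilogue."""
    return x if x > 0.0 else 0.0


def conv2d_direct(inp: Matrix, kernel: Matrix,
                  pre: Elementwise = None, post: Elementwise = None) -> Matrix:
    """Direct sliding-window convolution (cross-correlation), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(inp) - kh + 1, len(inp[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):          # every (i, j) output is independent: parallelism
        for j in range(ow):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    x = inp[i + u][j + v]
                    if pre is not None:
                        x = pre(x)          # pre-fusion: applied on load
                    acc += x * kernel[u][v]
            # post-fusion: applied before the single store of this output
            out[i][j] = post(acc) if post is not None else acc
    return out


def conv2d_im2col(inp: Matrix, kernel: Matrix,
                  pre: Elementwise = None, post: Elementwise = None) -> Matrix:
    """Same result, computed as one dot product per flattened input patch."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(inp) - kh + 1, len(inp[0]) - kw + 1
    flat_kernel = [w for row in kernel for w in row]
    out = []
    for i in range(oh):
        row = []
        for j in range(ow):
            patch = [inp[i + u][j + v] for u in range(kh) for v in range(kw)]
            if pre is not None:
                patch = [pre(x) for x in patch]
            acc = sum(p * w for p, w in zip(patch, flat_kernel))
            row.append(post(acc) if post is not None else acc)
        out.append(row)
    return out


# Hypothetical registry: a real implementation would map names to
# platform-specific kernels instead of these two reference versions.
KERNELS: Dict[str, Callable[..., Matrix]] = {
    "direct": conv2d_direct,
    "im2col": conv2d_im2col,
}


def conv2d(inp: Matrix, kernel: Matrix, backend: str = "direct",
           pre: Elementwise = None, post: Elementwise = None) -> Matrix:
    """Dispatch to the selected backend; unknown names raise ValueError."""
    if backend not in KERNELS:
        raise ValueError(f"unknown backend: {backend!r}")
    return KERNELS[backend](inp, kernel, pre=pre, post=post)
```

A call such as `conv2d(x, w, backend="im2col", post=relu)` would then compute the convolution and the activation in one pass, so the intermediate result is never written out and read back, which is the I/O saving that fusion is meant to deliver.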