Following the implementation in reference 2 gives incorrect results.
-
Following the implementation in reference 4 works correctly.
-
The Middle method in reference 1 appears to be wrong.
-
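For massively parallel algorithms like this, memory usage is one of the most important things to optimize. An amd64 CPU that supports AVX2 very likely also supports AES-NI. Comparing AES-NI + AVX2 with AVX2 + bitslicing, AES-NI + AVX2 still has the advantage for now: it is flexible (with respect to the length of the data being encrypted and the cipher mode), has a small memory footprint, and performs reasonably well. Current conclusion: emmansun/sm4bs#1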
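To make the memory and flexibility trade-off concrete, here is a minimal Go sketch of the bitsliced data layout. It is not taken from emmansun/sm4bs; the helpers `toBitSlices` and `fromBitSlices` are hypothetical, and numbering bits MSB-first within each 16-byte block is an assumption. N SM4 blocks are transposed into 128 bit-planes, where plane i holds bit i of every block. With N = 256, each plane is 256 bits (the width of one AVX2 register), and the whole batch has to be transposed in and out even when only a few blocks carry real data.

```go
package main

import "fmt"

const (
	blockSize = 16            // SM4 block size in bytes
	blockBits = blockSize * 8 // one bit-plane per bit of the block: 128 planes
)

// toBitSlices (hypothetical helper) transposes len(blocks) blocks into 128 bit-planes.
// Bit j of plane i is bit i of block j; bits inside a block are numbered MSB-first.
func toBitSlices(blocks [][blockSize]byte) [blockBits][]uint64 {
	words := (len(blocks) + 63) / 64 // uint64 words needed per plane
	var planes [blockBits][]uint64
	for i := range planes {
		planes[i] = make([]uint64, words)
	}
	for j, b := range blocks {
		for i := 0; i < blockBits; i++ {
			bit := uint64(b[i/8]>>(7-uint(i%8))) & 1
			planes[i][j/64] |= bit << uint(j%64)
		}
	}
	return planes
}

// fromBitSlices (hypothetical helper) is the inverse transpose, recovering n blocks.
func fromBitSlices(planes [blockBits][]uint64, n int) [][blockSize]byte {
	blocks := make([][blockSize]byte, n)
	for j := 0; j < n; j++ {
		for i := 0; i < blockBits; i++ {
			bit := byte(planes[i][j/64]>>uint(j%64)) & 1
			blocks[j][i/8] |= bit << (7 - uint(i%8))
		}
	}
	return blocks
}

func main() {
	const n = 256 // 256-way parallelism: each plane is 256 bits wide
	blocks := make([][blockSize]byte, n)
	for j := range blocks {
		for k := range blocks[j] {
			blocks[j][k] = byte(j*31 + k*7) // arbitrary test pattern
		}
	}

	planes := toBitSlices(blocks)
	back := fromBitSlices(planes, n)

	ok := true
	for j := range blocks {
		if blocks[j] != back[j] {
			ok = false
		}
	}
	planeBytes := len(planes[0]) * 8
	fmt.Println("round trip ok:", ok)
	fmt.Println("bytes per plane:", planeBytes)
	fmt.Println("total state bytes:", blockBits*planeBytes)
}
```

Running it prints `round trip ok: true`, `bytes per plane: 32` and `total state bytes: 4096`. For 256 blocks the 128 planes take the same 4 KiB as the input, but that full batch is the minimum unit of work, so short messages still pay for the whole transpose. The per-bit loop here is deliberately naive; optimized implementations typically transpose whole words at a time.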
-
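After continued optimization, the 256-way parallel bitsliced implementation finally outperforms the current AES-NI + AVX2 implementation, though only by a small margin.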