uadk supports heterogeneous computing #658
base: develop
uadk: add some bug fixes
This reverts commit 3fc344a.
Unify the software ctx and hardware ctx in uadk and merge them in the scheduler, so that software and hardware computation can run together. Signed-off-by: Longfang Liu <[email protected]>
Now that the uadk framework has updated its heterogeneous scheduling function, the corresponding scheduler needs a new scheduling policy. Signed-off-by: Longfang Liu <[email protected]>
After adapting to the new heterogeneous hybrid acceleration function, the device driver initialization needs to be updated. In addition, the instruction acceleration algorithm driver needs to fully support both the synchronous and asynchronous modes of the uadk framework. Signed-off-by: Longfang Liu <[email protected]>
Now that the uadk framework has updated its heterogeneous scheduling function, the internal implementation functions of the aead algorithm need to be adapted. Signed-off-by: Longfang Liu <[email protected]>
Now that the uadk framework has updated its heterogeneous scheduling function, the internal implementation functions of the hash-agg algorithm need to be adapted. Signed-off-by: Longfang Liu <[email protected]>
Now that the uadk framework has updated its heterogeneous scheduling function, the internal implementation functions of the cipher algorithm need to be adapted. Signed-off-by: Longfang Liu <[email protected]>
Now that the uadk framework has updated its heterogeneous scheduling function, the internal implementation functions of the comp algorithm need to be adapted. Signed-off-by: Longfang Liu <[email protected]>
Now that the uadk framework has updated its heterogeneous scheduling function, the internal implementation functions of the dh algorithm need to be adapted. Signed-off-by: Longfang Liu <[email protected]>
Now that the uadk framework has updated its heterogeneous scheduling function, the internal implementation functions of the digest algorithm need to be adapted. Signed-off-by: Longfang Liu <[email protected]>
Now that the uadk framework has updated its heterogeneous scheduling function, the internal implementation functions of the ECC algorithm need to be adapted. Signed-off-by: Longfang Liu <[email protected]>
Now that the uadk framework has updated its heterogeneous scheduling function, the internal implementation functions of the rsa algorithm need to be adapted. Signed-off-by: Longfang Liu <[email protected]>
All uadk algorithms have now been adapted to uadk's heterogeneous scheduling framework, so the old functions can be deleted. Signed-off-by: Longfang Liu <[email protected]>
Update the uadk test tool to support the heterogeneous scheduling function. Signed-off-by: Longfang Liu <[email protected]>
Is there any data for CE alone?
The CE-only data is as follows: SM3 1024B CE Performance (MB/s)
The hardware performance looks low. Have you tested with --thread 8 --ctxnum 8?
use both hardware acceleration and instruction acceleration. Once the hardware is fully loaded, CPU instructions continue to raise business throughput, and the scheme adapts automatically to a variety of acceleration devices.
This patchset was developed for that purpose and has been adapted to all uadk algorithm types.
hybrid acceleration is significantly higher: in the tests below, the gain reaches 79.21% for SM3 at 32 threads and 221.58% for SM4 at 1 thread.
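The spill-over idea described above (keep the hardware busy first, then use CPU instructions for the excess) can be illustrated with a toy dispatcher. This is a hypothetical sketch only: `pick_ctx`, `hw_complete`, `struct hw_ctx_sim` and the fixed queue depth are invented for the example and are not uadk's scheduler API; the patchset's actual policy may balance contexts differently.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical ctx kinds; uadk's real ctx and scheduler structures differ. */
enum ctx_type { CTX_HW, CTX_CE };

/* Simulated hardware queue: when inflight reaches depth, the HW is "full". */
struct hw_ctx_sim {
    int inflight; /* requests currently queued on the accelerator */
    int depth;    /* queue capacity */
};

/* Prefer the hardware ctx; spill to the CE (instruction) ctx once the
 * hardware queue is full. */
static enum ctx_type pick_ctx(struct hw_ctx_sim *hw)
{
    if (hw->inflight < hw->depth) {
        hw->inflight++;
        return CTX_HW;
    }
    return CTX_CE;
}

/* Called when a hardware request completes, freeing one queue slot. */
static void hw_complete(struct hw_ctx_sim *hw)
{
    if (hw->inflight > 0)
        hw->inflight--;
}
```

With a depth of 2, the first two requests go to the hardware, the third runs on CE, and after one hardware completion the next request goes back to the hardware. A real scheduler would also have to handle per-algorithm CE availability and sync/async polling, which this sketch leaves out.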
SM3 test commands (the first runs the HW-only init1 case; the second adds --init2 for HW + CE):
numactl --cpunodebind=0 --membind=0 uadk_tool benchmark --alg sm3 --mode sva --opt 0 --sync --pktlen 1024 --seconds 10 --thread 1 --multi 1 --ctxnum 1 --prefetch
numactl --cpunodebind=0 --membind=0 uadk_tool benchmark --alg sm3 --mode sva --opt 0 --sync --pktlen 1024 --seconds 10 --thread 1 --multi 1 --ctxnum 1 --prefetch --init2
Threads  init1 (HW)  init2 (HW + CE)  Increase
1        393.3       437.1            11.14%
2        762.1       823.4            8.04%
4        1508.4      1564.1           3.69%
8        3007.4      3074.9           2.24%
16       4851.8      5429.2           11.90%
32       4854.1      8698.8           79.21%
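The Increase column is the relative throughput gain of init2 over init1, that is (init2 - init1) / init1 * 100. A small C helper that recomputes it from the SM3 figures is sketched here; `struct bench_row`, `increase_pct` and `print_increase` are illustrative names and are not part of uadk_tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One table row: thread count and the two throughput figures. */
struct bench_row {
    int threads;
    double init1_hw;     /* throughput without --init2 (HW only) */
    double init2_hybrid; /* throughput with --init2 (HW + CE) */
};

/* Relative gain of HW + CE over HW only, in percent. */
static double increase_pct(double init1, double init2)
{
    return (init2 - init1) / init1 * 100.0;
}

/* SM3 figures copied from the table. */
static const struct bench_row sm3_rows[] = {
    {1, 393.3, 437.1},   {2, 762.1, 823.4},   {4, 1508.4, 1564.1},
    {8, 3007.4, 3074.9}, {16, 4851.8, 5429.2}, {32, 4854.1, 8698.8},
};

/* Print the recomputed Increase column, rounded to two decimals. */
static void print_increase(const struct bench_row *rows, size_t n)
{
    for (size_t i = 0; i < n; i++)
        printf("%2d threads: %.2f%%\n", rows[i].threads,
               increase_pct(rows[i].init1_hw, rows[i].init2_hybrid));
}
```

Between 16 and 32 threads the HW-only figure barely moves (4851.8 to 4854.1) while HW + CE keeps scaling, which is consistent with the stated aim of using CPU instructions once the hardware is saturated.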
SM4 test commands (sync HW only; sync HW + CE with --init2; async HW + CE with --init2):
numactl --cpunodebind=0 --membind=0 uadk_tool benchmark --alg sm4-128-ecb --mode sva --opt 0 --sync --pktlen 1024 --seconds 10 --thread 1 --multi 1 --ctxnum 1 --prefetch
numactl --cpunodebind=0 --membind=0 uadk_tool benchmark --alg sm4-128-ecb --mode sva --opt 0 --sync --pktlen 1024 --seconds 10 --thread 1 --multi 1 --ctxnum 1 --prefetch --init2
numactl --cpunodebind=0 --membind=0 uadk_tool benchmark --alg sm4-128-ecb --mode sva --opt 0 --async --pktlen 1024 --seconds 10 --thread 1 --multi 1 --ctxnum 1 --prefetch --init2
Threads  init1 (HW)  init2 (HW + CE)  Increase
1        461         1482.5           221.58%
2        914         2575.4           181.77%
4        1699.9      4737.6           178.70%
8        3301.5      7327.8           121.95%
16       5837.5      9737.4           66.81%
32       8897.7      10432.4          17.25%
Threads  init1 (HW)  init2 (HW + CE)  Increase
1        1368.3      1683.9           23.07%
2        2652        3235.5           22.00%
4        3979.5      5094.5           28.02%
8        6667.7      8587             28.79%
16       8900.9      11067.8          24.34%
32       8905.9      10209.1          14.63%