[ODK] RNS Dixon parallelized benchmarks
A. Breust edited this page Jun 25, 2019 · 15 revisions
These runs were done with the following setup:
- Nothing within the Init is done in parallel.
  - COULD HAVE PARALLELIZATION ON THE COMPUTATION OF INVERSES
- The Euclidean division and the application of the inverses are done in parallel.
- Parallel FGEMM for the residues update, with `FFLAS::ParSeqHelper::Compose<RNSParallel, FGEMMSequential>(4, 4)`.
  - COULD BE IMPROVED BY NOT HARDCODING ANYTHING
- `R[j] <= R[j] / pj` is not parallel.
  - FIND OUT WHY IT IS NOT WORKING SIMPLY
- `fconvert_rns` to get `R` in ZZ is done on the matrix.
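The division `R[j] <= R[j] / pj` in the setup above acts on each residue separately, which is what makes it a candidate for a parallel loop. The following is a minimal sketch of that independence, not the benchmarked code: the `Residue` layout, the `divideByPrimes` name, and the use of an OpenMP pragma are all assumptions made for illustration (the benchmark itself relies on FFLAS helpers), and exact divisibility is assumed, as in p-adic lifting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: R[j] holds the entries of the residue attached to prime p[j].
using Residue = std::vector<std::int64_t>;

// Divide every entry of R[j] by p[j], for each prime independently.
void divideByPrimes(std::vector<Residue>& R, const std::vector<std::int64_t>& p) {
    assert(R.size() == p.size());
    // Iteration j only reads p[j] and only writes R[j], so iterations share no data.
    // Without OpenMP enabled, the pragma is ignored and the loop runs sequentially.
    #pragma omp parallel for
    for (std::ptrdiff_t j = 0; j < static_cast<std::ptrdiff_t>(R.size()); ++j) {
        for (auto& entry : R[j]) {
            assert(entry % p[j] == 0);  // assumption: the division is exact
            entry /= p[j];
        }
    }
}
```

With residues `{14, -21}` and `{22, 0}` and primes 7 and 11, the call leaves `{2, -3}` and `{2, 0}`.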
Naming conventions:
- `PRIMES`: The number of primes used for RNS Dixon. Basically, how much we can parallelize.
- `D_LIFT`: Classic Dixon lifting + RatRecon
- `D_IT`: Classic Dixon number of iterations
- `R_IT`: RNS Dixon number of iterations
- `R_INIT`: RNS Dixon precomputing inverses mod all pj
- `R_LIFT`: RNS Dixon lifting accumulations
- `R_CRTP`: RNS Dixon CRT, all `progress()` calls
- `R_CRT`: RNS Dixon CRT `result()`
- `R_RAT`: RNS Dixon rational reconstruction
DIM | BITSIZE | PRIMES | D_IT | D_LIFT | R_IT | R_INIT | R_LIFT | R_CRTP | R_CRT | R_RAT |
---|---|---|---|---|---|---|---|---|---|---|
100 | 100 | 256 | 800 | .3171 | 4 | .6772 | .2350 | .0379 | .0128 | .0248 |
100 | 100 | 128 | 805 | .3123 | 8 | .3398 | .2805 | .0285 | .0128 | .0250 |
100 | 100 | 64 | 807 | .3143 | 16 | .1727 | .3388 | .0237 | .0128 | .0248 |
100 | 100 | 32 | 802 | .3106 | 31 | .0890 | .5142 | .0193 | .0125 | .0242 |
100 | 100 | 16 | 798 | .3036 | 61 | .0477 | .9056 | .0161 | .0116 | .0243 |
100 | 100 | 8 | 801 | .3153 | 121 | .0273 | 1.5501 | .0125 | .0094 | .0242 |
100 | 100 | 4 | 822 | .3083 | 242 | .0172 | 2.9304 | .0105 | .0060 | .0237 |
100 | 100 | 2 | 817 | .3133 | 489 | .0118 | 5.8470 | .0086 | .0001 | .0236 |
100 | 100 | 1 | 813 | .3108 | 978 | .0095 | 11.5726 | .0001 | .0001 | .0238 |
100 | 100 | 32 | 802 | .3106 | 31 | .0890 | .5142 | .0193 | .0125 | .0242 |
100 | 200 | 32 | 1575 | .6803 | 60 | .0915 | 1.0971 | .0460 | .0348 | .0751 |
200 | 100 | 32 | 1631 | 1.2162 | 61 | .3204 | 1.3498 | .0915 | .0721 | .1044 |
300 | 100 | 32 | 2421 | 3.0914 | 91 | .7353 | 2.6728 | .2338 | .1972 | .2580 |
400 | 100 | 32 | 3250 | 14.3534 | 122 | 2.0951 | 10.9242 | .6531 | .6323 | .5688 |
DIM | BITSIZE | PRIMES | D_IT | D_LIFT | R_IT | R_INIT | R_LIFT | R_CRTP | R_CRT | R_RAT |
---|---|---|---|---|---|---|---|---|---|---|
100 | 100 | 32 | 807 | .1426 | 31 | .1494 | .1103 | .0105 | .0066 | .0135 |
100 | 200 | 32 | 1571 | .3465 | 60 | .1532 | .3258 | .0249 | .0182 | .0408 |
200 | 100 | 32 | 1608 | .6997 | 61 | .4154 | .5350 | .0492 | .0371 | .0557 |
300 | 100 | 32 | 2452 | 1.8967 | 91 | .7970 | 1.2881 | .1215 | .1020 | .1354 |
400 | 100 | 32 | 3227 | 5.2680 | 122 | 1.3045 | 2.5307 | .2442 | .2108 | .2635 |
1000 | 100 | 32 | 8109 | 90.4061 | 307 | 7.5955 | 25.1093 | 2.1447 | 1.8424 | 2.1459 |
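To compare the two solvers on one row, one option is to add up the RNS columns and divide `D_LIFT` by that sum. The helper below is only a sketch of that arithmetic: the `RnsTimings` struct and both function names are hypothetical, and it assumes that `R_INIT + R_LIFT + R_CRTP + R_CRT + R_RAT` is a fair counterpart to `D_LIFT`, which the page does not state.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical container for the RNS columns of one table row,
// in the same (unstated) time unit as the tables.
struct RnsTimings {
    double init;  // R_INIT
    double lift;  // R_LIFT
    double crtp;  // R_CRTP
    double crt;   // R_CRT
    double rat;   // R_RAT
};

// Sum of all RNS Dixon phases for one row.
double rnsTotal(const RnsTimings& t) {
    return t.init + t.lift + t.crtp + t.crt + t.rat;
}

// Ratio of the classic Dixon time to the summed RNS Dixon time.
double classicOverRns(double dLift, const RnsTimings& t) {
    return dLift / rnsTotal(t);
}
```

For the DIM 1000 row, the RNS columns sum to 38.8378 against a `D_LIFT` of 90.4061, a ratio of roughly 2.33 under the assumption above.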
Details of one `D_LIFT`:
DIM | BITSIZE | PRIMES | DIV and c = A^{-1} r | FGEMM R <= R - Ac | R <= R / p | CONVERT r <= Q + R |
---|---|---|---|---|---|---|
1000 | 100 | 32 | 0.0219879 | 0.0230379 | 0.014802 | 0.0106769 |
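The column headers follow the shape of a p-adic lifting step: compute `c = A^{-1} r` modulo the prime, update the residue with `r - Ac`, then divide by the prime. As a rough illustration, here is a sketch of the textbook single-prime Dixon step on small machine integers. It is not the RNS implementation that was benchmarked: the `Vec`/`Mat` aliases and the `liftStep` and `lift` names are hypothetical, the inverse of `A` modulo `p` is assumed to be precomputed, and the Euclidean division and conversion columns are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec = std::vector<std::int64_t>;
using Mat = std::vector<Vec>;

// Nonnegative remainder of a modulo p (the built-in % may return negatives).
std::int64_t mod(std::int64_t a, std::int64_t p) {
    std::int64_t m = a % p;
    return m < 0 ? m + p : m;
}

// y = M * v over the integers.
Vec matVec(const Mat& M, const Vec& v) {
    Vec y(M.size(), 0);
    for (std::size_t i = 0; i < M.size(); ++i)
        for (std::size_t k = 0; k < v.size(); ++k)
            y[i] += M[i][k] * v[k];
    return y;
}

// One lifting step: returns the digit c = Ainv * r mod p
// and replaces r by (r - A * c) / p, which is an exact division.
Vec liftStep(const Mat& A, const Mat& Ainv, std::int64_t p, Vec& r) {
    Vec c = matVec(Ainv, r);
    for (auto& ci : c) ci = mod(ci, p);
    Vec Ac = matVec(A, c);
    for (std::size_t i = 0; i < r.size(); ++i) {
        std::int64_t diff = r[i] - Ac[i];
        assert(diff % p == 0);  // holds because A * Ainv = I mod p
        r[i] = diff / p;
    }
    return c;
}

// Runs `steps` lifting steps and accumulates x = sum of c_i * p^i,
// so that A * x = b modulo p^steps.
Vec lift(const Mat& A, const Mat& Ainv, std::int64_t p, Vec b, int steps) {
    Vec x(b.size(), 0);
    std::int64_t pk = 1;  // current power of p
    for (int s = 0; s < steps; ++s) {
        Vec c = liftStep(A, Ainv, p, b);
        for (std::size_t i = 0; i < x.size(); ++i) x[i] += c[i] * pk;
        pk *= p;
    }
    return x;
}
```

For example, with `A = {{3, 1}, {1, 2}}`, `p = 7` and `Ainv = {{6, 4}, {4, 2}}`, one step on `r = (1, 0)` returns `c = (6, 4)` and leaves `r = (-3, -2)`. Two steps give `x = (20, 39)`, which matches the rational solution `(2/5, -1/5)` modulo 49.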