
[ODK] RNS Dixon parallelized benchmarks

A. Breust edited this page Jun 25, 2019 · 15 revisions

2019-06-25

These runs have been done with the following setup:

  • Nothing within the Init is done in parallel.
    • COULD HAVE PARALLELIZATION ON THE COMPUTATION OF INVERSES

  • The Euclidean division and the application of the inverses run in parallel.
  • Parallel FGEMM for residues update with FFLAS::ParSeqHelper::Compose<RNSParallel, FGEMMSequential>(4, 4)
    • COULD BE IMPROVED BY NOT HARDCODING ANYTHING

  • R[j] <= R[j] / pj is not parallel
    • FIND OUT WHY IT IS NOT WORKING SIMPLY

  • The fconvert_rns call that brings R back to ZZ is done on the matrix.
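
The steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the LinBox code: it assumes each prime pj carries its own residue vector, it uses plain Python integers where the library performs the R <= R - Ac update in RNS with FGEMM, and every name in it (inv2x2_mod, matvec, lift) is hypothetical. As in the setup above, the inverses are computed sequentially and only the lifting is dispatched per prime. In CPython the thread pool only mirrors the structure; it should not be expected to speed up this CPU-bound loop.

```python
from concurrent.futures import ThreadPoolExecutor

def inv2x2_mod(A, p):
    """Inverse of a 2x2 integer matrix modulo a prime p (adjugate formula)."""
    (a, b), (c, d) = A
    det_inv = pow(a * d - b * c, -1, p)  # raises ValueError if p divides det(A)
    return [[d * det_inv % p, -b * det_inv % p],
            [-c * det_inv % p, a * det_inv % p]]

def matvec(M, v):
    """Integer matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lift(A, B, b, p, iterations):
    """Return (x, p**iterations) such that A x = b (mod p**iterations)."""
    r, x, pk = list(b), [0] * len(b), 1
    for _ in range(iterations):
        Q = [ri // p for ri in r]                  # Euclidean division r = p*Q + R
        R = [ri % p for ri in r]
        c = [ci % p for ci in matvec(B, R)]        # c = A^{-1} R mod p
        R = [Ri - Aci for Ri, Aci in zip(R, matvec(A, c))]  # R <- R - A c
        R = [Ri // p for Ri in R]                  # R <- R / p (exact division)
        r = [Qi + Ri for Qi, Ri in zip(Q, R)]      # r <- Q + R
        x = [xi + ci * pk for xi, ci in zip(x, c)] # accumulate the p-adic digits
        pk *= p
    return x, pk

# Hypothetical toy system; the exact solution is (3/13, 4/13).
A = [[3, 1], [2, 5]]
b = [1, 2]
primes = [101, 103, 107]

# Init: inverses computed sequentially, one per prime.
inverses = [inv2x2_mod(A, p) for p in primes]

# Lifting: one independent task per prime.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda B, p: lift(A, B, b, p, 4), inverses, primes))

for x, pk in results:
    print(pk, x)
```

Each returned pair satisfies A x = b modulo pj^4; combining the pairs by CRT and applying rational reconstruction would then give the rational solution.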

Naming conventions:

  • PRIMES: The number of primes used for RNS Dixon, i.e. how much of the work can be parallelized.
  • D_LIFT: Classic Dixon lifting + RatRecon
  • D_IT: Classic Dixon number of iterations
  • R_IT: RNS Dixon number of iterations
  • R_INIT: RNS Dixon precomputing inverses mod all pj
  • R_LIFT: RNS Dixon lifting accumulations
  • R_CRTP: RNS Dixon CRT all progress()
  • R_CRT: RNS Dixon CRT result()
  • R_RAT: RNS Dixon rational reconstruction
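
To make the last two stages named above (R_CRT and R_RAT) more concrete, here is a minimal sketch of a two-modulus CRT followed by rational reconstruction. It is a textbook version built on the extended Euclidean algorithm, not the routine LinBox uses; crt_pair and rational_reconstruction are hypothetical names, and the moduli and the fraction 3/13 are made-up test values.

```python
from fractions import Fraction
from math import isqrt

def crt_pair(r1, m1, r2, m2):
    """Combine x = r1 (mod m1) and x = r2 (mod m2) for coprime m1, m2."""
    t = (r2 - r1) * pow(m1, -1, m2) % m2
    return r1 + m1 * t, m1 * m2

def rational_reconstruction(u, m):
    """Find n/d with n = d*u (mod m) and |n|, |d| <= sqrt(m/2), else None."""
    bound = isqrt(m // 2)
    r0, r1 = m, u % m
    t0, t1 = 0, 1
    # Extended Euclid on (m, u); invariant: r_i = t_i * u (mod m).
    while r1 > bound:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        t0, t1 = t1, t0 - q * t1
    if t1 == 0 or abs(t1) > bound:
        return None
    return Fraction(r1, t1)  # Fraction normalizes the sign and reduces

# Images of the hypothetical value 3/13 modulo two prime powers.
m1, m2 = 101**2, 103**2
u1 = 3 * pow(13, -1, m1) % m1
u2 = 3 * pow(13, -1, m2) % m2

u, m = crt_pair(u1, m1, u2, m2)
print(rational_reconstruction(u, m))
```

The bound sqrt(m/2) on numerator and denominator is the usual condition under which the reconstructed fraction is unique.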

HPAC (OpenBLAS)

DIM BITSIZE PRIMES D_IT D_LIFT R_IT R_INIT R_LIFT R_CRTP R_CRT R_RAT
100 100 256 800 .3171 4 .6772 .2350 .0379 .0128 .0248
100 100 128 805 .3123 8 .3398 .2805 .0285 .0128 .0250
100 100 64 807 .3143 16 .1727 .3388 .0237 .0128 .0248
100 100 32 802 .3106 31 .0890 .5142 .0193 .0125 .0242
100 100 16 798 .3036 61 .0477 .9056 .0161 .0116 .0243
100 100 8 801 .3153 121 .0273 1.5501 .0125 .0094 .0242
100 100 4 822 .3083 242 .0172 2.9304 .0105 .0060 .0237
100 100 2 817 .3133 489 .0118 5.8470 .0086 .0001 .0236
100 100 1 813 .3108 978 .0095 11.5726 .0001 .0001 .0238
100 100 32 802 .3106 31 .0890 .5142 .0193 .0125 .0242
100 200 32 1575 .6803 60 .0915 1.0971 .0460 .0348 .0751
200 100 32 1631 1.2162 61 .3204 1.3498 .0915 .0721 .1044
300 100 32 2421 3.0914 91 .7353 2.6728 .2338 .1972 .2580
400 100 32 3250 14.3534 122 2.0951 10.9242 .6531 .6323 .5688
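
As a quick reading aid for the PRIMES sweep (the rows with DIM = 100 and BITSIZE = 100), the snippet below computes PRIMES × R_IT and the R_LIFT time per iteration. The numbers are copied from the table above; the helper name summarize is hypothetical, and the per-iteration figure is plain arithmetic on the reported values, not a separate measurement.

```python
# (PRIMES, R_IT, R_LIFT) copied from the HPAC rows with DIM = 100, BITSIZE = 100
rows = [
    (256, 4, 0.2350), (128, 8, 0.2805), (64, 16, 0.3388),
    (32, 31, 0.5142), (16, 61, 0.9056), (8, 121, 1.5501),
    (4, 242, 2.9304), (2, 489, 5.8470), (1, 978, 11.5726),
]

def summarize(rows):
    """Return (PRIMES, PRIMES * R_IT, R_LIFT / R_IT) for each row."""
    return [(primes, primes * r_it, r_lift / r_it)
            for primes, r_it, r_lift in rows]

print(f"{'PRIMES':>6} {'P*R_IT':>6} {'R_LIFT/R_IT':>11}")
for primes, work, per_it in summarize(rows):
    print(f"{primes:>6} {work:>6} {per_it:>11.5f}")
```

The product PRIMES × R_IT stays between 968 and 1024 across the sweep, which matches the idea that each iteration lifts with respect to all primes at once.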

Boree (BLIS)

DIM BITSIZE PRIMES D_IT D_LIFT R_IT R_INIT R_LIFT R_CRTP R_CRT R_RAT
100 100 32 807 .1426 31 .1494 .1103 .0105 .0066 .0135
100 200 32 1571 .3465 60 .1532 .3258 .0249 .0182 .0408
200 100 32 1608 .6997 61 .4154 .5350 .0492 .0371 .0557
300 100 32 2452 1.8967 91 .7970 1.2881 .1215 .1020 .1354
400 100 32 3227 5.2680 122 1.3045 2.5307 .2442 .2108 .2635
1000 100 32 8109 90.4061 307 7.5955 25.1093 2.1447 1.8424 2.1459

Details of one R_LIFT iteration:

DIM | BITSIZE | PRIMES | DIV and c = A^{-1} r | FGEMM R <= R - Ac | R <= R / p | CONVERT r <= Q + R
1000 | 100 | 32 | 0.0219879 | 0.0230379 | 0.014802 | 0.0106769