Uneven performance of `blst_p1s_mult_pippenger` #235

chfast · 2024-10-31T12:22:38Z

When benchmarking `blst_p1s_mult_pippenger`, I noticed sudden increases in performance at certain numbers of points: 64, 128, 256, and so on.


dot-asm · 2024-11-01T11:12:57Z

And what's the issue? :-) But on a more serious note, the key point is that the slope of the curve becomes more and more moderate as the number of inputs grows, and it depends on how you slice the scalars for a given number of inputs, which is a balancing act. The "scalar-slicing" procedure is prone to rounding errors, which is why the curve is bound to have breaks. Now with this in mind, what's the issue? That the breaks are too big?
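To make the balancing act concrete, here is a minimal sketch of the textbook bucket-method cost estimate. It is not blst's actual implementation. It assumes a scalar of `nbits` bits split into `ceil(nbits / w)` windows of `w` bits, with each window costing roughly `npoints + 2^w` point additions. The names `pippenger_cost` and `best_window` are hypothetical.

```c
#include <stddef.h>

/* Textbook estimate of point additions for a bucket-method MSM.
 * Assumption: each of the ceil(nbits/w) windows costs about
 * npoints bucket insertions plus 2^w additions to sum the buckets.
 * This is an illustrative model, not blst's real cost. */
static size_t pippenger_cost(size_t npoints, size_t w, size_t nbits)
{
    size_t windows = (nbits + w - 1) / w;   /* rounded up */
    return windows * (npoints + ((size_t)1 << w));
}

/* Hypothetical helper: the integer window width in [1, 16] that
 * minimizes the estimate above (the smallest width wins ties). */
static size_t best_window(size_t npoints, size_t nbits)
{
    size_t best = 1, w;
    for (w = 2; w <= 16; w++)
        if (pippenger_cost(npoints, w, nbits) <
            pippenger_cost(npoints, best, nbits))
            best = w;
    return best;
}
```

Under this model, `best_window(100, 255)` is 5 and `best_window(1000, 255)` is 8. Because `w` has to be an integer, the preferred width changes in steps as `npoints` grows, which is one way to see why the curve has breaks.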

chfast · 2024-11-04T13:51:05Z

I just wanted to point out that the decision on how to slice scalars depending on the number of inputs could be improved. E.g. currently, for our data, it is faster to compute an MSM for 65 points than for 55.

dot-asm · 2024-11-05T10:02:01Z

For the record, performance for such small amounts of inputs has never been subject to such close scrutiny, let alone single-thread performance[!]. The latter is because even single-board computers are multi-core in this day and age. But anyway, try modifying `pippenger_window_size()` in `src/multi_scalar.c` by adding `npoints += 8;` after the `size_t wbits;` declaration.
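For reference, a sketch of what the suggested change does to the window selection. The return expression is the one quoted further down in this thread. The loop computing `wbits` as floor(log2(npoints)) and the `bias` parameter (0 for the original behaviour, 8 for the suggested `npoints += 8`) are assumptions made for illustration, so check them against the actual `src/multi_scalar.c`. The name `window_size_sketch` is hypothetical.

```c
#include <stddef.h>

/* Illustrative stand-in for pippenger_window_size(); not the real code. */
static size_t window_size_sketch(size_t npoints, size_t bias)
{
    size_t wbits;

    npoints += bias;    /* the suggested tweak when bias == 8 */

    /* Assumed: wbits ends up as floor(log2(npoints)) */
    for (wbits = 0; npoints >>= 1; wbits++) ;

    return wbits>12 ? wbits-3 : (wbits>4 ? wbits-2 : (wbits ? 2 : 1));
}
```

Under these assumptions the window grows from 3 to 4 bits at 64 points without the bias and at 56 points with it, so the tweak shifts where the breaks fall rather than removing them.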

chfast · 2024-12-09T13:37:06Z

I tried the suggestion (`npoints += 8`) but its effect is limited.

I got the best results by:

1. increasing the bits by 1 from what is originally computed:

```
  size_t r = wbits>12 ? wbits-3 : (wbits>4 ? wbits-2 : (wbits ? 2 : 1));
  return r + 1;
```

2. decreasing the threshold of the fallback to `mult_wbits` by a factor of 4; for p1 this is 64 → 16 points:

```
if ((npoints * sizeof(ptype##_affine) * 8 * 3) <= SCRATCH_LIMIT / 4)
```
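The effect of dividing the limit by 4 can be checked with a little arithmetic. In the sketch below, `SCRATCH_LIMIT_ASSUMED` (144 KiB) and `P1_AFFINE_SIZE_ASSUMED` (96 bytes) are assumptions chosen to be consistent with the 64 → 16 figures above, not values taken from the blst headers, and `max_fallback_points` is a hypothetical helper.

```c
#include <stddef.h>

/* Assumed values for illustration only; verify against the real headers. */
#define SCRATCH_LIMIT_ASSUMED   ((size_t)144 * 1024)
#define P1_AFFINE_SIZE_ASSUMED  ((size_t)96)

/* Largest npoints that still satisfies
 *   npoints * affine_size * 8 * 3 <= limit
 * i.e. the largest input count that takes the fallback path. */
static size_t max_fallback_points(size_t affine_size, size_t limit)
{
    return limit / (affine_size * 8 * 3);
}
```

With the assumed values, `max_fallback_points(P1_AFFINE_SIZE_ASSUMED, SCRATCH_LIMIT_ASSUMED)` gives 64 and passing `SCRATCH_LIMIT_ASSUMED / 4` gives 16, matching the change described above.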

Fixes supranational#235.

dot-asm · 2024-12-18T16:53:32Z

Thanks! Could you double-check #246?

dot-asm added a commit to dot-asm/blst that referenced this issue Dec 18, 2024

multi_scalar.c: fine-tune MSM for small amounts of inputs.

eb3ba6b

Fixes supranational#235.

dot-asm linked a pull request Dec 18, 2024 that will close this issue

multi_scalar.c: fine-tune MSM for small amounts of inputs. #246

Open
