Consider using Fast Fourier Transform for polynomial multiplication. #13

Open
afck opened this issue Aug 27, 2018 · 11 comments
@afck
Collaborator

afck commented Aug 27, 2018

We're currently doing polynomial multiplication and interpolation the naive way (interpolation even for a single value), which is O(n²). With the Fast Fourier Transform (FFT), multiplication (but probably not interpolation in general?) should be possible in O(n log(n)). Let's first try to estimate the actual number of arithmetic operations, to make sure this is worth doing in practice, and for all network sizes.

@vkomenda
Contributor

I'll have a look at whether this can be done using rustfft. Alternatively, we can use an external FFT library via bindings.

@vkomenda vkomenda self-assigned this Aug 27, 2018
@afck
Collaborator Author

afck commented Aug 27, 2018

That would be great, but I doubt that it's possible: We need to do FFT in a finite field and most implementations seem to be written for floating point numbers only.

@vkomenda
Contributor

Yes. In the short term we could bind libfqfft. In the long term we could write the FFT functionality in Rust.

Here is one more paper on FFT in finite fields.

@afck
Collaborator Author

afck commented Aug 27, 2018

libfqfft doesn't seem to expose a pure C API. Creating a wrapper for a C++-only library could be tricky and messy. Might be easier to just Rewrite It In Rust™.

@c0gent
Contributor

c0gent commented Aug 27, 2018

Let me know when the time comes to implement this on the GPU :)

@afck
Collaborator Author

afck commented Aug 27, 2018

…and for the instance in src/lib.rs it's not even in the field itself but in the group G2. The Fourier transform on finite groups Wikipedia article still mentions "Fast Fourier Transform", so maybe this can still be done in O(n log(n))? I have no idea whether a finite-field FFT library would help us with that, though.

Edit: Actually, G2 is a vector space over the field. Maybe the FFT algorithms for the field can be applied to it, too? (It's actually isomorphic to a finite field, of course, but multiplication is not effectively computable, I think.)

@afck
Collaborator Author

afck commented Aug 29, 2018

I read up on it a bit. Here's my understanding so far (most of this is just from various Wikipedia articles):

Fourier Transform

Let 𝔽 be a field and 𝕍 a vector space over it, N a natural number and α ∈ 𝔽 a primitive N-th root of unity, i.e. α^k = 1 for k = N, but not for 0 < k < N. The Fourier transform 𝕍^N → 𝕍^N is the linear map defined by the matrix W, where W_{i,j} = α^{ij}. Its inverse is given by (W^{-1})_{i,j} = α^{-ij} / N. (Note that I generalize matrix multiplication a bit here: AB can be defined in exactly the same way if the entries of A are in 𝔽, and the entries of B are in 𝕍.)

We can interpret each a ∈ 𝕍^N as the coefficients of a polynomial
f_a(x) = a_0 + a_1 x + a_2 x^2 + … + a_{N-1} x^{N-1}.
Then the Fourier transform can be expressed as
W a = (f_a(α^0), f_a(α^1), f_a(α^2), …, f_a(α^{N-1}))^T,
and its inverse as
W^{-1} a = (f_a(α^0), f_a(α^{-1}), f_a(α^{-2}), …, f_a(α^{-(N-1)}))^T / N.

So the Fourier transform of a is the vector of values of f_a at the N-th roots of unity, and the inverse Fourier transform of a is the vector of coefficients of the polynomial whose values at the N-th roots of unity are given by a: in that sense, W is evaluation and W^{-1} is interpolation.
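
To make the evaluation view concrete, here is a minimal sketch of the naive O(N²) transform over the toy prime field F_17 (where 3 has multiplicative order 16); the field choice and all names are illustrative assumptions, not anything from our crates:

```rust
// Naive O(N²) discrete Fourier transform over the toy field F_p, p = 17,
// illustrating that W·a is just the vector of values f_a(α^i).
// The field choice and all names here are illustrative assumptions.

const P: u64 = 17;

/// b^e mod P by square-and-multiply.
fn pow_mod(mut b: u64, mut e: u64) -> u64 {
    let mut acc = 1;
    b %= P;
    while e > 0 {
        if e & 1 == 1 {
            acc = acc * b % P;
        }
        b = b * b % P;
        e >>= 1;
    }
    acc
}

/// f_a(x) = a_0 + a_1 x + … + a_{N-1} x^{N-1}, evaluated by Horner's rule.
fn eval(a: &[u64], x: u64) -> u64 {
    a.iter().rev().fold(0, |acc, &c| (acc * x + c) % P)
}

/// W·a: the values of f_a at α^0, α^1, …, α^{N-1}.
fn dft(a: &[u64], alpha: u64) -> Vec<u64> {
    (0..a.len() as u64)
        .map(|i| eval(a, pow_mod(alpha, i)))
        .collect()
}

fn main() {
    // N = 4: α = 3^(16/4) = 81 ≡ 13 (mod 17) has order 4.
    let a = [1, 2, 3, 4];
    println!("{:?}", dft(&a, 13)); // [10, 6, 15, 7]
}
```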

Fast Fourier Transform

Let W' be the Fourier transform for 2N instead of N, with some 2N-th root of unity β such that β^2 = α. Let c ∈ 𝕍^{2N}, and let a, b ∈ 𝕍^N be the even- resp. odd-numbered entries of c (starting at 0), so that f_c(x) = f_a(x^2) + x f_b(x^2). Then
(W' c)_i = f_c(β^i) = f_a(α^i) + β^i f_b(α^i) = (W a)_{i mod N} + β^i (W b)_{i mod N}.
So for i < N, we have
(W' c)_i = f_c(β^i) = (W a)_i + β^i (W b)_i and
(W' c)_{i+N} = f_c(β^{i+N}) = (W a)_i + β^N β^i (W b)_i = (W a)_i - β^i (W b)_i,
because β^N = -1 is the unique non-trivial square root of 1.

So the Fourier transform in 𝕍^{2N} can be computed in linear time from two Fourier transforms in 𝕍^N! That means if N is a power of 2, the Fourier transform can be computed in O(N log(N)). (But the same trick also works for factors other than 2, of course.)
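
Here is a minimal recursive sketch of that radix-2 split, again over the toy field F_17 (illustrative only; not a drop-in for the real Fr arithmetic):

```rust
// Recursive radix-2 FFT over the toy field F_p, p = 17: split c into its even
// and odd coefficients, transform both halves with α = β², and recombine via
// (W' c)_i = (W a)_i + β^i (W b)_i,  (W' c)_{i+N} = (W a)_i - β^i (W b)_i.
// The field choice and all names here are illustrative assumptions.

const P: u64 = 17;

/// FFT of `c` (whose length must be a power of two), where `beta` is a
/// primitive root of unity of that order.
fn fft(c: &[u64], beta: u64) -> Vec<u64> {
    let n = c.len();
    if n == 1 {
        return c.to_vec();
    }
    let a: Vec<u64> = c.iter().step_by(2).copied().collect(); // even entries
    let b: Vec<u64> = c.iter().skip(1).step_by(2).copied().collect(); // odd entries
    let alpha = beta * beta % P; // β² is a primitive (n/2)-th root of unity
    let (wa, wb) = (fft(&a, alpha), fft(&b, alpha));

    let mut out = vec![0; n];
    let mut beta_i = 1; // β^i
    for i in 0..n / 2 {
        let t = beta_i * wb[i] % P;
        out[i] = (wa[i] + t) % P;
        out[i + n / 2] = (wa[i] + P - t) % P; // subtraction mod P
        beta_i = beta_i * beta % P;
    }
    out
}

fn main() {
    // Same input as the naive sketch above: β = 13 has order 4 mod 17.
    println!("{:?}", fft(&[1, 2, 3, 4], 13)); // [10, 6, 15, 7]
}
```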

The pairing crate knows for which powers N of 2 there are N-th roots of unity, and BLS12-381 was designed to have 2^s-th roots of unity for large s.

Fast Multiplication

Let f_a and f_b be polynomials with deg f_a + deg f_b < N, i.e. such that there is a c ∈ 𝕍^N with f_c = f_a f_b. Naively, computing c, i.e. computing the product of the two polynomials, takes O(N²) time.

But W c is just the pointwise product of W a and W b (because these represent values of the polynomials) and computable in O(N). So (at least for large N) transforming, multiplying pointwise, and transforming back is faster than naive multiplication, namely O(N log(N))!
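
A sketch of that pipeline (transform both factors, multiply pointwise, transform back with β^{-1} and divide by N), over the same toy field F_17; the `fft` helper repeats the earlier sketch so this compiles on its own, and all names are illustrative assumptions:

```rust
// Polynomial multiplication via FFT over the toy field F_p, p = 17.
// Illustrative only: the field, the names and the degree bound are assumptions.

const P: u64 = 17;

fn pow_mod(mut b: u64, mut e: u64) -> u64 {
    let mut acc = 1;
    b %= P;
    while e > 0 {
        if e & 1 == 1 { acc = acc * b % P; }
        b = b * b % P;
        e >>= 1;
    }
    acc
}

/// Multiplicative inverse mod P (P is prime), by Fermat's little theorem.
fn inv(x: u64) -> u64 {
    pow_mod(x, P - 2)
}

/// Radix-2 FFT, same recursion as in the previous sketch.
fn fft(c: &[u64], beta: u64) -> Vec<u64> {
    let n = c.len();
    if n == 1 { return c.to_vec(); }
    let a: Vec<u64> = c.iter().step_by(2).copied().collect();
    let b: Vec<u64> = c.iter().skip(1).step_by(2).copied().collect();
    let (wa, wb) = (fft(&a, beta * beta % P), fft(&b, beta * beta % P));
    let mut out = vec![0; n];
    let mut bi = 1;
    for i in 0..n / 2 {
        let t = bi * wb[i] % P;
        out[i] = (wa[i] + t) % P;
        out[i + n / 2] = (wa[i] + P - t) % P;
        bi = bi * beta % P;
    }
    out
}

/// c = a·b as polynomials, assuming deg a + deg b < n, n a power of two, and
/// `beta` a primitive n-th root of unity.
fn mul(a: &[u64], b: &[u64], n: usize, beta: u64) -> Vec<u64> {
    let pad = |p: &[u64]| {
        let mut v = p.to_vec();
        v.resize(n, 0);
        v
    };
    let (wa, wb) = (fft(&pad(a), beta), fft(&pad(b), beta));
    // Pointwise product of the value representations: O(n).
    let wc: Vec<u64> = wa.iter().zip(&wb).map(|(x, y)| x * y % P).collect();
    // Inverse transform: FFT with β^{-1}, then divide by n.
    let n_inv = inv(n as u64);
    fft(&wc, inv(beta)).into_iter().map(|x| x * n_inv % P).collect()
}

fn main() {
    // (1 + 2x)(3 + 4x) = 3 + 10x + 8x²; n = 4, β = 13 (order 4 mod 17).
    println!("{:?}", mul(&[1, 2], &[3, 4], 4, 13)); // [3, 10, 8, 0]
}
```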

Our Implementation

That is already useful: Maybe we should have two kinds of structures for our polynomials (i.e. 𝔽 = 𝕍 = Fr) and for commitments (which in the above sense are also polynomials: the groups, e.g. 𝕍 = G2, are actually vector spaces over 𝔽 = Fr, I think; although there it might be useless, because commitments can't be multiplied anyway): one in the coefficient representation and one in the value representation, i.e. the latter would be the Fourier transform of the former. The former allows evaluation and addition in O(N) time; the latter would allow multiplication and addition in O(N) time (if the bound N for the degree is known); and conversions between the two would take O(N log(N)) time.
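
Roughly, the two structures could look like the skeleton below (hypothetical names, a placeholder scalar type, and only the costs from above as comments; not an actual API proposal):

```rust
// Hypothetical skeleton of the two representations; `Fr` is a placeholder
// scalar type and all names are illustrative, not our actual API.
type Fr = u64;

/// Coefficient representation: O(N) evaluation and addition, O(N²) naive
/// multiplication; the degree can grow freely.
struct Poly {
    coeff: Vec<Fr>,
}

/// Value representation: the Fourier transform of the coefficients, i.e. the
/// values at the N-th roots of unity. O(N) addition and pointwise
/// multiplication, but only for a fixed degree bound N; converting to or from
/// `Poly` costs O(N log N) via the FFT.
struct PolyVals {
    value: Vec<Fr>,
}
```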

I have no idea whether it's worth it, i.e. whether that would speed up our arithmetic for N > 5 or only for N > 100… we'll need to either try it out or at least count how many additions and multiplications the transform would actually require. And there are also other fast Fourier transform algorithms that might do better in practice than the one I described.

Interpolation in General

I don't understand yet whether and how this can be applied to interpolation in general: For threshold decryption and signatures we need to interpolate given values at points other than the N-th roots of unity. E.g. libfqfft also seems to allow fast interpolation only if the sample points lie in a specific domain: roots of unity, or arithmetic or geometric progressions.

@afck afck changed the title Consider using Fast Fourier Transform for polynomial interpolation. Consider using Fast Fourier Transform for polynomial multiplication. Aug 29, 2018
@afck afck mentioned this issue Aug 29, 2018
@afck
Collaborator Author

afck commented Aug 29, 2018

Key Share Computation

We should also use f(α^k) for the k-th key share, since these values can all be computed in a single Fourier transform in O(N log(N)) time, whereas computing them separately takes O(N²).
(Currently it's f(k), i.e. f(k · 1_𝔽).)

However, N would always have to be a suitable number, for which a root of unity and a good FFT algorithm are known: probably a power of two. So it would be the smallest power of two greater than the number of nodes.

If we require that N is strictly greater than the number of nodes, we could also make the master key f(1) instead of f(0), so it's part of the Fourier transform, too.
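
A sketch of that layout over the toy field F_17, with the master key at f(1) and the k-th share at f(α^k); the `fft` helper is the same toy recursion as above, and the field, names and values are illustrative assumptions:

```rust
// Key share computation via one FFT over the toy field F_p, p = 17:
// a single transform yields f(α^0), …, f(α^{N-1}), i.e. all shares at once.
// Illustrative assumptions only: field, names and the toy secret polynomial.

const P: u64 = 17;

/// Radix-2 FFT, same recursion as in the earlier sketches.
fn fft(c: &[u64], beta: u64) -> Vec<u64> {
    let n = c.len();
    if n == 1 { return c.to_vec(); }
    let a: Vec<u64> = c.iter().step_by(2).copied().collect();
    let b: Vec<u64> = c.iter().skip(1).step_by(2).copied().collect();
    let (wa, wb) = (fft(&a, beta * beta % P), fft(&b, beta * beta % P));
    let mut out = vec![0; n];
    let mut bi = 1;
    for i in 0..n / 2 {
        let t = bi * wb[i] % P;
        out[i] = (wa[i] + t) % P;
        out[i + n / 2] = (wa[i] + P - t) % P;
        bi = bi * beta % P;
    }
    out
}

fn main() {
    let nodes: usize = 5;
    // Smallest power of two strictly greater than the number of nodes, so that
    // index 0, i.e. f(α^0) = f(1), is free to serve as the master key.
    let n = (nodes + 1).next_power_of_two(); // 8
    let beta = 9; // 3^(16/8) = 9 has order 8 mod 17
    let mut f = vec![5, 11, 7, 2]; // toy secret polynomial coefficients
    f.resize(n, 0);
    let values = fft(&f, beta); // one O(N log N) transform instead of N evaluations
    let master_key = values[0]; // f(1)
    let shares = &values[1..=nodes]; // share k is f(α^k), for k = 1..=nodes
    println!("master key: {}, shares: {:?}", master_key, shares);
}
```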

@afck
Collaborator Author

afck commented Aug 30, 2018

The bellman crate's FFT implementation follows exactly the algorithm described above. It's a bit hard to read, though: it reorders the elements so that each element's index has its bits reversed (a bit-reversal permutation), so that in each step of the iteration, W a and W b are consecutive contiguous slices.
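
For reference, a minimal sketch of that bit-reversal reordering (not bellman's actual code):

```rust
// Bit-reversal permutation as used by iterative FFTs: the element at index i
// is swapped with the element whose log2(n)-bit index is i's bits reversed,
// so that after the permutation the recursive even/odd halves ("W a" and
// "W b") occupy contiguous slices. Sketch only, not the bellman crate's code.

fn bit_reverse_permute<T>(v: &mut [T]) {
    let n = v.len();
    assert!(n.is_power_of_two());
    if n <= 1 {
        return;
    }
    let bits = n.trailing_zeros();
    for i in 0..n {
        let j = ((i as u32).reverse_bits() >> (32 - bits)) as usize;
        if i < j {
            v.swap(i, j);
        }
    }
}

fn main() {
    let mut v: Vec<u32> = (0..8).collect();
    bit_reverse_permute(&mut v);
    println!("{:?}", v); // [0, 4, 2, 6, 1, 5, 3, 7]
}
```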

@afck afck self-assigned this Aug 30, 2018
@afck
Collaborator Author

afck commented Sep 3, 2018

I wrote a crude FFT implementation in the afck-fft branch, and the benchmarks show that, at least for a single polynomial multiplication, even with N = 40 the naive approach is still three times as fast.

It might still turn out to be worth it, of course:

  • Maybe for general polynomial interpolation we only have to do a single inverse Fourier transform at the end, and the multiplications before that could be sped up by being performed on PolyVals. (On the other hand, most of them don't involve two high-degree factors anyway…)
  • The implementation still does lots of unnecessary cloning and roots-of-unity-squaring. Would be interesting to see how a few optimizations would change the numbers.
  • As mentioned above, key share computation could also be sped up.

Anyway, I'm unassigning this for now; it's not a low-hanging fruit, at least.

@afck afck removed their assignment Sep 3, 2018
@vkomenda
Contributor

vkomenda commented Sep 3, 2018

For proper benchmarking, it would be required to
