
Batch binding #280

Open
jgosmann opened this issue Mar 16, 2021 · 4 comments

Comments

@jgosmann
Collaborator

By @arvoelke in this comment:

This is quite a bit outside the scope of this PR, but while I'm thinking about it I have been wondering how this interface could work efficiently with batches of vectors and multiple consecutive bindings (being done offline without neural networks). I have been doing this quite a bit lately to prepare datasets comprised of [S]SPs and found that it's much faster (e.g., 3-5x faster) to apply just a single rfft to an entire batch of vectors, do all of the bindings that are needed while staying in the Fourier domain, and then apply a single irfft to the batch of vectors only as a final step. I don't really see a way to reuse this code in this context, but it would be nice to have that sort of capability centralized here without replicating parts of the HRR algebra code, so it's maybe something to think about.
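For reference, the batched workflow described above can be sketched in plain NumPy (illustrative only; `batch_bind` is not part of nengo_spa): one `rfft` per input batch, all bindings as elementwise products in the Fourier domain, and a single `irfft` at the end.

```python
import numpy as np

def batch_bind(*batches):
    # Circular convolution (HRR binding) is elementwise multiplication
    # in the Fourier domain, so one rfft per input batch and a single
    # irfft at the end suffice.
    d = batches[0].shape[-1]
    result = np.fft.rfft(batches[0], axis=-1)
    for batch in batches[1:]:
        result = result * np.fft.rfft(batch, axis=-1)
    return np.fft.irfft(result, n=d, axis=-1)

rng = np.random.RandomState(0)
A, B, C = (rng.randn(4, 16) for _ in range(3))

# Same result as binding pairwise, but with one forward FFT per input
# batch and one inverse FFT overall.
expected = np.fft.irfft(np.fft.rfft(A) * np.fft.rfft(B) * np.fft.rfft(C), n=16)
assert np.allclose(batch_bind(A, B, C), expected)
```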

@jgosmann
Collaborator Author

Would you envision something where multiple SPs can be bound at once, like so:

base_sp.bind(first_sp, second_sp, third_sp, ...)

Or rather something where you can convert a vector to Fourier space, perform the operations there, and then convert back?

in_fourier = base_sp.to_fourier()
in_fourier.bind(first_in_fourier)
in_fourier.bind(second_in_fourier)
in_fourier.bind(third_in_fourier)
bound_sp = in_fourier.to_sp()

I suppose the first variant would be easier to implement, whereas the second variant gives more flexibility.

The examples are of course rough sketches of how the API could look; actual names would likely be different.

@arvoelke
Contributor

arvoelke commented Mar 17, 2021

I was also envisioning something along those lines: you might have the following batches/tensors of SPs:

A : (n, d)
B : (n, d)
C : (1, d)

where d is the SP dimensionality, and n is the batch size.

And then want to do stuff like A * (B ** 2.5) * C with the usual NumPy semantics (e.g., C is broadcast across the batch axis).
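Assuming HRR binding and the principal complex power for fractional binding, that computation can be sketched in plain NumPy like this (illustrative only, not nengo_spa API):

```python
import numpy as np

n, d = 8, 32
rng = np.random.RandomState(0)
A = rng.randn(n, d)
B = rng.randn(n, d)
C = rng.randn(1, d)

# One forward transform per tensor; all operations stay in the
# Fourier domain until the single inverse transform at the end.
FA, FB, FC = (np.fft.rfft(X, axis=-1) for X in (A, B, C))

# Fractional binding B ** 2.5 is an elementwise complex power of the
# coefficients; FC (shape (1, d // 2 + 1)) broadcasts across the batch.
result = np.fft.irfft(FA * FB ** 2.5 * FC, n=d, axis=-1)
assert result.shape == (n, d)
```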

With the HRR algebra, that can be done by staying in the Fourier domain the entire time and using NumPy operations that are vectorized across the batch (e.g., np.fft.rfft(..., axis=-1)). Similarly, batches of "positive" unitary vectors can be generated efficiently by using Euler's formula np.exp(1j * phi), where |phi| < pi, and setting the DC/Nyquist coefficients to 1 across the batch axis (no FFT required). It's not a lot of code to write out manually on a case-by-case basis, but it involves duplicating a bit of what's in NengoSPA.

@jgosmann
Collaborator Author

Something like this already works: you can put SemanticPointer instances into a NumPy array and use those:

A = np.array([spa.SemanticPointer(...), spa.SemanticPointer(...), ...])
B = np.array([spa.SemanticPointer(...), spa.SemanticPointer(...), ...])
C = spa.SemanticPointer(...)  # or alternatively np.array([spa.SemanticPointer(...)])

result = A * (B ** 2.5) * C  # note the ** operator is not yet implemented, but should work this way once it is

But I suppose you're also thinking of an optimization that avoids doing the IFFT followed by the FFT again between operations? For that I could think of two approaches:

  • Handle this transparently by adding the capability to store each SemanticPointer in either the standard vector space or FFT space, and only convert to the respective other representation when required by an operation.
  • Add an API to do operations explicitly in Fourier space. Different options might be viable, just spitballing:
    • with spa.useFourierSpace:
          result = A * (B ** 2.5) * C
    • result = spa.inFourierSpace("A * (B ** 2.5) * C") (hm, code in strings, not sure I like that 🤔)
    • result = (A.fft() * (B.fft() ** 2.5) * C.fft()).ifft()
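A rough sketch of how that last option could work; all names here (FourierPointer, fft, ifft) are hypothetical and not part of nengo_spa, and the operators assume the HRR algebra:

```python
import numpy as np

class FourierPointer:
    """Semantic pointer represented by its rfft coefficients."""

    def __init__(self, coeffs, d):
        self.coeffs = coeffs  # shape (..., d // 2 + 1), complex
        self.d = d            # original vector dimensionality

    def __mul__(self, other):
        # HRR binding = elementwise product of Fourier coefficients.
        return FourierPointer(self.coeffs * other.coeffs, self.d)

    def __pow__(self, exponent):
        # Fractional binding = elementwise (principal) complex power.
        return FourierPointer(self.coeffs ** exponent, self.d)

    def ifft(self):
        # Convert back to the standard vector representation.
        return np.fft.irfft(self.coeffs, n=self.d, axis=-1)

def fft(v):
    return FourierPointer(np.fft.rfft(v, axis=-1), v.shape[-1])

rng = np.random.RandomState(0)
A, B, C = (rng.randn(8, 32) for _ in range(3))
result = (fft(A) * fft(B) ** 2.5 * fft(C)).ifft()
assert result.shape == (8, 32)
```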

> Similarly batches of "positive" unitary vectors can be generated efficiently by using Euler's formula np.exp(1j * phi) where |phi| < pi and setting the DC/Nyquist coefficients to 1 across the batch axis (no FFT required).

This could be implemented as additional generator in the vector_generation module, I suppose.
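Such a generator could look roughly like the following; the name `positive_unitary_vectors` is hypothetical, and the infinite-iterator style mimics the existing vector_generation generators:

```python
import numpy as np

def positive_unitary_vectors(d, rng):
    # Infinite generator of positive unitary vectors built directly
    # from random phases (no forward FFT needed).
    n_coeffs = d // 2 + 1
    while True:
        phi = rng.uniform(-np.pi, np.pi, size=n_coeffs)
        coeffs = np.exp(1j * phi)
        coeffs[0] = coeffs[-1] = 1.0  # DC and Nyquist coefficients
        yield np.fft.irfft(coeffs, n=d)

gen = positive_unitary_vectors(64, np.random.RandomState(0))
v = next(gen)
# All Fourier magnitudes are 1, so the vector is unitary under binding.
assert np.allclose(np.abs(np.fft.rfft(v)), 1.0)
```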

@arvoelke
Contributor

arvoelke commented Apr 11, 2021

When I mentioned the speed-up, that is actually what I was comparing against. I used to do it in the above style, and it does indeed work functionally, but it's quite slow. It's much faster to batch it in NumPy, because everything happens in a constant number of C routines that each do the work in parallel, as opposed to looping and sequentially invoking a Python method that then dispatches separate C routines for each SP. Here's a quick-and-dirty benchmarking example:

import numpy as np
import nengo_spa as spa

rng = np.random.RandomState(seed=0)

n, d = 1024, 256
A = rng.randn(n, d)
B = rng.randn(n, d)

sspA = np.asarray([spa.SemanticPointer(a) for a in A])
sspB = np.asarray([spa.SemanticPointer(b) for b in B])

# Compute the results once outside %timeit so they persist for the
# check below (%timeit discards variables assigned inside it):
sspC = sspA * sspB
C = np.fft.irfft(np.fft.rfft(A) * np.fft.rfft(B))
assert np.allclose(C, [c.v for c in sspC])

%timeit sspA * sspB
%timeit np.fft.irfft(np.fft.rfft(A) * np.fft.rfft(B))

22.4 ms ± 369 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
2.21 ms ± 999 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)

So in this case it's ~10x faster to batch it in NumPy. I'm using a Conda install of numpy==1.18.5 and an AMD Ryzen 9 3900X 12-core processor.
