
Batch binding #280

Open
jgosmann opened this issue Mar 16, 2021 · 4 comments

Comments

@jgosmann
Collaborator

By @arvoelke in this comment:

This is quite a bit outside the scope of this PR, but while I'm thinking about it I have been wondering how this interface could work efficiently with batches of vectors and multiple consecutive bindings (being done offline without neural networks). I have been doing this quite a bit lately to prepare datasets comprised of [S]SPs and found that it's much faster (e.g., 3-5x faster) to apply just a single rfft to an entire batch of vectors, do all of the bindings that are needed while staying in the Fourier domain, and then apply a single irfft to the batch of vectors only as a final step. I don't really see a way to reuse this code in this context, but it would be nice to have that sort of capability centralized here without replicating parts of the HRR algebra code, so it's maybe something to think about.
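For reference, the batched workflow described above can be sketched in plain NumPy (illustrative only; `batch_bind` is not part of nengo_spa): one `rfft` per input batch, all bindings as elementwise products in the Fourier domain, and a single `irfft` at the end.

```python
import numpy as np

def batch_bind(*batches):
    # Circular convolution (HRR binding) is elementwise multiplication
    # in the Fourier domain, so one rfft per input batch and a single
    # irfft at the end suffice.
    d = batches[0].shape[-1]
    result = np.fft.rfft(batches[0], axis=-1)
    for batch in batches[1:]:
        result = result * np.fft.rfft(batch, axis=-1)
    return np.fft.irfft(result, n=d, axis=-1)

rng = np.random.RandomState(0)
A, B, C = (rng.randn(4, 16) for _ in range(3))

# Same result as binding pairwise, but with one forward FFT per input
# batch and one inverse FFT overall.
expected = np.fft.irfft(np.fft.rfft(A) * np.fft.rfft(B) * np.fft.rfft(C), n=16)
assert np.allclose(batch_bind(A, B, C), expected)
```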

@jgosmann
Collaborator Author

Would you envision something where multiple SPs can be bound at once, like so:

base_sp.bind(first_sp, second_sp, third_sp, ...)

Or rather something where you can convert a vector to Fourier space, perform the operations there, and then convert back?

in_fourier = base_sp.to_fourier()
in_fourier.bind(first_in_fourier)
in_fourier.bind(second_in_fourier)
in_fourier.bind(third_in_fourier)
bound_sp = in_fourier.to_sp()

I suppose the first variant would be easier to implement, whereas the second variant gives more flexibility.

The examples are of course rough sketches of how the API could look; actual names would likely be different.

@arvoelke
Contributor

arvoelke commented Mar 17, 2021

I was also envisioning something along those lines: you might have the following batches/tensors of SPs:

A : (n, d)
B : (n, d)
C : (1, d)

where d is the SP dimensionality, and n is the batch size.

And then want to do stuff like A * (B ** 2.5) * C with the usual NumPy semantics (e.g., C is broadcast across the batch axis).
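Assuming HRR binding and the principal complex power for fractional binding, that computation can be sketched in plain NumPy like this (illustrative only, not nengo_spa API):

```python
import numpy as np

n, d = 8, 32
rng = np.random.RandomState(0)
A = rng.randn(n, d)
B = rng.randn(n, d)
C = rng.randn(1, d)

# One forward transform per tensor; all operations stay in the
# Fourier domain until the single inverse transform at the end.
FA, FB, FC = (np.fft.rfft(X, axis=-1) for X in (A, B, C))

# Fractional binding B ** 2.5 is an elementwise complex power of the
# coefficients; FC (shape (1, d // 2 + 1)) broadcasts across the batch.
result = np.fft.irfft(FA * FB ** 2.5 * FC, n=d, axis=-1)
assert result.shape == (n, d)
```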

With the HRR algebra, that can be done by staying in the Fourier domain the entire time and using NumPy operations that are vectorized across the batch (e.g., np.fft.rfft(..., axis=-1)). Similarly, batches of "positive" unitary vectors can be generated efficiently by using Euler's formula np.exp(1j * phi), where |phi| < pi, and setting the DC/Nyquist coefficients to 1 across the batch axis (no FFT required). It's not a lot of code to write out manually on a case-by-case basis, but it involves duplicating a bit of what's in NengoSPA.

@jgosmann
Collaborator Author

Something like this already works: you can put SemanticPointer instances into a NumPy array and use those:

A = np.array([spa.SemanticPointer(...), spa.SemanticPointer(...), ...])
B = np.array([spa.SemanticPointer(...), spa.SemanticPointer(...), ...])
C = spa.SemanticPointer(...)  # or alternatively np.array([spa.SemanticPointer(...)])

result = A * (B ** 2.5) * C  # note the ** operator is not yet implemented, but should work this way once it is

But I suppose you're also thinking of an optimization that avoids doing the IFFT followed by the FFT again between operations? For that I could think of two approaches:

  • Handle this transparently by adding the capability to store each SemanticPointer in either the standard vector space or FFT space, and only convert to the respective other representation when required by an operation.
  • Add an API to do operations explicitly in Fourier space. Different options might be viable, just spitballing:
    • with spa.useFourierSpace:
          result = A * (B ** 2.5) * C
    • result = spa.inFourierSpace("A * (B ** 2.5) * C") (hm, code in strings, not sure I like that 🤔)
    • result = (A.fft() * (B.fft() ** 2.5) * C.fft()).ifft()
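A rough sketch of how that last option could work; all names here (FourierPointer, fft, ifft) are hypothetical and not part of nengo_spa, and the operators assume the HRR algebra:

```python
import numpy as np

class FourierPointer:
    """Semantic pointer represented by its rfft coefficients."""

    def __init__(self, coeffs, d):
        self.coeffs = coeffs  # shape (..., d // 2 + 1), complex
        self.d = d            # original vector dimensionality

    def __mul__(self, other):
        # HRR binding = elementwise product of Fourier coefficients.
        return FourierPointer(self.coeffs * other.coeffs, self.d)

    def __pow__(self, exponent):
        # Fractional binding = elementwise (principal) complex power.
        return FourierPointer(self.coeffs ** exponent, self.d)

    def ifft(self):
        # Convert back to the standard vector representation.
        return np.fft.irfft(self.coeffs, n=self.d, axis=-1)

def fft(v):
    return FourierPointer(np.fft.rfft(v, axis=-1), v.shape[-1])

rng = np.random.RandomState(0)
A, B, C = (rng.randn(8, 32) for _ in range(3))
result = (fft(A) * fft(B) ** 2.5 * fft(C)).ifft()
assert result.shape == (8, 32)
```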

> Similarly batches of "positive" unitary vectors can be generated efficiently by using Euler's formula np.exp(1j * phi) where |phi| < pi and setting the DC/Nyquist coefficients to 1 across the batch axis (no FFT required).

This could be implemented as additional generator in the vector_generation module, I suppose.
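Such a generator could look roughly like the following; the name `positive_unitary_vectors` is hypothetical, and the infinite-iterator style mimics the existing vector_generation generators:

```python
import numpy as np

def positive_unitary_vectors(d, rng):
    # Infinite generator of positive unitary vectors built directly
    # from random phases (no forward FFT needed).
    n_coeffs = d // 2 + 1
    while True:
        phi = rng.uniform(-np.pi, np.pi, size=n_coeffs)
        coeffs = np.exp(1j * phi)
        coeffs[0] = coeffs[-1] = 1.0  # DC and Nyquist coefficients
        yield np.fft.irfft(coeffs, n=d)

gen = positive_unitary_vectors(64, np.random.RandomState(0))
v = next(gen)
# All Fourier magnitudes are 1, so the vector is unitary under binding.
assert np.allclose(np.abs(np.fft.rfft(v)), 1.0)
```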

@arvoelke
Contributor

arvoelke commented Apr 11, 2021

When I mentioned the speed-up, that is actually what I was comparing against. I used to do it in the above style, and it does indeed work functionally, but it's quite slow. It's much faster to batch it in NumPy, because everything happens in a constant number of C routines that each do the work in parallel, as opposed to looping and sequentially invoking a Python method that then dispatches separate C routines for each SP. Here's a quick-and-dirty benchmarking example:

import numpy as np
import nengo_spa as spa

rng = np.random.RandomState(seed=0)

n, d = 1024, 256
A = rng.randn(n, d)
B = rng.randn(n, d)

sspA = np.asarray([spa.SemanticPointer(a) for a in A])
sspB = np.asarray([spa.SemanticPointer(b) for b in B])

# Compute the results once outside %timeit so they persist for the
# check below (%timeit discards variables assigned inside it):
sspC = sspA * sspB
C = np.fft.irfft(np.fft.rfft(A) * np.fft.rfft(B))
assert np.allclose(C, [c.v for c in sspC])

%timeit sspA * sspB
%timeit np.fft.irfft(np.fft.rfft(A) * np.fft.rfft(B))

22.4 ms ± 369 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
2.21 ms ± 999 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)

So in this case it's ~10x faster to batch it in NumPy. I'm using a Conda install of numpy==1.18.5 and an AMD Ryzen 9 3900X 12-core processor.
