
Faster communication with host/chip #256

Draft · wants to merge 27 commits into main
Conversation

@hunse (Collaborator) commented Oct 28, 2019

This PR makes a number of changes that speed up superhost <-> chip communication:

  • Use a host snip that communicates with the superhost via a socket (faster than RPC).
  • Use a larger packet size.
  • Run the chip and superhost simultaneously (rather than one, then the other). This adds a single-timestep delay between superhost and chip.

TODO:

  • Implement communication between host and learning snip.
  • Clean up commit history: squash the timing commits, and squash input channel buffering into the larger-packet-size commit (the latter makes the former obsolete).
  • Clean up code (run black on each commit; ensure pylint etc. pass).
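The socket transport in the first bullet could look roughly like the following. This is a minimal sketch, not the PR's actual implementation: `send_spikes`, `recv_exact`, the count-header framing, and the little-endian `<i` format are all illustrative assumptions.

```python
import socket
import struct


def send_spikes(sock, spikes):
    """Send one spike packet: a 4-byte count header, then the packed spikes.

    Packing many spikes into one packet amortizes per-send overhead, which
    is the point of the larger packet size in this PR.
    """
    payload = struct.pack("<i%di" % len(spikes), len(spikes), *spikes)
    sock.sendall(payload)


def recv_exact(sock, nbytes):
    """Read exactly `nbytes` from a stream socket (recv may return short)."""
    chunks = []
    while nbytes > 0:
        chunk = sock.recv(nbytes)
        if not chunk:
            raise ConnectionError("socket closed before full packet arrived")
        chunks.append(chunk)
        nbytes -= len(chunk)
    return b"".join(chunks)
```

Compared with an RPC round-trip per value, a raw stream socket with explicit framing like this avoids serialization and dispatch overhead on every step.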

@hunse (Collaborator, Author) commented Oct 29, 2019

The larger packet size (825a680) fixes #254.

@hunse force-pushed the faster-comm branch 5 times, most recently from cc37ef6 to df56f68 on October 30, 2019 at 20:35
@xchoo (Member) commented Oct 30, 2019

Just a comment that you need to run
`sudo apt-get install g++-arm-linux-gnueabihf`
to get the host snips to compile. Might be something to add to the documentation (though it may already be in the NxSDK docs).

@hunse (Collaborator, Author) commented Nov 8, 2019

I've added several more commits to make things faster for learning specifically.

a384ca2 takes advantage of nengo/nengo#1581 to skip checking on the outputs of our internal nodes (from builder/inputs.py), since we control the outputs of these nodes and can ensure they're safe.

6e49272 is the one I'm least sure about, since it does add a one-timestep delay between host and chip (I think), but with the benefit that host and chip can run at the same time, so things are faster. It might be nice to have a way to turn this on or off.

@arvoelke (Contributor) commented Nov 9, 2019

> since it does add a one time-step delay between host and chip (I think), but with the benefit that host and chip can be running at the same time so things are faster. It might be nice to have a way to turn this on or off.

Can this be generalized in the same way as https://github.com/ctn-waterloo/nengo_brainstorm/pull/22, in particular by allowing multiple steps of delay to buffer on the input and/or output sides while things run asynchronously? That would cover the cases both before and after this change.

@hunse (Collaborator, Author) commented Nov 11, 2019

It could be generalized (that's what #26 was doing way back in the day), but that uses a different mechanism (we actually buffer things), rather than just having the chip and superhost run at the same time. And even if we buffer, there would still be a performance advantage to running the chip and superhost simultaneously. So I think they're independent features.
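The trade-off being discussed here (concurrency at the cost of one step of latency) amounts to a one-deep pipeline: at each wall-clock slot, the chip consumes the host output from the previous slot while the host computes the next one. A small illustrative sketch, where `pipelined_schedule` is a hypothetical helper, not code from this PR:

```python
def pipelined_schedule(steps):
    """Return (slot, host_step, chip_input_step) tuples for a 1-deep pipeline.

    In each slot the host computes step `slot` while the chip processes the
    host output produced in the previous slot, so host and chip can run
    concurrently, but the chip sees host data delayed by one step.
    """
    schedule = []
    for slot in range(steps + 1):
        host = slot if slot < steps else None  # host busy computing `slot`
        chip = slot - 1 if slot > 0 else None  # chip consumes previous output
        schedule.append((slot, host, chip))
    return schedule
```

With sequential execution, each step costs `t_host + t_chip`; pipelined, each slot costs roughly `max(t_host, t_chip)`, which is where the speedup comes from. A multi-step buffer (as in nengo_brainstorm#22) would generalize `slot - 1` to `slot - k`.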

@hunse force-pushed the faster-comm branch 2 times, most recently from 709725d to 2f16524 on November 13, 2019 at 14:14
hunse and others added 11 commits November 13, 2019 09:54

Some import statements just imported `scipy` when we needed
`scipy.sparse`. Import order differences made this an occasional
bug. Fixes #252.

This allows us to do a proper `bones-check` with `black`.

The hardware tests are still in 3.5.2 to support NxSDK.

This commit also fixes some slight changes by `nengo-bones` 0.6.0
that were missed in the upgrade commit because of the missing
`bones-check`.

Not backwards compatible with previous versions.

This is useful for testing SNIPs.

- Add a timer around the `Simulator._run_steps` call, to measure the
  time taken for all steps.
- Connect to the board outside the timing loop, so that this does not
  count towards the step time.
- Add a timer specific to SNIPs, to get the most accurate timing
  (after we call the board run function, so all setup has happened).

This reduces unnecessary communication with the chip.

Previously, fixed checking of `neurons_per_dimension` and a fixed
value for `add_to_container` made `get_ensemble` not particularly
useful for users trying to make their own `DecodeNeurons`. Now,
these are configurable, and default to the values that users would
likely want.

The host SNIP runs on the host and facilitates communication
with the superhost using sockets. This is faster than using
the default RPC interface.

We also take care to make sure both the host and chip SNIPs
end properly, by sending a message with a negative spike count.
This helps to eliminate board hangs.

To allow the host SNIP to work with multiple `run` calls, we
keep it idling in between `run` calls, waiting for a message.
If the board disconnects before a subsequent `run` call,
the negative spike count message will tell the host SNIP to stop.

This improves performance by reducing the number of channel reads.

The socket between the superhost and host was dropping data when
trying to send larger numbers of spikes. This seems to be solved
by getting rid of the step counter on the host SNIP; something about
sending the number of steps as a separate message at the start
threw things off in the socket, I guess.

- Also assert one block per core with learning
- Also get core less often (outside loop) in learn SNIP

Previously, `Simulator._collect_receiver_info` spent significant time
calling `receive` on each receiver to load information into a queue
in the receiver, and then getting it back out again. We now skip that
step, and just do everything right in `_collect_receiver_info`.

- Eliminating the `hasattr` call in `_collect_receiver_info` also
  has a significant effect on speed.
- Simpler queueing in `HostReceiveNode` avoids a `while` loop and helps
  with speed there.

We used to do this copy in Nengo; now we don't, so we need to copy here.

This allows the Nengo model on the (super)host to run
simultaneously with the chip, reducing the time per step but adding
a one-step delay between the (super)host model and the chip model.

This makes things faster by not requiring us to increment a counter
through the node, and also ensures we get all the data.

We also fix the time in this node to use zero-based timesteps
rather than one-based timesteps (to conform with core Nengo).
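The negative-spike-count shutdown described in the host-SNIP commit message above might look roughly like this on the receiving side. This is a hedged sketch: `host_loop`, `_recv_exact`, and the 4-byte little-endian framing are assumptions for illustration, not nengo-loihi's actual code.

```python
import struct


def _recv_exact(sock, nbytes):
    """Read exactly `nbytes` from a stream socket, or fewer if it closes."""
    data = b""
    while len(data) < nbytes:
        chunk = sock.recv(nbytes - len(data))
        if not chunk:
            return data  # connection closed
        data += chunk
    return data


def host_loop(sock):
    """Process spike packets until a negative count arrives, then exit.

    Each packet is a 4-byte count header followed by `count` 4-byte spike
    words. A negative count is the shutdown sentinel, so the loop ends
    cleanly instead of blocking forever (which is what causes board hangs).
    Between `run` calls the loop simply idles here, waiting for a header.
    """
    processed = 0
    while True:
        header = _recv_exact(sock, 4)
        if len(header) < 4:
            break  # connection dropped
        (count,) = struct.unpack("<i", header)
        if count < 0:
            break  # sentinel: superhost says stop
        _recv_exact(sock, 4 * count)  # consume the spike payload
        processed += count
    return processed
```

The key design point is that the sentinel travels in-band on the same channel as the data, so the host SNIP needs no extra control connection to learn that a run has ended.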
@hunse hunse mentioned this pull request Nov 15, 2019
@hunse hunse mentioned this pull request Mar 18, 2020
@tbekolay tbekolay marked this pull request as draft December 13, 2021 21:22