Faster communication with host/chip #256
cc37ef6
to
df56f68
Compare
Just a comment that you need to do:
I've added several more commits to make things faster, specifically for learning. a384ca2 takes advantage of nengo/nengo#1581 to skip checking the outputs of our internal nodes (from [...]). 6e49272 is the one I'm least sure about, since it adds a one-time-step delay between host and chip (I think), with the benefit that host and chip can run at the same time, so things are faster. It might be nice to have a way to turn this on or off.
Can this be generalized in the same way as https://github.com/ctn-waterloo/nengo_brainstorm/pull/22, in particular by allowing multiple steps of delay to buffer on the input and/or output sides while things run asynchronously? That would cover the cases both before and after this change.
It could be generalized (that's what #26 was doing way back in the day), but that uses a different mechanism (we actually buffer things) rather than just having the chip and superhost run at the same time. Even if we buffer, there would still be a performance advantage in running the chip and superhost at the same time, so I think they're independent features.
709725d
to
2f16524
Compare
Some import statements just imported `scipy` when we needed `scipy.sparse`. Import order differences made this an occasional bug. Fixes #252.
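The import-order problem described above can be reproduced with a throwaway package. This is a minimal sketch: `demo_pkg` and its `sparse` submodule are hypothetical stand-ins for `scipy` and `scipy.sparse`, and whether a plain `import scipy` happens to expose `scipy.sparse` depends on the SciPy version and on what other modules have already imported.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_demo_package(root):
    """Create a package `demo_pkg` with an empty __init__ and a `sparse` submodule."""
    pkg = Path(root) / "demo_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "sparse.py").write_text("VALUE = 42\n")


def submodule_visibility():
    """Return (visible after `import pkg`, visible after `import pkg.sparse`)."""
    with tempfile.TemporaryDirectory() as root:
        make_demo_package(root)
        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            pkg = importlib.import_module("demo_pkg")
            # Importing only the package does not load its submodules.
            before = hasattr(pkg, "sparse")
            # Importing the submodule binds it as an attribute of the package.
            importlib.import_module("demo_pkg.sparse")
            after = hasattr(pkg, "sparse")
        finally:
            sys.path.remove(root)
            sys.modules.pop("demo_pkg.sparse", None)
            sys.modules.pop("demo_pkg", None)
    return before, after
```

If some other module imports the submodule first, the attribute is already there and the bug disappears, which is why it only showed up occasionally. Importing the submodule explicitly wherever it is used removes that dependence on import order.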
This allows us to do a proper `bones-check` with `black`. The hardware tests are still in 3.5.2 to support NxSDK. This commit also fixes some slight changes by `nengo-bones` 0.6.0 that were missed in the upgrade commit because of the missing `bones-check`.
Not backwards compatible with previous versions.
This is useful for testing SNIPs.
- Add a timer around the `Simulator._run_steps` call, to measure the time taken for all steps.
- Connect to the board outside the timing loop, so that this does not count towards the step time.
- Add a timer specific to SNIPs, to get the most accurate timing (after we call the board run function, so all setup has happened).
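The timing approach in this commit can be sketched roughly as follows. `FakeBoard`, `timer`, and `timed_run` are hypothetical names for illustration, not the simulator's actual classes; the point is only that the connection happens before the timed region starts.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(results, key):
    """Store the wall-clock duration of the enclosed block in results[key]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[key] = time.perf_counter() - start


class FakeBoard:
    """Hypothetical stand-in for a hardware board with a slow connect step."""

    def __init__(self):
        self.connected = False
        self.steps_run = 0

    def connect(self):
        time.sleep(0.01)  # simulated one-time setup cost
        self.connected = True

    def run(self, n_steps):
        self.steps_run += n_steps


def timed_run(board, n_steps):
    timings = {}
    # Connect outside the timed region so setup does not count as step time.
    board.connect()
    with timer(timings, "run_steps"):
        board.run(n_steps)
    timings["per_step"] = timings["run_steps"] / n_steps
    return timings
```

A SNIP-specific timer would follow the same pattern, started only after the board's run function has been called so that all setup is excluded.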
This reduces unnecessary communication with the chip
Previously, fixed checking of `neurons_per_dimension` and fixed value for `add_to_container` made `get_ensemble` not particularly useful for users trying to make their own `DecodeNeurons`. Now, these are configurable, and default to the values that users would likely want.
The host SNIP runs on the host and facilitates communication with the superhost using sockets. This is faster than using the default RPC interface. We also take care to make sure both the host and chip SNIPs end properly, by sending a message with a negative spike count. This helps to eliminate board hangs. To allow the host SNIP to work with multiple `run` calls, we keep it idling in between `run` calls, waiting for a message. If the board disconnects before a subsequent run call, the negative spike count message will tell the host SNIP to stop.
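The shutdown protocol described here can be illustrated with a small socket sketch. The wire format (a little-endian signed 32-bit count followed by that many 32-bit spike IDs) and all function names are assumptions for illustration; the actual host SNIP is not written in Python and its message layout is not shown in this PR description.

```python
import socket
import struct

HEADER = struct.Struct("<i")  # signed 32-bit spike count (assumed format)


def recv_exact(sock, n_bytes):
    """Read exactly n_bytes from the socket, or raise if it closes early."""
    chunks = []
    remaining = n_bytes
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_spikes(sock, spike_ids):
    """Send one step's spikes: a count header, then the spike IDs."""
    sock.sendall(HEADER.pack(len(spike_ids)))
    if spike_ids:
        sock.sendall(struct.pack("<%di" % len(spike_ids), *spike_ids))


def send_stop(sock):
    """A negative spike count tells the receiver to shut down."""
    sock.sendall(HEADER.pack(-1))


def host_loop(sock):
    """Wait for messages until a negative spike count arrives."""
    received = []
    while True:
        (count,) = HEADER.unpack(recv_exact(sock, HEADER.size))
        if count < 0:
            break
        payload = recv_exact(sock, 4 * count)
        received.append(list(struct.unpack("<%di" % count, payload)))
    return received


def demo():
    superhost, host = socket.socketpair()
    try:
        send_spikes(superhost, [3, 7, 11])
        send_spikes(superhost, [])  # a step with no spikes keeps the loop idle
        send_spikes(superhost, [42])
        send_stop(superhost)
        return host_loop(host)
    finally:
        superhost.close()
        host.close()
```

Because the loop only exits on the sentinel, the receiver can sit between `run` calls waiting on the next header, and a single stop message is enough to end it cleanly.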
This improves performance by reducing the number of channel reads.
The socket between the superhost and host was dropping data when trying to send larger numbers of spikes. This seems to be solved by getting rid of the step counter on the host SNIP. Something about sending the number of steps as a separate message at the start threw things off in the socket, I guess.
- Also assert one block per core with learning
- Also get core less often (outside loop) in learn SNIP
Previously, `Simulator._collect_receiver_info` spent significant time calling `receive` on each receiver to load information into a queue in the receiver, and then getting it back out again. We now skip that step, and just do everything right in `_collect_receiver_info`.
- Eliminating the `hasattr` call in `_collect_receiver_info` also has a significant effect on speed.
- Simpler queueing in `HostReceiveNode` avoids a `while` loop and helps with speed there.
We used to do this copy in Nengo, now we don't, so need to copy here.
This allows the Nengo model on the (super)host to be running simultaneously with the chip, reducing time per step but adding in a one step delay between the (super)host model and chip model.
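The trade-off can be shown with a toy model. This is only a sketch of the data dependency: `host_step` and `chip_step` are hypothetical placeholder functions, and a thread pool stands in for the chip running alongside the (super)host.

```python
from concurrent.futures import ThreadPoolExecutor


def host_step(t):
    """Placeholder for the host model's output at step t."""
    return 10 * t


def chip_step(host_output):
    """Placeholder for the chip's computation on a host output."""
    return host_output + 1


def run_lockstep(n_steps):
    """Chip waits for the host each step: no delay, no overlap."""
    return [chip_step(host_step(t)) for t in range(n_steps)]


def run_overlapped(n_steps, initial=0):
    """Chip consumes the previous step's host output while the host computes."""
    outputs = []
    previous = initial
    with ThreadPoolExecutor(max_workers=1) as pool:
        for t in range(n_steps):
            # Start the chip on last step's host output...
            chip_future = pool.submit(chip_step, previous)
            # ...while the host computes the current step at the same time.
            current = host_step(t)
            outputs.append(chip_future.result())
            previous = current
    return outputs
```

With four steps, `run_lockstep` gives `[1, 11, 21, 31]` and `run_overlapped` gives `[1, 1, 11, 21]`: the same values shifted by one step, which is the one-step delay mentioned above.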
This makes things faster by not requiring us to increment a counter through the node, and also makes sure we get all the data. We also fix the time in this node to use zero-based timesteps rather than one-based timesteps (to conform with core Nengo).
This PR makes a number of changes that speed up superhost <-> chip communication:
TODO: