Host snips #260

hunse · 2019-11-15T00:07:21Z

This is a partial version of #256, that just does the host snips and some more basic speed improvements (the other PR had some more experimental speed improvements, too, which need more testing).

It's been rebased onto #255 and cleaned up.

nengo_loihi/simulator.py

nengo_loihi/hardware/snips/nengo_host.cc.template

nengo_loihi/hardware/interface.py

nengo_loihi/simulator.py

nengo_loihi/hardware/snips/nengo_host.cc.template

nengo_loihi/hardware/interface.py

kinjalpatel27

I have added the preliminary comments. It contains a few minor suggestions, all looks good to me.

nengo_loihi/splitter.py

nengo_loihi/tests/test_conv.py

nengo_loihi/builder/inputs.py

nengo_loihi/decode_neurons.py

nengo_loihi/hardware/interface.py

nengo_loihi/splitter.py

nengo_loihi/passthrough.py

nengo_loihi/inputs.py

hunse · 2020-01-16T18:16:20Z

I looked through the new commits. Just the one question, otherwise they all look good.

nengo_loihi/splitter.py

tbekolay

A version of this PR with a simplified history can be found in the host-snips-fixedup branch. When everyone's had a chance to look over the detailed commits here, I'll replace this branch with that one as with the new changes and fixed up history, it all looks good to me.

These are an artifact of adopting Black with a line length of 88 characters in a codebase that previously had a line length of 79 characters.

This is useful for testing Snips.

This commit does a pretty significant refactor of the splitter code. Specifically, the various stages of splitting have been broken up into independent steps: 1. Host/Chip splitting (now done with the HostChipSplit class) 2. Passthrough splitting (done with the existing PassthroughSplit class) 3. Precomputable splitting (now done with the PrecomputableSplit class) With these broken out, the `Split` class is a simple one that delegates to these three separate processes. While they can be seen as independent, they do depend on one another; specifically, passthrough splitting needs to know about the host/chip split, and precomputable splitting needs to know about both of the previous steps. But since they are separate objects, those instances can be provided to those steps easily. In doing that refactoring, the following other changes were made: - We always compute the PrecomputableSplit, even if the user has explicitly passed `precompute=False`. This is necessary for raising a warning stating that the network could be precomputed to speed up the model. In order to not raise exceptions if the model is not precomputable, the value of `precompute` provided by the user is used to determine if BuildErrors should be raised. They are not raised when it is `False` or `None`, but are raised on `True`. - Previously, we early-exited the precomputable splitting process if the network had any learning. Now, we do the same if the network has a convolutional connection. Technically we could still precompute if the network's convolutional connection is only on the host, but that seems like a rare case that's not worth thinking about right now. In any case, it allows us to not have to remember to pass `precompute=False` for our test networks that use convolution. - The splitter used to dynamically determine locations for probes, but there wasn't much benefit to doing that, so we now do it ahead of time to simplify the code and (potentially) save some time by caching the result. - The `is_precomputable` method was renamed to `precomputable` - The `precomputable` method can also return whether the model as a whole is precomputable by not passing in an object. Since the logic behind whether a model is precomputable is somewhat complex, we manually track it with a boolean variable. - The HostChipSplit tracks objects on the host and chip, rather than tracking "seen" objects and chip objects, where host objects are the "seen" objects that aren't on the chip. The code is not significantly different tracking host and chip sets instead, and it is much easier to explain. - Several code comments have been promoted to docstrings - The classes in `Splitter` now have full docstrings (though they aren't included in the docs yet... they are somewhat internal so not sure we want to advertise all of that) - Refactored the `test_all_run_steps` function as it was too long to be readable and provide good feedback when a case fails. All of the individual cases were split into separte methods in a test class.

Previously, we accepted an `ignore` parameter that had to be manually constructed. However, the only time this information ever really came into play was when an object was on chip. By passing in a `HostChipSplit` object (which is easy to construct) we get the same information automatically. Additionally, this commit calls `config.add_params` in HostChipSplit, rather than relying on it being called earlier. There is no downside to calling `add_params` twice, so we just call it, removing the need to call it in the Simulator and in several test functions.

This reduces unnecessary communication with the chip

Previously, fixed checking of `neurons_per_dimension` and fixed value for `add_to_container` made `get_ensemble` not particularly useful for users trying to make their own `DecodeNeurons`. Now, these are configurable, and default to the values that users would likely want.

The host Snip runs on the host and facilitates communication with the superhost using sockets. This is faster than using the default RPC interface. We also take care to make sure both the host and chip Snips end properly, by sending a message with a negative spike count. This helps to eliminate board hangs. To allow the host Snip to work with multiple `run` calls, we keep it idling in between `run` calls, waiting for a message. If the board disconnects before a subsequent run call, the negative spike count message will tell the host Snip to stop. This commit also includes several other improvements that either are necessary for implementing the host snip, or are complementary to it. - The channel packet size in the IO Snip has been increased. This improves performance by reducing the number of channel reads. - The logic for generating the correct run_steps method has been moved to a separate class for readability. - The host2chip logic when using snips is now faster due to getting the core outside of the loop in the learn Snip. - We now connect to the board when entering the simulator's context manager. - Snip and non-snip operations have been separated into separate classes to make it more clear what is run in the two cases. - Minor fixes to the style of C/C++ templates using clang-format with `-style="{BasedOnStyle: 'LLVM', IndentWidth: 4, TabWidth: 4}`. Since clang-format must be run on the generated files rather than the Jinja templates, this is difficult to automate, but is worth doing every once in a while. - `nengo_io.c` has been removed from the gitignore, as the template should only ever be rendered to a temporary directory. Co-authored-by: Trevor Bekolay <[email protected]>

Add timers around several blocks of code that we want to profile. For the most complicated case, bidirectional communication with Loihi hardware, we profile the time taken to: - build the model ("build") - connect to the board ("connect") - tell the board to start running ("startup") - run the model ("run") - tell the board to stop running ("shutdown") Co-authored-by: Eric Hunsberger <[email protected]>

Previously, host2chip methods spent significant time calling `receive` on each receiver to load information into a queue in the receiver, and then getting it back out again. We now skip that step, and just do everything in host2chip. Eliminating the `hasattr` call also has a significant effect on speed. Simpler queueing in `HostReceiveNode` avoids `while` loop and helps with speed there. Additionally, the host2chip_senders dictonary previously contained both ChipReceiver instances and PESModulatoryTarget instances, which do not share code. By splitting them into separate dictionaries we can operate on them independently, which is also faster. Co-authored-by: Trevor Bekolay <[email protected]>

hunse mentioned this pull request Nov 15, 2019

Dummy axons work properly on chip #261

Merged

nengo deleted a comment from hunse Nov 15, 2019

hunse force-pushed the host-snips branch 8 times, most recently from aa7cce2 to d193274 Compare November 26, 2019 16:58