Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Host snips #260

Merged
merged 13 commits into from
Jan 20, 2020
Merged

Host snips #260

merged 13 commits into from
Jan 20, 2020

Conversation

hunse
Copy link
Collaborator

@hunse hunse commented Nov 15, 2019

This is a partial version of #256, that just does the host snips and some more basic speed improvements (the other PR had some more experimental speed improvements, too, which need more testing).

It's been rebased onto #255 and cleaned up.

@nengo nengo deleted a comment from hunse Nov 15, 2019
@hunse hunse force-pushed the host-snips branch 8 times, most recently from aa7cce2 to d193274 Compare November 26, 2019 16:58
Copy link
Contributor

@kinjalpatel27 kinjalpatel27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the preliminary comments. It contains a few minor suggestions, all looks good to me.

nengo_loihi/splitter.py Outdated Show resolved Hide resolved
nengo_loihi/tests/test_conv.py Outdated Show resolved Hide resolved
nengo_loihi/splitter.py Outdated Show resolved Hide resolved
nengo_loihi/inputs.py Outdated Show resolved Hide resolved
@hunse
Copy link
Collaborator Author

hunse commented Jan 16, 2020

I looked through the new commits. Just the one question, otherwise they all look good.

nengo_loihi/splitter.py Outdated Show resolved Hide resolved
nengo_loihi/splitter.py Outdated Show resolved Hide resolved
Copy link
Member

@tbekolay tbekolay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A version of this PR with a simplified history can be found in the host-snips-fixedup branch. When everyone's had a chance to look over the detailed commits here, I'll replace this branch with that one as with the new changes and fixed up history, it all looks good to me.

tbekolay and others added 11 commits January 17, 2020 15:22
These are an artifact of adopting Black with a line length of
88 characters in a codebase that previously had a line length
of 79 characters.
This is useful for testing Snips.
This commit does a pretty significant refactor of the splitter
code. Specifically, the various stages of splitting have been broken
up into independent steps:

1. Host/Chip splitting (now done with the HostChipSplit class)
2. Passthrough splitting (done with the existing PassthroughSplit
   class)
3. Precomputable splitting (now done with the PrecomputableSplit
   class)

With these broken out, the `Split` class is a simple one that
delegates to these three separate processes. While they can
be seen as independent, they do depend on one another;
specifically, passthrough splitting needs to know about the
host/chip split, and precomputable splitting needs to know about
both of the previous steps. But since they are separate objects,
those instances can be provided to those steps easily.

In doing that refactoring, the following other changes were made:

- We always compute the PrecomputableSplit, even if the user
  has explicitly passed `precompute=False`. This is necessary
  for raising a warning stating that the network could be
  precomputed to speed up the model. In order to not raise
  exceptions if the model is not precomputable, the value of
  `precompute` provided by the user is used to determine if
  BuildErrors should be raised. They are not raised when it is
  `False` or `None`, but are raised on `True`.
- Previously, we early-exited the precomputable splitting process
  if the network had any learning. Now, we do the same if the
  network has a convolutional connection. Technically we could
  still precompute if the network's convolutional connection
  is only on the host, but that seems like a rare case that's
  not worth thinking about right now. In any case, it allows us
  to not have to remember to pass `precompute=False` for our
  test networks that use convolution.
- The splitter used to dynamically determine locations for probes,
  but there wasn't much benefit to doing that, so we now do
  it ahead of time to simplify the code and (potentially) save
  some time by caching the result.
- The `is_precomputable` method was renamed to `precomputable`
- The `precomputable` method can also return whether the model
  as a whole is precomputable by not passing in an object.
  Since the logic behind whether a model is precomputable is
  somewhat complex, we manually track it with a boolean variable.
- The HostChipSplit tracks objects on the host and chip, rather
  than tracking "seen" objects and chip objects, where host objects
  are the "seen" objects that aren't on the chip. The code is not
  significantly different tracking host and chip sets instead,
  and it is much easier to explain.
- Several code comments have been promoted to docstrings
- The classes in `Splitter` now have full docstrings (though they
  aren't included in the docs yet... they are somewhat internal
  so not sure we want to advertise all of that)
- Refactored the `test_all_run_steps` function as it was too long
  to be readable and provide good feedback when a case fails.
  All of the individual cases were split into separte methods
  in a test class.
Previously, we accepted an `ignore` parameter that had to be
manually constructed. However, the only time this information
ever really came into play was when an object was on chip.
By passing in a `HostChipSplit` object (which is easy to construct)
we get the same information automatically.

Additionally, this commit calls `config.add_params` in HostChipSplit,
rather than relying on it being called earlier. There is no downside
to calling `add_params` twice, so we just call it, removing the need
to call it in the Simulator and in several test functions.
This reduces unnecessary communication with the chip
Previously, fixed checking of `neurons_per_dimension` and fixed
value for `add_to_container` made `get_ensemble` not particularly
useful for users trying to make their own `DecodeNeurons`. Now,
these are configurable, and default to the values that users would
likely want.
The host Snip runs on the host and facilitates communication
with the superhost using sockets. This is faster than using
the default RPC interface.

We also take care to make sure both the host and chip Snips
end properly, by sending a message with a negative spike count.
This helps to eliminate board hangs.

To allow the host Snip to work with multiple `run` calls, we
keep it idling in between `run` calls, waiting for a message.
If the board disconnects before a subsequent run call,
the negative spike count message will tell the host Snip to stop.

This commit also includes several other improvements that either
are necessary for implementing the host snip, or are complementary
to it.

- The channel packet size in the IO Snip has been increased.
  This improves performance by reducing the number of channel reads.
- The logic for generating the correct run_steps method has been
  moved to a separate class for readability.
- The host2chip logic when using snips is now faster due to
  getting the core outside of the loop in the learn Snip.
- We now connect to the board when entering the simulator's context
  manager.
- Snip and non-snip operations have been separated into separate
  classes to make it more clear what is run in the two cases.
- Minor fixes to the style of C/C++ templates using clang-format
  with `-style="{BasedOnStyle: 'LLVM', IndentWidth: 4, TabWidth: 4}`.
  Since clang-format must be run on the generated files rather than
  the Jinja templates, this is difficult to automate, but is worth
  doing every once in a while.
- `nengo_io.c` has been removed from the gitignore, as the template
  should only ever be rendered to a temporary directory.

Co-authored-by: Trevor Bekolay <[email protected]>
Add timers around several blocks of code that we want to profile.
For the most complicated case, bidirectional communication with
Loihi hardware, we profile the time taken to:

- build the model ("build")
- connect to the board ("connect")
- tell the board to start running ("startup")
- run the model ("run")
- tell the board to stop running ("shutdown")

Co-authored-by: Eric Hunsberger <[email protected]>
Previously, host2chip methods spent significant time calling `receive`
on each receiver to load information into a queue in the receiver, and
then getting it back out again. We now skip that step, and just do
everything in host2chip.

Eliminating the `hasattr` call also has a significant effect on speed.

Simpler queueing in `HostReceiveNode` avoids `while` loop and helps
with speed there.

Additionally, the host2chip_senders dictonary previously contained
both ChipReceiver instances and PESModulatoryTarget instances, which
do not share code. By splitting them into separate dictionaries we
can operate on them independently, which is also faster.

Co-authored-by: Trevor Bekolay <[email protected]>
@tbekolay tbekolay merged commit 26e5236 into master Jan 20, 2020
@tbekolay tbekolay deleted the host-snips branch January 20, 2020 15:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants