
Speeding up snips by communicating less often #26

Draft · wants to merge 19 commits into base: main
Conversation

@tcstewar (Collaborator) commented Aug 24, 2018:

This changes the snip I/O system so that we don't communicate every time step. It adds a snip_io_steps parameter to Simulator to let you control how often (in units of steps) we should communicate. The reason for this is to drastically speed up the simulation! If this is set to 1 (which is the same as what it used to be), then we're at around 8 ms per time step. If it's set to 10 (which is the new default I just created), then we get ~1.5 ms per time step! Yay faster Loihi!
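For reference, a minimal usage sketch (assuming the snip_io_steps keyword from this PR on the usual nengo_loihi.Simulator; the model itself is purely illustrative):

import nengo
import nengo_loihi

with nengo.Network() as model:
    stim = nengo.Node(lambda t: t)            # host-side input
    ens = nengo.Ensemble(100, dimensions=1)   # on-chip ensemble
    nengo.Connection(stim, ens)
    probe = nengo.Probe(ens, synapse=0.01)

# snip_io_steps=10: host<->chip I/O happens every 10 time steps instead of
# every step, trading I/O granularity for wall-clock speed
with nengo_loihi.Simulator(model, precompute=False, snip_io_steps=10) as sim:
    sim.run(1.0)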
One big change made to implement this is that we no longer send spikes from host to chip. Instead, we send decoded values, and those values are then turned into spikes on-chip. Right now, this is done in the snip itself on the x86 processor, just by implementing a bunch of little accumulators. However, in a future PR we could also reorganize this so that we actually use a Loihi core to do this, since simulating spiking neurons is what it's for. In that case, whenever we receive a value from the host, we'd set the bias of a Loihi Cx. This would mean that we wouldn't be doing anything in the snip itself on time steps where we're not communicating.
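Roughly, the accumulator scheme works like the following Python sketch (illustrative only, not the actual snip C code; the 16-bit scaling is an assumption based on the constants that appear later in this review):

def values_to_spikes(values, n_steps, bits=16):
    """Model of turning held decoded values in [-1, 1) into spike counts."""
    overflow = 1 << bits
    accum = [0] * len(values)
    spikes = [0] * len(values)
    for _ in range(n_steps):
        for i, v in enumerate(values):
            # map [-1, 1) onto [0, 2**bits) and accumulate
            accum[i] += int((v + 1.0) * (overflow // 2))
            if accum[i] >= overflow:
                accum[i] -= overflow
                spikes[i] += 1  # the real snip sends a spike to the target axon here
    return spikes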

TODO

  • Check if there are too many inputs (userdata is only 1024 bytes)

tbekolay and others added 10 commits August 19, 2018 16:30
The overall structure of the documentation is copied from other
Nengo projects. Much of the content is adapted from the docs
currently in the wiki, with slight changes to make it more generic
to any Loihi setup and not necessarily ours. The overview
and API docs are new. Some classes that made sense to include in
the API docs had sparse docstrings, so those were added where
appropriate. Examples were moved to the docs folder for easier
inclusion and fewer folders in the root directory.
The input stream has shape `(n_steps, input_dimensions)`, so before we were always running for `input_dimensions` (390) timesteps.
And add a docstring for loihi_api.VthProfile.
For both the emulator and hardware. Note that these are set to
match for the learn_communication_channel notebook, and won't
necessarily match in other places, but since we render this
notebook in the docs it's important that it matches Nengo here.
@hunse self-requested a review Sep 4, 2018
@hunse self-assigned this Sep 4, 2018
@hunse (Collaborator) left a comment:
Technically, this looks good. I've just made a few comments asking for comments/explanation at certain points in the code. I've also added a few comments myself in additional commits. (Be sure to pop out the "show outdated" comments. I went through commit-by-commit, which is why they've appeared like that, but I think they're all still applicable.)

for inp, cx_ids in core.iterate_inputs():
    axon_ids = inp.axon_ids[0]
    assert len(axon_ids) % 2 == 0
    n_inputs += len(axon_ids) // 2
Collaborator:
Why divide by two?

Collaborator:
Is it because we have both positive and negative channels? We should note that here.

Collaborator (Author):
Yes, and good idea -- I've added a comment
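(For context: each input dimension has a positive and a negative channel, matching the interneuron encoders that appear later in this diff, so the axon list is twice the number of dimensions. A quick illustrative numpy view, with dim chosen arbitrarily:)

import numpy as np

dim = 3
# one positive and one negative channel per input dimension,
# hence len(axon_ids) == 2 * dim and the division by two above
encoders = np.vstack([np.eye(dim), -np.eye(dim)])  # shape (2 * dim, dim)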

axon_ids = inp.axon_ids[0]
half = len(axon_ids) // 2
for i in range(len(axon_ids) // 2):
    assert axon_ids[i][0] == 0
Collaborator:
This is the chip ID, right? Maybe put a "chip_id must be zero" message with the assertion.

Collaborator (Author):
Comment added :)

assert axon_ids[i][0] == 0
assert axon_ids[i][1] == axon_ids[half + i][1]
spike_targets.extend((axon_ids[i][1], axon_ids[i][2],
                      axon_ids[half + i][2]))
Collaborator:
A comment to explain what each of these values is would be helpful.

Collaborator (Author):
Very good idea. What it's doing is pretty obscure... comment added.

readChannel(inChannel, spike, 3);
s->userData[IDX_SPIKE_TARGETS+(i*3)+0] = spike[0];
s->userData[IDX_SPIKE_TARGETS+(i*3)+1] = spike[1];
s->userData[IDX_SPIKE_TARGETS+(i*3)+2] = spike[2];
Collaborator:
Again, I'd have comments at the end of each line saying what that value is.

Collaborator (Author):
Done

for (int i=0; i < N_INPUTS; i++) {
    readChannel(inChannel, spike, 1);
    //printf("stim value %d.%d\n", i, spike[0]);
    value[i] = spike[0];
Collaborator:
Is this just a binary value indicating whether there's a spike in that channel at the current time?

Collaborator:
Looking at the code below, I now believe it to be a discretized version of the real value that we're communicating (i.e. the decoded value). A note about this would be helpful.

Collaborator (Author):
Done

encoders=np.vstack([np.eye(dim), -np.eye(dim)]),
max_rates=[max_rate] * dim + [max_rate] * dim,
intercepts=[-1] * dim + [-1] * dim)
# TODO: move the above check elsewhere
Collaborator:
I know I wrote this check, but why can't the collective rate be > 1000?

Collaborator:
Ooh, this looks weird now since you can't actually see the check (unless you go commit-by-commit). The check was that max_rate <= 1000, where max_rate = inter_rate * inter_n.

Collaborator (Author):
Yeah, I just kept that check from before, and it doesn't seem right to me. It also seems like something that should be looked into in #55. I've added a TODO here and made a note about it in that PR.

readChannel(inChannel, spike, 1);
//printf("stim value %d.%d\n", i, spike[0]);
value[i] = spike[0];
if (s->time % IO_STEPS == 0) {
Collaborator:
Would it be better for this to happen when s->time % IO_STEPS == 1? I believe that time starts at 1. That way, we'll read a value right after initialization, and won't waste the first few time steps with no input.

Collaborator (Author):
Good point -- I've moved the check. :)

{% endfor %}

writeChannel(outChannel, output, N_OUTPUTS);
writeChannel(outChannel, output, N_OUTPUTS);
Collaborator:
Is N_OUTPUTS sometimes zero? If so, is there any overhead associated with this? Do we want a check so that we only call this if we have outputs?

Collaborator (Author):
I can't think of a use case for N_OUTPUTS==0, but it is supported, so I've added an if statement. :)

    import nxsdk
    target = 'loihi'
except ImportError:
    target = 'sim'
Collaborator:
Not related to what's being done here, but it might be nice to have some sort of output (print statements? logger statements?) that indicates which target has been chosen. I would probably do it even in the case that the user has explicitly specified target (so after this if block).

Collaborator (Author):
Possibly, although I'm not sure what the best sort of message would be. I'd definitely leave it for a separate PR, though.

Member:
Pretty sure we already have these logging statements, they're just not raised here.

@hunse assigned tcstewar and unassigned hunse Sep 5, 2018
Terry Stewart added 2 commits September 10, 2018 11:57
This is to avoid surprising users who expect the default
nengo behaviour of messages passing all the time
@tcstewar (Collaborator, Author):
Thank you for the comments, @hunse! I've added a bunch of comments based on your suggestions.

I also made one change: I changed the default for snip_io_steps to 1, so that it defaults to exactly the same behaviour that we are used to, and users don't have to understand anything new. This means that, in order to get any speedup for their snips, they have to manually set this parameter. I think this is less surprising than defaulting to 10 and having people be surprised that their inputs change at 10 ms intervals.

@tcstewar removed their assignment Sep 10, 2018
// Note that we do this at the *start* of an IO_STEP period (since
// s->time starts at 1), so that we don't have a period at the beginning
// of the simulation where we are ignoring the input.
if (s->time % IO_STEPS == 1 % IO_STEPS) {
Collaborator:
Is this the same as (IO_STEPS==1) || (s->time % IO_STEPS == 1)? Currently we do it one way here and another way down below. I'd keep it consistent. Also, as elegant as this way is, I think I prefer the explicit OR.

Collaborator (Author):
Ah, I didn't even notice I'd solved that puzzle before.... :) Yes, those are meant to be the same, and I agree that the explicit OR is clearer than the elegant mod-comparison thing. :) I've switched it to the explicit approach.
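For the record, a quick brute-force check (sketched in Python rather than the snip's C) that the two forms agree for all time >= 1:

for io_steps in (1, 2, 5, 10):
    for time in range(1, 1000):
        modulo_form = (time % io_steps) == (1 % io_steps)
        explicit_or = (io_steps == 1) or (time % io_steps == 1)
        assert modulo_form == explicit_or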

@hunse (Collaborator) commented Sep 10, 2018:

It all looks good, except for the one little inline comment I added.

I also think defaulting the number of steps to 1 is a good idea.

    target = 'loihi'
except ImportError:
    target = 'sim'

if target == 'simreal':
    logger.info("Using real-valued simulator")
Member:
E.g. here

@tbekolay (Member) left a comment:
I've looked through briefly, and I have a few issues to raise before I delve into it in more detail.

  1. I'm not sure about the overhead of it, but it feels like collecting timing information (tracking self.time_per_step) shouldn't always be enabled; it should be something you can turn on and off.
  2. spiking_interneurons_on_host, on the other hand, feels like a weird thing to be able to turn on and off. Are there some situations where this is better and some where it is worse? Or have we just not figured out which is more efficient yet?
  3. This definitely needs a test. Since the default doesn't change current behavior, running the existing test suite isn't sufficient.

@tcstewar (Collaborator, Author):
I've added a few tests -- let me know what you think, @tbekolay!

Also, while writing these tests, I found issue #63. This PR's performance is currently still affected by that bug. Bizarrely enough, I had accidentally included a workaround in this PR: I had specified the synapse twice, and since issue #63 was ignoring one of the synapses, it ended up with the correct behaviour, but for the wrong reasons. For clarity, I have removed that second synapse specification from this PR, so that it has consistent behaviour.

@tcstewar assigned tbekolay and unassigned tcstewar Sep 12, 2018
@tbekolay (Member) left a comment:
Added some inline comments from a more detailed read through.

The new tests look good for testing the spiking_interneurons_on_host part of the splitter, but there are still no tests that use a different value for the snip_io_steps argument to the simulator, which is the main point of this PR. It is still not clear to me whether the behavior of the network should be unchanged (i.e., setting snip_io_steps > 1 should yield the same results in sim.data, which would be an easy thing to write tests for) or if it will lag behind (in which case, the test should make it clear how it lags behind -- i.e., if you have a node with output t do you get, essentially, t[::snip_io_steps]?)

if (inChannel == -1 || outChannel == -1) {
printf("Got an invalid channel ID\n");
return;
}

if (s->time % 100 == 0) {
if (s->time == 1) {
Member:
Feels weird to be doing initialization stuff in the mgmt phase. Can this be done during the initProcess phase instead? This is what is done in the NxSDK snips.py example.

Collaborator (Author):
Last time I tried that (which was a few nxsdk releases ago), I ran into some trouble getting it working. I haven't tried it recently, though, and I'd be tempted to leave that for a separate PR.

printf("time %d\n", s->time);
}

readChannel(inChannel, count, 1);
//readChannel(inChannel, count, 1);
Member:
Are these commented-out lines still relevant, or can we delete them?

Collaborator (Author):
Definitely deletable. I'll do that.

//printf("%d value:%d accum:%d\n");
pos_accum[i] += (value[i] + (1<<15));
if (pos_accum[i] >= (1<<16)) {
uint8_t core = s->userData[IDX_SPIKE_TARGETS+(i*3)];
Member:
Can you move all variable declarations to the beginning? I think it's a pretty old school C thing to do, but it's what I'm used to.

Collaborator (Author):
:) Sounds fair. :)

Collaborator:
Yeah, I think that's back from the days before compilers were very good at optimizing things. Personally, I think readability is the most important factor. It can also help with debugging to have your variables only declared in the scope you want to use them in, so you don't accidentally use them out of scope (one of the reasons old-school C programs were not fun to debug).

Member:
Yeah, I've not done enough C in a professional setting to know what the standards are nowadays, but perhaps we could adopt PEP-7 since we're using PEP-8.

Collaborator (Author):
PEP-7 sounds good to me.... that's pretty close to what we've already been doing, too, other than cuddled else statements and the brace that starts a function....

@@ -0,0 +1,40 @@
import nengo
Member:
Is this file supposed to be in here or included accidentally?

Collaborator (Author):
Oops, sorry -- that was an accident.

@tbekolay assigned tcstewar and unassigned tbekolay Sep 25, 2018
@tcstewar (Collaborator, Author):
but there are still no tests that use a different value for the snip_io_steps argument to the simulator, which is the main point of this PR.

Ah, good point. I'll see what I can put together.

It is still not clear to me whether the behavior of the network should be unchanged (i.e., setting snip_io_steps > 1 should yield the same results in sim.data, which would be an easy thing to write tests for) or if it will lag behind (in which case, the test should make it clear how it lags behind -- i.e., if you have a node with output t do you get, essentially, t[::snip_io_steps]?)

It won't be the same data -- it'll be more like t[::snip_io_steps]. Which gives me a good idea for a test -- I'll try a Node with lambda t: 1 if timestep%snip_io_steps==0 else 0..... :)

@tbekolay (Member):
lambda t: 1 if timestep%snip_io_steps==0 else 0

What is that supposed to output? I'm sure you know, but I can't quite guess by looking at that function. With something like lambda t: t I can at least look at the plot and see the stair-steps (I assume). I think the part that makes it clear is the statement of what the output should be; e.g., for lambda t: t, my understanding is that it should be np.repeat(t[::snip_io_steps], snip_io_steps), so I'm keen to know what it is if it isn't that.

@tcstewar (Collaborator, Author):
I think the part that makes it clear is the statement of what the output should be; e.g., for lambda t: t, my understanding is that it should be np.repeat(t[::snip_io_steps], snip_io_steps), so I'm keen to know what it is if it isn't that.

Yes, that's right. So for the lambda I posted, that should give the same result as a nengo.Node(1) (i.e. all the 0 values are being skipped over). It's a pretty pathological case, but should make for a very clear test.

@tcstewar (Collaborator, Author):
I can at least look at the plot and see the stair-steps (I assume).

Yes, that'd be stair-steps -- but I just didn't want to write code in the test to check for stair-steps. :)

@tbekolay (Member) commented Sep 25, 2018:

So for the lambda I posted, that should give the same result as a nengo.Node(1) (i.e. all the 0 values are being skipped over).

Right, yeah. I think that's what's a bit concerning to me: there are other bugs that I can see resulting in a node output of all ones.

I just didn't want to write code in the test to check for stair-steps. :)

I think it's exactly np.repeat(t[::snip_io_steps], snip_io_steps), possibly t[1::snip_io_steps]
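To make the two candidates concrete (a numpy sketch only; which offset is right depends on exactly when the snip reads its input within an I/O period):

import numpy as np

dt = 0.001
snip_io_steps = 10
t = dt * np.arange(1, 1001)  # nengo-style time stamps dt, 2*dt, ..., 1.0

held_a = np.repeat(t[::snip_io_steps], snip_io_steps)   # hold t[0], t[10], ...
held_b = np.repeat(t[1::snip_io_steps], snip_io_steps)  # hold t[1], t[11], ...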

@tcstewar (Collaborator, Author) commented Sep 25, 2018:

I think it's exactly np.repeat(t[::snip_io_steps], snip_io_steps)

But that value will be fed into an on-chip spike generator, then into neurons, then decoded, and it'd be hard to algorithmically look at the output from that decode and see if it's doing a staircase. Although I suppose we could have the test run twice (once with snip_io_steps=N, and once with snip_io_steps=1 but manually changing the input node to do the repeat thing).... hmmm... I'll try that too....

@tcstewar (Collaborator, Author):
Hmm, a complication I'm having with the ramp test is that snip_io_steps also affects the output from the chip. That means we're subsampling the output, and this currently happens before the synapse on the output, which makes things pretty noisy. Moving that synapse on-chip is not straightforward, and I'd prefer that to be a separate PR. For posterity, here's the test I've been playing with for this:

import numpy as np

import nengo
import pytest


@pytest.mark.skipif(pytest.config.getoption("--target") != "loihi",
                    reason="snips only exist on loihi")
@pytest.mark.parametrize("snip_io_steps", [10])
def test_snip_skipping_ramp(Simulator, seed, plt, snip_io_steps):
    dt = 0.001

    filt = nengo.synapses.Lowpass(0.02)
    # run a ramp with snip_io_steps
    with nengo.Network(seed=seed) as model1:
        a = nengo.Ensemble(200, 1)
        stim = nengo.Node(lambda t: t)
        nengo.Connection(stim, a, synapse=None)
        p1 = nengo.Probe(a, synapse=filt)

    with Simulator(model1, dt=dt, precompute=False,
                   snip_io_steps=snip_io_steps) as sim1:
        sim1.run(1.0)
    data1 = sim1.data[p1]

    # run the same inputs with snip_io_steps=1
    with nengo.Network(seed=seed) as model2:
        a = nengo.Ensemble(200, 1)
        def stim_func(t):
            step = int(t/dt)
            prev_step = ((step - 1) // snip_io_steps) * snip_io_steps
            if prev_step < 0:
                prev_step = 0
            return prev_step * dt

        stim = nengo.Node(stim_func)
        nengo.Connection(stim, a, synapse=None)
        p2 = nengo.Probe(a, synapse=None)

    with Simulator(model2, dt=dt, precompute=False,
                   snip_io_steps=1) as sim2:
        sim2.run(1.0)
    data2 = filt.filt(sim2.data[p2])

    # convert the output to what would happen if the output
    # were read every snip_io_steps time steps
    data3 = np.repeat(sim2.data[p2][::snip_io_steps], snip_io_steps, axis=0)
    data3 = filt.filt(data3)

    print(np.hstack([data1, data2, data3]))

@tcstewar (Collaborator, Author):

I've added the 0/1 test, which also revealed a bug (on-chip probing was broken).

@tbekolay (Member):

on-chip probing was broken

If this is because of the TODOs in the ProbeDict, then I'm currently working on that fix -- if not then carry on!

@tcstewar (Collaborator, Author):

If this is because of the TODOs in the ProbeDict, then I'm currently working on that fix -- if not then carry on!

Nope, it was a silly bug on my part forgetting that we need to repeat the data we get from the chip... It's now fixed: 688f8c0
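(In other words, something along the lines of the following on the host side; the variable names here are illustrative, not the actual ones in the fix:)

import numpy as np

# expand per-I/O-period chip output back to one row per time step,
# so the probe data lines up with sim.trange()
full_output = np.repeat(chip_output, snip_io_steps, axis=0)[:n_steps]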

@tbekolay (Member):

Is this ready for re-review? Please make sure to re-assign the PR to me when you're ready for a re-review, @tcstewar.
