Multidimensional learning #89

Merged 1 commit into master on Sep 28, 2018
Conversation

@tcstewar (Collaborator) commented Sep 26, 2018

This PR fixes multidimensional learning. It is based off of move-manual-decoders-onchip, but only because that branch had a nice test that fit this PR.

All this does is fix the snip code and the communication code such that all the error data gets sent across and applied. It doesn't change the emulator at all, since it was already working in the emulator.

The communication protocol is not wonderful. The message format is [core_id, n_vals, val_0, val_1, ... val_n-1], so the first two values are the same every time a message is sent (this isn't much worse than the old format, which was [core_id, val_0]). Optimizing that will be a different PR, and will interact in interesting ways with #26 . But this is a definite improvement in the meantime.
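For illustration, here is a minimal Python sketch of that message format (the helper names are hypothetical, not the actual snip or host-side code in this PR):

```python
# Hypothetical helpers illustrating the [core_id, n_vals, val_0, ..., val_n-1]
# message format; not the actual snip/host code in this PR.

def pack_error_message(core_id, error_vals):
    """Flatten one multidimensional error update into a single message."""
    return [core_id, len(error_vals)] + list(error_vals)

def unpack_error_messages(buffer):
    """Yield (core_id, vals) for messages packed back-to-back in a flat buffer."""
    i = 0
    while i < len(buffer):
        core_id, n_vals = buffer[i], buffer[i + 1]
        yield core_id, buffer[i + 2:i + 2 + n_vals]
        i += 2 + n_vals

# A 3-dimensional error destined for core 7
assert pack_error_message(7, [0.1, -0.2, 0.05]) == [7, 3, 0.1, -0.2, 0.05]
```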

This is heavily based on #22, which it replaces (the rebasing got too complicated for me, so I started this one fresh).

TODO

  • get multiple learning cores working

@tbekolay (Member)

@tcstewar You removed your assignment, but the PR is still marked as WIP. Is it ready for review? Or collaboration?

@tbekolay (Member) commented Sep 26, 2018

Another thing TODO:

@tbekolay force-pushed the move-manual-decoders-onchip branch 2 times, most recently from 52b9c46 to 0811e2a, on September 26, 2018 at 19:24
@tbekolay changed the base branch from move-manual-decoders-onchip to master on September 26, 2018 at 19:32
@tcstewar self-assigned this on Sep 26, 2018
@tcstewar (Collaborator, Author)

Is it ready for review? Or collaboration?

I'll rebase it to master now that the manual decoders are merged in, then take a pass through the #22 comments, and then it should be ready for review. :)

@tcstewar (Collaborator, Author)

The last commit adds a test of having multiple learning connections at once. It seems to mostly work, but doesn't converge to the target values and I'm not sure why. If someone else could take a look at it before the workshop, that'd be great!

@tbekolay (Member)

So adding some plotting to that test, I get the following in the emulator

test_learning.test_multiple_pes_emu.pdf

and this on hardware

test_learning.test_multiple_pes.pdf

One difference is that they change at different rates. That is likely due to a mismatch between the learning rates, which we should do a better job of matching. It may also be partly because error values are clipped to the [-1, 1] range.

The more salient difference is that after a steady, predictable climb toward the target value, the line starts going haywire. This becomes more apparent when you run the network for longer, or with a higher learning rate.

The fact that the lines all start moving in a predictable manner tells me that it is not likely a problem with the PES implementation (i.e., the delta is being calculated correctly and is being applied to the right synapses, etc.). I think the problem is most likely some value over- or underflowing its discretized range; this has happened before (and is why we introduced the checks discussed in #88). I do not get any warnings raised in the emulator, so the over/underflowed quantity is not U or V; most likely it's the weights themselves.

In the emulator, we currently use float64s to represent the learning trace, and while we initially discretize the weights to match hardware, we don't re-discretize every time the weights change during the simulation, so we are likely using the full int32 range. Put another way, the problem is that we discretize weights based on their initial values, but when doing online learning, their final values are going to be very different from their initial values.
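To illustrate the suspected failure mode, here is a small numerical sketch (the 8-bit range and scale factor are made up for illustration and are not Loihi's actual weight format):

```python
import numpy as np

# Illustrative only: choose a fixed-point scale from the *initial* weight
# magnitudes, then apply learning-sized deltas, and the learned weights no
# longer fit the range the discretization was chosen for.

def discretize(weights, bits=8):
    """Pick a scale so the initial weights fill the signed integer range."""
    scale = (2 ** (bits - 1) - 1) / np.max(np.abs(weights))
    return np.round(weights * scale).astype(np.int64), scale

initial = np.array([0.001, -0.002, 0.0015])  # small initial decoders/weights
w_int, scale = discretize(initial)

# PES-style learning pushes the weights far from their initial values...
learned_delta = np.array([0.5, -0.4, 0.3])
w_after = w_int + np.round(learned_delta * scale).astype(np.int64)

# ...so they overflow the range implied by the initial discretization.
limit = 2 ** 7 - 1
print(np.abs(w_after) > limit)  # [ True  True  True]
```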

In the long term, we should definitely think about ways in which we can mitigate this problem. I'll make an issue to start thinking about that.

In the short term, I'm thinking that if the initial function computed across the learned connection results in larger decoders, then we should end up with a much better discretization. I'll try doing that now.

@tbekolay (Member)

Modifying the function (in this case, just removing function=lambda x: 0) is a success!

Emu:
test_learning.test_multiple_pes_emu.pdf

Hardware:
test_learning.test_multiple_pes.pdf

Also note that they both converge significantly faster. I suspect this might happen in reference Nengo too, since the weights have less distance to travel when they don't start near zero. But it may also be that the range discretization makes the same weight updates have a larger effect than they would in reference Nengo, because the same delta pushes the weights farther.

In any case, I'll add some asserts and push the now-working test. I'll also make a separate issue to figure out the best way to inform users about this behavior.
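For reference, a hedged sketch of the two setups being compared (a plain Nengo model with illustrative sizes; the actual test is test_learning.test_multiple_pes and its details differ):

```python
import nengo

with nengo.Network() as net:
    stim = nengo.Node(0.5)
    pre = nengo.Ensemble(n_neurons=100, dimensions=1)
    post = nengo.Ensemble(n_neurons=100, dimensions=1)
    nengo.Connection(stim, pre)

    # Original setup: start the learned connection from zero decoders.
    # The discretization range is chosen from these near-zero initial
    # values, so the learned weights can easily overflow it.
    # conn = nengo.Connection(pre, post, function=lambda x: 0,
    #                         learning_rule_type=nengo.PES())

    # Revised setup: keep the default (identity) decoders, so the learned
    # weights stay closer to the range they were discretized for.
    conn = nengo.Connection(pre, post, learning_rule_type=nengo.PES())

    # Standard PES error signal: error = post - target
    error = nengo.Node(size_in=1)
    nengo.Connection(post, error)
    nengo.Connection(stim, error, transform=-1)
    nengo.Connection(error, conn.learning_rule)
```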

@tbekolay (Member)

OK, pushed 6eab5e3, and also rebased. If you get a chance, let me know what you think @tcstewar, but in any case I'll do a proper review tomorrow so we can get this merged in.

@tcstewar (Collaborator, Author)

OK, pushed 6eab5e3, and also rebased. If you get a chance, let me know what you think @tcstewar, but in any case I'll do a proper review tomorrow so we can get this merged in.

Awesome! I completely didn't think of that weight scaling issue.... makes a lot of sense in hindsight, but I was not thinking in that direction at all..... Very good thing to know for the learning tutorials too. :)

The commit looks good to me. :) Thank you!

@hunse (Collaborator) commented Sep 28, 2018

It would be nice if we could replicate that chip learning behaviour in the emulator (the behaviour from before we got rid of function=lambda x: 0). I assume it's some kind of clipping that's going on. It doesn't have to be in this PR, but if not now, let's make an issue.

@tbekolay (Member)

Yeah I'll make an issue.

@tbekolay (Member) left a comment

Pushed a commit with mostly style fixes. With that, LGTM!

One thing that I tried was switching the initial function in test_pes_comm_channel from returning 0 to returning -x, but when I did that, all the parametrizations failed with U overflow errors. So I think the initial function is a critical thing to experiment with in learning networks.

I'll make a few issues out of the things discovered in this PR, then squash all the commits down to one and merge. Will probably take a little while, so if anyone has objections feel free to raise them in the next hour or so!

@tbekolay (Member) commented Sep 28, 2018

Added comments to #83 and #98 and made #99 and #100 to track things that we should do as a result of the issues raised in this PR. Squashing and merging now!

The actual communication protocol is not as efficient as
it could be, but this works properly.

The choice of the function computed across the learned
connection prior to learning turns out to have a huge
effect on the behavior of the network. If the weights are
initially much smaller than they will be post-learning,
the weights can easily over/underflow. Improving the
weight discretization, however, is left for future work.

This commit also removes some writes that used to be needed
to close the snip successfully, but they appear to be
unnecessary now.