A neural net is a graph (a generic one: it can have cycles, even self edges). Synapses are edges, neurons are vertices. In a densely connected graph the number of edges grows roughly as O(V^2), where V is the number of vertices. Most of the time is spent in FmCa, because that work scales with the number of synapses (edges).
-
Just writing down my thoughts so far, in case anyone has related ideas.
Since machines nowadays come with multi-core CPUs as well as GPUs, the idea here is to try to utilize them in order to gain speed improvements.
The strategy is to first make use of CPU cores, because that is a smaller step from what we currently have (all serial), but it also forces us to organize the code into functions that can run in parallel and to think about the relations between those functions (what needs to happen before the next thing is OK to run). Utilizing cores in Go is pretty easy once you have pure functions, using goroutines: it literally just takes the keyword "go" in front of the call. The goroutine scheduler is also pretty well thought out and tries to reuse as much as possible, so we don't have to worry about that part (at least not yet). Another obvious advantage is that the code is all still in Go, so it is more portable than a GPU-specific language.
There are 3 main parts. I am still trying to figure out the order in which things need to happen, but roughly these are the main parts, which can be found in axon/layer.go:
Long term, the plan is to move to parallelizing using a GPU compute shader. For that to be efficient we will probably need to keep the entire network in GPU memory; otherwise the time spent moving data around would be too high.