Optimize StateMonitor #278

Open
denisalevi opened this issue Mar 31, 2022 · 0 comments
@denisalevi (Member)

There are a few straightforward optimizations for our current StateMonitor implementation:

  1. We are currently using dynamic device vectors (thrust) for the monitor data, with one vector per recorded neuron. We don't need that at all: we know from the monitor's clock and the number of recorded neurons how much data each monitored variable will need. So instead of dynamic vectors and repeated resizing, set the size once at the beginning (see the first sketch after this list).
  2. Currently, monitor data is stored on the GPU and only copied to the CPU at the end of a simulation. We should implement GPU -> CPU copies at user-defined (or heuristic) intervals. I think a global (or per-monitor) preference that sets a fixed amount of GPU memory for the monitor would be good: whenever that GPU memory is full, we copy the data to the host. Optionally, the data could then also be written directly to disk. This would allow recording a lot of data even with little RAM (see the second sketch after this list).
  3. Transpose the 2D monitor arrays in GPU memory, such that writing the state variables of all recorded neurons to the monitor in a single time step is coalesced. This also requires modifying the loop in objects.cu that writes the data to disk, such that the written format is unchanged (for Brian to read it correctly). This basically needs another transpose, I guess.

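For 1., a minimal sketch of what setting the size once could look like, assuming the monitor stores doubles in one flat buffer; the names (`monitor_data`, `init_monitor`, `n_timesteps`, `n_recorded`) are illustrative, not brian2cuda's actual generated identifiers:

```cpp
#include <thrust/device_vector.h>

// Hypothetical sketch: allocate the full monitor buffer once instead of
// growing one thrust::device_vector per recorded neuron at every time step.
// n_timesteps is known from the monitor's clock and the simulation duration,
// n_recorded from the record indices, so no resizing is needed while running.
thrust::device_vector<double> monitor_data;

void init_monitor(size_t n_timesteps, size_t n_recorded)
{
    // One flat buffer of logical shape [n_recorded][n_timesteps]
    // (matching the current indices x record_times layout).
    monitor_data.resize(n_timesteps * n_recorded);
}
```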
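And for 2., a sketch of the buffer-full flush, assuming a fixed device buffer holding `chunk_timesteps` time steps; the `FILE*` append stands in for whatever write-to-disk path would actually be used, and all names are again hypothetical:

```cpp
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical flush: once the fixed-size device buffer holds
// chunk_timesteps time steps, copy it to the host and append it to disk.
// Host RAM usage stays bounded by one chunk, however long the recording.
void flush_monitor(const double* d_buffer, size_t chunk_timesteps,
                   size_t n_recorded, FILE* out_file)
{
    const size_t n_values = chunk_timesteps * n_recorded;
    std::vector<double> h_buffer(n_values);
    cudaMemcpy(h_buffer.data(), d_buffer, n_values * sizeof(double),
               cudaMemcpyDeviceToHost);
    fwrite(h_buffer.data(), sizeof(double), n_values, out_file);
}
```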
For 3., here is the corresponding comment from #201 and #50:

And the global memory writes are not coalesced. Currently we have a 2D data structure of dimensions indices x record_times (vector of vectors) for each variable monitor, and we fill it in the kernel like this:

monitor[tid][current_iteration] = ...

For coalesced writes, we could just "transpose" the monitor data structure so that we can use:

monitor[current_iteration][tid] = ...

We might have to re-sort the monitor at the end though, since it might then not fit the format that Brian expects to read back.

Originally posted by @denisalevi in #50 (comment)
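A minimal kernel sketch of the transposed layout described above, using a flat array of logical shape [n_timesteps][n_recorded]; the kernel name and parameters are illustrative, not the actual generated code:

```cpp
// Hypothetical sketch of the coalesced (transposed) monitor write. With
// the [n_timesteps][n_recorded] layout, neighbouring threads write
// neighbouring addresses within one time step.
__global__ void record_state(double* monitor, const double* state,
                             const int* record_indices, int n_recorded,
                             int current_iteration)
{
    const int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n_recorded)
        return;
    // Coalesced: consecutive tid -> consecutive memory addresses.
    monitor[current_iteration * n_recorded + tid] = state[record_indices[tid]];
}
```

The transpose back to Brian's expected indices x record_times format could then happen once at the end, either on the GPU before the device-to-host copy or on the host in the loop that writes the data to disk.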
