For the following example, the stateupdater doesn't achieve full occupancy on my laptop GPU (MX150). Why? Is this a GPU resource limitation, or is something going wrong in the occupancy calculation?
from brian2 import *
import brian2cuda  # These two lines suffice
set_device('cuda_standalone')  # to run brian2 on a GPU

# Parameters
N = 5000 ; duration = 0.1*second ; V_r = 10*mV
theta = 20*mV ; tau = 20*ms ; delta = 2*ms
tau_ref = 2*ms ; C = 1000 ; J = 0.1*mV
mu_ext = 25*mV ; sigma_ext = 1*mV

# Network of N noise-driven leaky integrate-and-fire neurons
model = """
dV/dt = (-V + mu_ext) / tau + sigma_ext / sqrt(tau) * xi : volt
"""
neurons = NeuronGroup(N,
                      model,
                      threshold='V>theta',
                      reset='V=V_r',
                      refractory=tau_ref,
                      method='euler')

# Initialize membrane potential
neurons.V = V_r
run(duration)
This gives
INFO kernel_neurongroup_stateupdater_codeobject
7 blocks
768 threads
36 registers per block
0 bytes statically-allocated shared memory per block
0 bytes local memory per thread
576 bytes user-allocated constant memory
0.750 theoretical occupancy (need 6 blocks for 1.000)
INFO kernel_neurongroup_thresholder_codeobject
5 blocks
1024 threads
16 registers per block
0 bytes statically-allocated shared memory per block
0 bytes local memory per thread
576 bytes user-allocated constant memory
1.000 theoretical occupancy (need 6 blocks for 1.000)
INFO kernel_neurongroup_resetter_codeobject
5 blocks
1024 threads
14 registers per block
0 bytes statically-allocated shared memory per block
0 bytes local memory per thread
576 bytes user-allocated constant memory
1.000 theoretical occupancy (need 6 blocks for 1.000)
Why do we use 7 blocks for the stateupdater? How do we get 100% occupancy with only 5 blocks for the thresholder and resetter if the occupancy calculation says that we need 6 blocks?
To get the `(need 6 blocks for 1.000)` part, I printed the `min_num_threads` variable (which should be called `min_num_blocks`...).
See my explanations in #266. The stateupdater kernel uses 36 registers per thread, which means we can't run 2048 threads per SM because of the registers-per-SM limit (that would need at most 32 registers per thread). Hence we use fewer than 1024 threads per block, leading to lower theoretical occupancy.
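The register arithmetic can be checked with a rough sketch. It assumes the per-SM limits of a compute capability 6.1 device such as the MX150 (65536 registers, 2048 resident threads, 32 resident blocks) and ignores register allocation granularity and shared memory, so it is an approximation and not necessarily the calculation brian2cuda performs. The function name `theoretical_occupancy` is hypothetical.

```python
# Assumed per-SM limits for compute capability 6.1 (e.g. MX150).
REGS_PER_SM = 65536
MAX_THREADS_PER_SM = 2048
MAX_BLOCKS_PER_SM = 32


def theoretical_occupancy(regs_per_thread, threads_per_block):
    """Fraction of an SM's thread slots that resident blocks can fill.

    Simplified: ignores register allocation granularity and shared memory.
    """
    # How many blocks fit on one SM under each limit.
    by_regs = REGS_PER_SM // (regs_per_thread * threads_per_block)
    by_threads = MAX_THREADS_PER_SM // threads_per_block
    blocks_per_sm = min(by_regs, by_threads, MAX_BLOCKS_PER_SM)
    return blocks_per_sm * threads_per_block / MAX_THREADS_PER_SM


print(theoretical_occupancy(36, 768))   # stateupdater as launched
print(theoretical_occupancy(36, 1024))  # stateupdater if forced to 1024 threads
print(theoretical_occupancy(16, 1024))  # thresholder
```

Under these assumptions the stateupdater reaches 0.75 with 768 threads per block (two resident blocks per SM) but only 0.5 with 1024 threads per block (one resident block), while the thresholder reaches 1.0.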
The occupancy value is a theoretical occupancy per SM, so it is completely independent of the number of blocks. But to actually use all SMs fully, one would need 6 blocks here (the MX150 has 3 SMs, each of which can run 2 blocks).
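The block counts in the log can be reproduced with a small sketch, assuming one thread per neuron (an assumption about how the kernels are launched, not something confirmed in this thread). The helper names `blocks_launched` and `blocks_to_fill_device` are hypothetical; the 3 SMs and 2 resident blocks per SM are the figures quoted above.

```python
import math


def blocks_launched(num_neurons, threads_per_block):
    """Grid size when each neuron gets its own thread (assumed)."""
    return math.ceil(num_neurons / threads_per_block)


def blocks_to_fill_device(num_sms, resident_blocks_per_sm):
    """Minimum number of blocks so that every SM holds its maximum."""
    return num_sms * resident_blocks_per_sm


N = 5000
print(blocks_launched(N, 768))      # stateupdater grid size
print(blocks_launched(N, 1024))     # thresholder / resetter grid size
print(blocks_to_fill_device(3, 2))  # MX150: 3 SMs, 2 blocks each
```

With these numbers the stateupdater launches 7 blocks and the thresholder and resetter launch 5, while 6 blocks are needed to give every SM its two resident blocks. A per-SM occupancy of 1.000 with only 5 blocks therefore does not mean that the whole device is saturated.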
TODO: Modify the info message to say "theoretical occupancy per SM", to make this distinction clearer.