Updated GPGPU code #96

Open
wants to merge 4 commits into
base: master

Conversation

LouisCastricato

I'll push the code here as I continue to develop it this summer. The LIF code has been updated and seems to be working flawlessly. Performance gains are still similar to #92 (e.g., 5% - 10%). If you have an OpenCL-compatible GPU, please run this code and comment below with the percent increase/decrease for the function "test_lif_speed" over 1000 iterations.

Thanks
Louis

…st time; however, the performance difference from removing branching will only truly become apparent after DP has been implemented.
@jgosmann
Contributor

jgosmann commented May 4, 2016

What are the exact commands that I have to run?

@LouisCastricato
Author

LouisCastricato commented May 4, 2016

Oh, the script didn't get staged, and I need to learn py.test; I'm too used to gmock. Give me a second.

@LouisCastricato
Author

Please go into the main directory and run

bench.sh

Thanks

@jgosmann
Contributor

jgosmann commented May 4, 2016

plan 0: blockify = False, dur = 2.155
plan 1: blockify = True, dur = 0.818
Original LIF impl
plan 0: blockify = False, dur = 2.153
plan 1: blockify = True, dur = 0.830

(GeForce GTX 980)
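
For reference, the percent change requested in the opening comment can be worked out from the timings above. This is a minimal sketch that assumes the first, unlabelled pair of timings belongs to the updated code and the pair under "Original LIF impl" is the baseline; `percent_speedup` is a hypothetical helper, not part of the repository.

```python
def percent_speedup(baseline, updated):
    """Percent reduction in run time relative to the baseline (positive = faster)."""
    return (baseline - updated) / baseline * 100.0

# Timings copied from the output above, in the units reported as "dur".
# Assumption: first pair = updated code, second pair = original LIF implementation.
timings = {
    "plan 0 (blockify=False)": (2.153, 2.155),  # (baseline, updated)
    "plan 1 (blockify=True)": (0.830, 0.818),
}

results = {name: percent_speedup(base, new) for name, (base, new) in timings.items()}
for name, pct in results.items():
    print(f"{name}: {pct:+.2f}%")
```

Under that assumption, plan 1 comes out about 1.4% faster and plan 0 is unchanged to within 0.1% on this card.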

@LouisCastricato
Author

LouisCastricato commented May 4, 2016

Awesome, thanks! Later on, since the 980 has compute capability 5.2, I may ask you to benchmark a few reduce functions.

@LouisCastricato
Author

I can't squeeze much more performance out of the reduction kernel without raising the requirement to OpenCL 2.0, so I'll leave it for the time being and revisit it if we decide to pursue that route.
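
For context, a reduction on OpenCL 1.x is usually written as a pairwise tree reduction over a local-memory buffer, whereas OpenCL 2.0 adds built-in work-group functions such as `work_group_reduce_add`. The sketch below emulates the tree pattern in plain Python for illustration only; it is not the kernel from this PR, and `tree_reduce_sum` is a hypothetical name.

```python
def tree_reduce_sum(values):
    """Emulate a work-group tree reduction: halve the active range each step."""
    buf = list(values)          # stands in for the local-memory scratch buffer
    n = len(buf)
    if n == 0:
        return 0.0
    while n > 1:
        half = (n + 1) // 2     # round up so an odd trailing element is carried over
        # On a GPU, each index i below would be one work-item,
        # and a barrier would follow every pass of this loop.
        for i in range(n - half):
            buf[i] += buf[i + half]
        n = half
    return buf[0]

total = tree_reduce_sum([1.0, 2.0, 3.0, 4.0, 5.0])
print(total)
```

Each pass halves the number of active elements, so a buffer of n values needs about log2(n) synchronised steps.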

@LouisCastricato
Author

OK, so I'm going to try to work on splitting up the workload between GPUs. This won't be full multi-GPU support; it will be highly selective for now, since it would break a lot if I tried to do everything. After that (maybe next week) I want to add a low-requirement mode for ARM, since Nengo uses some features that not all ARM processors support.
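
One simple way to picture splitting a workload between devices is to hand each one a contiguous slice sized by a relative weight. The sketch below is only an illustration of that idea, not the design used in this PR; `split_workload` and the weights (for example, a rough relative throughput per GPU) are hypothetical.

```python
def split_workload(n_items, weights):
    """Split range(n_items) into contiguous (start, stop) chunks, one per device."""
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty list of positive numbers")
    total = float(sum(weights))
    bounds = [0]
    acc = 0.0
    for w in weights:
        acc += w
        # Cumulative rounding keeps chunks contiguous and avoids drift.
        bounds.append(round(n_items * acc / total))
    bounds[-1] = n_items        # guard against floating-point rounding at the end
    return list(zip(bounds[:-1], bounds[1:]))

# Hypothetical example: 1000 neurons, first device weighted twice as heavily.
chunks = split_workload(1000, [2.0, 1.0])
print(chunks)
```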

Finally, sometime after adding ARM support, I'll be working on multithreading memory transfers and task issuing, but this is a huge job and I may need design documentation to complete it. That being said, if I can't do this on my own, I may ask a friend of mine who is an expert on deferred processing to assist me.
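
As a rough picture of what overlapping memory transfers with task issuing could look like, the sketch below stages batches on a worker thread while the calling thread processes them. It uses only the Python standard library; `run_pipeline`, `transfer`, and `compute` are hypothetical stand-ins for host-to-device copies and kernel launches, not PyOpenCL calls or code from this PR.

```python
import queue
import threading

def run_pipeline(batches, transfer, compute, depth=2):
    """Overlap `transfer` (producer thread) with `compute` (calling thread)."""
    staged = queue.Queue(maxsize=depth)   # bounded, similar to double buffering
    done = object()                       # sentinel marking the end of input

    def producer():
        try:
            for batch in batches:
                staged.put(transfer(batch))   # blocks while `depth` batches wait
        finally:
            staged.put(done)                  # always unblock the consumer

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    results = []
    while True:
        item = staged.get()
        if item is done:
            break
        results.append(compute(item))
    worker.join()
    return results

# Toy stand-ins: "transfer" copies a batch into a tuple, "compute" sums it.
out = run_pipeline([[1, 2], [3, 4], [5]], transfer=tuple, compute=sum)
print(out)
```

The bounded queue limits how many batches are staged ahead of the consumer, which is the same trade-off a real design would face with device memory.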

@LouisCastricato
Author

Most of gemv is actually very well written (good job lol) and can't be improved much within the current feature set (currently on 1.2; I'd need to increase it to 2.1) without making the code unmaintainable and unreadable. However, raising our tool set to 2.1 would yield huge performance increases, so I do want to consider this eventually.

@LouisCastricato
Author

The next version of OpenCL has a mode for scaling graphs with no CPU-to-GPU communication needed. It isn't out yet (I think it comes out mid-June), but I'll be buying a GTX 1080 anyway, so I may be able to play around with it a bit. It may increase performance. (Only the new Pascal and AMD cards will support it.)

@LouisCastricato
Author

Just wanted to post an update: I haven't stopped working on this. Sadly, my PC died about 25 days ago, but I should have it up and running again soon.


2 participants