Function call and floop simultaneously #69

Open
CharlesRSmith44 opened this issue May 5, 2021 · 0 comments

I've attached an example below. It works as written, but if I transform x inside the loop body (for example, squaring each element before reducing), the function no longer works. Combining @floop with CUDAEx() is noticeably faster than the alternatives benchmarked below, and I'd like to keep that speedup while also modifying x within the function. Is that possible?

### Packages
using CUDA, FLoops, BenchmarkTools, FoldsCUDA

### User Inputs
nvec = 1_000_000
M = 50
x = CuArray(rand(Float32, (M, nvec)))

### Function Set up
function parallel_multi(f, x)
    @floop CUDAEx() for i in 1:size(x, 2)
        val = reduce(*, @view(x[:, i]))  # works
        # val = reduce(*, @view(x[:, i] .^ 2))  # doesn't work: @view only accepts an indexing expression
        # val = reduce(*, x[:, i] .^ 2)  # doesn't work
        f[i] = val
    end
    return f
end

result = CUDA.ones(Float32, (size(x,2),1))

### Comparing speeds
display(@benchmark CUDA.@sync parallel_multi($result, $x))
display(@benchmark CUDA.@sync reduce(*, $x, dims = 1))
display(@benchmark CUDA.@sync prod($x, dims = 1))  # equivalent to the reduce call above
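
A possible workaround, offered as an untested sketch rather than a confirmed FoldsCUDA recipe: `x[:, i] .^ 2` builds a temporary array for each column, which GPU kernels generally cannot do, whereas `mapreduce` applies the transformation element by element over the view without a temporary. The helper name `parallel_sqprod!` and the executor argument `ex` are hypothetical. The sketch runs on the CPU with `ThreadedEx()`; passing `CUDAEx()` with `CuArray` inputs is an assumption based on `reduce(*, @view(x[:, i]))` already working in the example above.

```julia
using FLoops  # provides @floop and ThreadedEx; FoldsCUDA and CUDA are needed for CUDAEx()

# Hypothetical helper: stores the product of the squared entries of each column of x in f.
function parallel_sqprod!(f, x, ex)
    @floop ex for i in 1:size(x, 2)
        # abs2(v) equals v^2 for real v; mapreduce squares while reducing,
        # so no temporary array is created inside the loop body.
        f[i] = mapreduce(abs2, *, @view(x[:, i]))
    end
    return f
end

x = rand(Float32, 4, 8)            # small CPU stand-in for the CuArray above
f = ones(Float32, size(x, 2))
parallel_sqprod!(f, x, ThreadedEx())

# Assumed GPU usage (requires FoldsCUDA, CUDA, and a CUDA device):
# parallel_sqprod!(CUDA.ones(Float32, size(x, 2)), CuArray(x), CUDAEx())
```

If this does carry over to `CUDAEx()`, the same pattern should apply to other elementwise transformations by swapping `abs2` for a different scalar function.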
