Static GPU compilation of Jacobian #129
avik-pal commented Feb 15, 2024
aaff1ef to 98f01c4
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #129      +/-   ##
==========================================
+ Coverage   90.22%   90.26%   +0.04%
==========================================
  Files          20       20
  Lines        1217     1223       +6
==========================================
+ Hits         1098     1104       +6
  Misses        119      119
98f01c4 to aa1ed0c
using KernelAbstractions, CUDA, AMDGPU, NonlinearSolve, StaticArrays

# One work-item per initial condition: rebuild the problem with the i-th u0 and solve it.
@kernel function parallel_nonlinearsolve_kernel!(result, @Const(prob), @Const(alg))
    i = @index(Global)
    prob_i = remake(prob; u0 = prob.u0[i])
    sol = solve(prob_i, alg)
    @inbounds result[i] = sol.u
    return nothing
end

# Allocate the output on the chosen backend and launch one work-item per entry of prob.u0.
function vectorized_solve(prob, alg; backend = CPU())
    result = KernelAbstractions.allocate(backend, eltype(prob.u0), length(prob.u0))
    groupsize = min(length(prob.u0), 1024)
    kernel! = parallel_nonlinearsolve_kernel!(backend, groupsize, length(prob.u0))
    kernel!(result, prob, alg)
    KernelAbstractions.synchronize(backend)
    return result
end

# Unrolled residual: one generated variable per component of the static vector.
@generated function generalized_rosenbrock(x::SVector{N}, p) where {N}
    vals = ntuple(gensym ∘ string, N)
    expr = []
    push!(expr, :($(vals[1]) = oneunit(x[1]) - x[1]))
    for i in 2:N
        push!(expr, :($(vals[i]) = 10.0 * (x[$i] - x[$i - 1] * x[$i - 1])))
    end
    push!(expr, :(@SVector [$(vals...)]))
    return Expr(:block, expr...)
end

# 1024 independent initial guesses, each a length-10 static vector.
u0 = @SVector [@SVector(rand(10)) for _ in 1:1024]
prob = NonlinearProblem(generalized_rosenbrock, u0)

vectorized_solve(prob, SimpleNewtonRaphson(); backend = CPU())
vectorized_solve(prob, SimpleNewtonRaphson(); backend = ROCBackend())
vectorized_solve(prob, SimpleNewtonRaphson(); backend = CUDABackend())
The |
@utkarsh530 can you take a quick look and tell me why the |
aa1ed0c to 5d83e64
5d83e64 to f2149a9
We can get the example working later, but this is good to go.