Hi,
I'm just copy-pasting the example:
# ╔═╡ 0094c9ce-0bf7-40ff-b6df-dcddf94828cd
using DiffEqGPU, OrdinaryDiffEq, StaticArrays, CUDA
# ╔═╡ 0c65ebbf-95b8-44eb-8789-8c03c29442d8
function lorenz(u, p, t)
    σ = p[1]
    ρ = p[2]
    β = p[3]
    du1 = σ * (u[2] - u[1])
    du2 = u[1] * (ρ - u[3]) - u[2]
    du3 = u[1] * u[2] - β * u[3]
    return SVector{3}(du1, du2, du3)
end
# ╔═╡ faf01799-9b68-463d-b439-504d78ebb2e6
u0 = @SVector [1.0f0; 0.0f0; 0.0f0]
# ╔═╡ fd96be2f-eab0-4cf9-9680-8af8aa5fc54f
tspan = (0.0f0, 10.0f0)
# ╔═╡ 64bb027c-bc7b-443d-a712-8c2c9bf577ac
p = @SVector [10.0f0, 28.0f0, 8 / 3.0f0]
# ╔═╡ 4b77d5b2-2988-4c2a-ab51-a424a016d85a
prob = ODEProblem{false}(lorenz, u0, tspan, p)
# ╔═╡ f5c3ee38-59b5-4840-9cea-ea2c1a52c512
prob_func = (prob, i, repeat) -> remake(prob, p = (@SVector rand(Float32, 3)) .* p)
# ╔═╡ 169bb61d-8a59-4ad1-9936-d594ce5eb2de
monteprob = EnsembleProblem(prob, prob_func = prob_func, safetycopy = false)
# ╔═╡ 097eeba0-3608-498f-9574-b257b6e9604a
@time sol2 = solve(monteprob, Tsit5(), trajectories = 100_000,
    adaptive = false, dt = 0.1f0)
# ╔═╡ 5cccc317-125b-4c23-abcf-09292ab9c7e8
CUDA.@time sol = solve(monteprob, GPUTsit5(), EnsembleGPUKernel(), trajectories = 100_000,
    adaptive = false, dt = 0.1f0)
I don't see an improvement with the GPU version; it is actually slightly slower than the CPU run:
# CPU
0.182350 seconds (14.10 M allocations: 1.822 GiB)
# GPU
0.249538 seconds (3.44 M CPU allocations: 382.376 MiB) (3 GPU allocations: 126.038 MiB, 0.01% memmgmt time)
System:
Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 24 × AMD Ryzen 9 5900X 12-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, znver3)
Threads: 12 on 24 virtual cores
Environment:
JULIA_REVISE_WORKER_ONLY = 1
NVIDIA GeForce RTX 3060 (GPU 0)
Is there anything wrong or is this expected?
Best,
Felix