
Support GPUArrays allocations cache #2593

Merged
merged 9 commits into from
Jan 9, 2025

Conversation

pxl-th
Member

@pxl-th pxl-th commented Dec 15, 2024

@maleadt maleadt added cuda array Stuff about CuArray. performance How fast can we go? labels Dec 16, 2024
@maleadt maleadt force-pushed the master branch 15 times, most recently from 5d585c4 to c850163 Compare December 20, 2024 08:18
src/array.jl (review thread, outdated)
@maleadt maleadt force-pushed the pxl-th/cache-alloc branch from 90ac2fa to 56d9fcc Compare January 9, 2025 12:10
src/array.jl (review thread, outdated)
@pxl-th pxl-th changed the title Use GPUArrays caching allocator Support GPUArrays allocations cache Jan 9, 2025
@pxl-th pxl-th marked this pull request as ready for review January 9, 2025 12:35
src/array.jl (review thread)

codecov bot commented Jan 9, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 61.04%. Comparing base (234ec4c) to head (c4c3639).
Report is 3 commits behind head on master.

Additional details and impacted files
```
@@             Coverage Diff             @@
##           master    #2593       +/-   ##
===========================================
- Coverage   73.64%   61.04%   -12.60%
===========================================
  Files         157      157
  Lines       15220    14955      -265
===========================================
- Hits        11209     9130     -2079
- Misses       4011     5825     +1814
```


@maleadt
Member

maleadt commented Jan 9, 2025

It's also kinda weird that we're always attaching a finalizer to the array, even when it's put inside of a cache (meaning it should get freed when the cache goes out of scope). Should we move the finalizer into cached_alloc? That starts to make it a slightly messy API, though.

@pxl-th
Member Author

pxl-th commented Jan 9, 2025

> It's also kinda weird that we're always attaching a finalizer to the array, even when it's put inside of a cache (meaning it should get freed when the cache goes out of scope). Should we move the finalizer into cached_alloc? That starts to make it a slightly messy API, though.

But we need the finalizers anyway, so it might be fine to attach them in the constructor and have a single place for this, rather than catching arrays when they go out of the cache.
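The trade-off being discussed can be sketched with a toy caching allocator (the names `cached_alloc`, `CACHE`, and `free!` here are illustrative, not the actual GPUArrays API): the cache hands out previously freed allocations by key, while finalizer registration stays in one place, the constructor.

```julia
# Hypothetical sketch of a caching allocator keyed by (eltype, dims).
const CACHE = Dict{Any, Vector{Any}}()

function cached_alloc(f, key)
    # Reuse a previously released allocation for this key, or create a new one.
    bucket = get!(() -> Any[], CACHE, key)
    isempty(bucket) ? f() : pop!(bucket)
end

# Return an allocation to the cache instead of freeing it immediately.
free!(key, x) = push!(CACHE[key], x)

key = (Float32, (16,))
x = cached_alloc(() -> zeros(Float32, 16), key)
free!(key, x)
y = cached_alloc(() -> zeros(Float32, 16), key)
x === y  # the second request reuses the cached allocation
```

Keying by eltype and dims (rather than bytesize) means a cached allocation is only reused for requests of the exact same shape, which matters for the type-assertion issue discussed below.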

@maleadt maleadt merged commit 1389800 into JuliaGPU:master Jan 9, 2025
3 of 4 checks passed
@pxl-th pxl-th deleted the pxl-th/cache-alloc branch January 9, 2025 15:47
@pxl-th
Member Author

pxl-th commented Jan 9, 2025

Passing the bytesize as the cache key results in errors when arrays have the same bytesize but different dims:

```julia
julia> GPUArrays.@cached cache ROCArray(zeros(Float32, 16));

julia> GPUArrays.@cached cache ROCArray(zeros(Float32, 4, 4))
ERROR: TypeError: in typeassert, expected ROCArray{Float32, 2, AMDGPU.Runtime.Mem.HIPBuffer}, got a value of type ROCArray{Float32, 1, AMDGPU.Runtime.Mem.HIPBuffer}
Stacktrace:
 [1] ROCArray
   @ ~/.julia/dev/AMDGPU/src/array.jl:9 [inlined]
 [2] ROCArray
   @ ~/.julia/dev/AMDGPU/src/array.jl:111 [inlined]
 [3] ROCArray
   @ ~/.julia/dev/AMDGPU/src/array.jl:116 [inlined]
 [4] ROCArray(A::Matrix{Float32})
   @ AMDGPU ~/.julia/dev/AMDGPU/src/array.jl:119
 [5] macro expansion
   @ ~/.julia/packages/GPUArrays/2FKRj/src/host/alloc_cache.jl:147 [inlined]
 [6] top-level scope
   @ REPL[4]:1
```

To fix this we can either revert to using dims as the key, or reshape the array after GPUArrays.cached_alloc and only then do the type assertion:

```julia
return if size(x) != dims
    reshape(x, dims)
else
    x
end::ROCArray{T, N, B}
```
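The reshape fallback can be sketched with plain Arrays standing in for ROCArray (the `alloc_with_reshape` helper and the Dict-based cache are illustrative, not the actual GPUArrays implementation): with a bytesize key, a cached 16-element vector can serve a (4, 4) request, but only after a reshape does the type assertion succeed.

```julia
# Toy model: cache keyed by bytesize, with a reshape before the typeassert.
function alloc_with_reshape(cache::Dict, ::Type{T}, dims::Dims{N}) where {T, N}
    bytes = sizeof(T) * prod(dims)
    x = get!(() -> zeros(T, dims), cache, bytes)
    return if size(x) != dims
        reshape(x, dims)  # reshape shares the underlying memory
    else
        x
    end::Array{T, N}
end

cache = Dict{Int, Any}()
v = alloc_with_reshape(cache, Float32, (16,))   # caches a 64-byte vector
m = alloc_with_reshape(cache, Float32, (4, 4))  # same 64 bytes, reused via reshape
size(m)  # (4, 4), and m aliases the memory of v
```

Without the reshape, the second call would return the cached 1-D array and fail the `::Array{Float32, 2}` assertion, which is exactly the TypeError shown above.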

@maleadt
Member

maleadt commented Jan 10, 2025

```julia
julia> GPUArrays.@cached cache ROCArray(zeros(Float32, 16));

julia> GPUArrays.@cached cache ROCArray(zeros(Float32, 4, 4))
ERROR: TypeError: in typeassert, expected ROCArray{Float32, 2, AMDGPU.Runtime.Mem.HIPBuffer}, got a value of type ROCArray{Float32, 1, AMDGPU.Runtime.Mem.HIPBuffer}
```

Yes, that's what I meant in #2593 (comment); I had already reverted that change before merging this PR.

@maleadt
Member

maleadt commented Jan 10, 2025

That said, it would be nice if the cache could fulfill such requests. I considered storing the DataRef instead, which tracks a refcount (so we could check if it has been unsafe_free!'d). However, that does make finalization more tricky: since the cache doesn't keep the array objects alive anymore, array finalizers can now affect the refcount during execution of the @cached expressions. I guess we could move registration of the finalizer to cached_alloc, skipping it when the cache is active (because the cache is then responsible for freeing contained data)...
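The refcounting idea can be illustrated with a minimal sketch (the `RefHandle` type and `retain!`/`release!`/`isfreed` functions are hypothetical stand-ins, not the actual GPUArrays DataRef API): the cache holds one reference, so an array finalizer dropping its own reference no longer frees the underlying data.

```julia
# Hypothetical refcounted handle; freeing only happens at refcount zero.
mutable struct RefHandle
    data::Vector{UInt8}
    refcount::Int
end

retain!(h::RefHandle) = (h.refcount += 1; h)

function release!(h::RefHandle)
    h.refcount -= 1
    h.refcount == 0 && empty!(h.data)  # stand-in for actually freeing the buffer
    return h
end

isfreed(h::RefHandle) = h.refcount == 0

h = RefHandle(zeros(UInt8, 64), 1)
retain!(h)           # the cache takes its own reference
release!(h)          # the array finalizer drops its reference...
isfreed(h)           # ...but the cache still keeps the data alive
release!(h)          # the cache releases on teardown; now the data is freed
```

This matches the concern in the comment: once finalizers manipulate the refcount while @cached is active, the cache must be the one holding the last reference, which is why finalizer registration would move into cached_alloc.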

avik-pal pushed a commit to avik-pal/CUDA.jl that referenced this pull request Jan 11, 2025