Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

freeze/thaw interface for StaticStrideArrays #37

Open
MasonProtter opened this issue Apr 5, 2023 · 9 comments
Open

freeze/thaw interface for StaticStrideArrays #37

MasonProtter opened this issue Apr 5, 2023 · 9 comments

Comments

@MasonProtter
Copy link
Contributor

Currently, a StaticStrideArray must be a mutable struct so that we can get reliable pointers to it's storage in order to avoid interacting with the NTuple data.

However, this causes problems with StaticStrideArrays if they cross non-inlined function boundaries because then the storage data will end up on the heap even if it doesn't need to be there, requiring us to deal with the scariness of PtrArrays @gc_preserve.

One possible solution to this problem would be to have StaticStrideArray be an immutable struct, and then have a MutableStaticStrideArray that can be created from it on-demand in order to get a pointer, and then converted right back to the immutable version as soon as we're done.

Here's a little demo of what I'm imagining:

using StrideArrays, StrideArraysCore
using StrideArraysCore: StaticStrideArray

macro new(T, args...)
    Expr(:new, T, args...) |> esc
end

struct FrozenStaticStrideArray{T, N, R, S, X, O, L} <: AbstractStrideArray{T, N, R, S, X, O}
    data::NTuple{L,T}
end
function freeze(sa::StaticStrideArray{T, N, R, S, X, O, L}) where {T, N, R, S, X, O, L}
    @new FrozenStaticStrideArray{T, N, R, S, X, O, L} getfield(sa, 1)
end
function thaw(fsa::FrozenStaticStrideArray{T, N, R, S, X, O, L}) where {T, N, R, S, X, O, L}
    @new StaticStrideArray{T, N, R, S, X, O, L} getfield(fsa, 1)
end

function Base.getindex(fsa::FrozenStaticStrideArray, i::Int...)
    thaw(fsa)[i...]
end
let
    sa = StaticStrideArray{Float64}(undef, (static(1000),)) .= 1
    fsa = freeze(sa)
    @btime $sa[1000]
    @btime $fsa[1000]
endl

#+RESULTS:
:   1.940 ns (0 allocations: 0 bytes)
:   1.939 ns (0 allocations: 0 bytes)

I think having an immutable variant like this could make this a much more convincing replacement for StaticArrays.jl.

@chriselrod
Copy link
Member

I am in favor, please feel free to make a PR!

@ChrisRackauckas and I have been talking for some time about making a StaticArrays.jl replacement that doesn't unroll anything, using loops for all implementations, to avoid the compile time and scalability problems that StaticArrays.jl has.

I'm also in favor of making StrideArrays(Core) that library.

It should perhaps also be updated to make use of the new Pkg extensions.

@MasonProtter
Copy link
Contributor Author

Hm, it seems that julia is being a bit overly cautious about avoiding us mutating a region of stack memory that belonged to an immutable variable, so a freeze/thaw is not as cheap an operation as one might hope. All the memory needs to be moved:

julia> let v = StrideArray{Float64}(undef, static(1000)) .= 1
           fv = freeze(v)
           @btime freeze($(thaw(fv)))
       end;
  153.136 ns (0 allocations: 0 bytes)

julia> let v = StrideArray{Float64}(undef, static(1000)) .= 1
           fv = freeze(v)
           code_llvm(Tuple{typeof(fv)}) do fv
               freeze(thaw(fv))
           end
       end;
;  @ REPL[8]:4 within `#3`
define void @"julia_#3_2881"([1 x [1000 x double]]* noalias nocapture sret([1 x [1000 x double]]) %0, [1 x [1000 x double]]* nocapture nonnull readonly align 8 dereferenceable(8000) %1) #0 {
top:
; ┌ @ REPL[5]:2 within `freeze`
   %.sroa.0.0..sroa_cast = bitcast [1 x [1000 x double]]* %1 to i8*
   %2 = bitcast [1 x [1000 x double]]* %0 to i8*
   call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 8 dereferenceable(8000) %2, i8* noundef nonnull align 8 dereferenceable(8000) %.sroa.0.0..sroa_cast, i64 8000, i1 false)
; └
  ret void
}

Not a show-stopper, but having a cost to freezing and thawing does put a damper on things...

@chriselrod
Copy link
Member

Yes, we're aware of that problem. A few people looked at it and couldn't come up with simple fixes, but we think it's still worth it despite this cost (and eventually, compiler improvements should eliminate that copy).

@MasonProtter
Copy link
Contributor Author

Okay, I'll see what I can do. What is your appetite here for a breaking version?

I'm picturing replacing what is currently StaticStrideArray with a StrideArray wrapping a Ref{<:NTuple}, i.e.

function StrideArray{T}(::UndefInitializer, sz::StaticInt...) where {T}
    r = Ref{NTuple{Int(prod(sz)), T}}
    p = Ptr{T}(pointer_from_objref(r))
    pA = PtrArray(p, sz)
    return StrideArray(pA, r)
end

and then the name StaticStrideArray would then be used for the immutable version. Does that make sense, or was there a reason that StaticStrideArray is a separate type from StrideArray with static storage / size?

@chriselrod
Copy link
Member

I'm okay with a breaking version, as there's only a handful of packages I maintain depending on this, and none use StaticStrideArray, so it won't be too much work to update.

Does that make sense, or was there a reason that StaticStrideArray is a separate type from StrideArray with static storage / size?

I want the array passed as a pointer to memory, not a pointer to a pointer to memory. I had it closer to what you described before, but this caused a slight loss in performance.
I thought there was an associated issue, but it may have only been a Zulip discussion. Perhaps @cscherrer recalls.

@Tokazama
Copy link
Member

Tokazama commented Apr 8, 2023

I think I made an issue for this in ArrayInterface a while ago

@cscherrer
Copy link

a StaticArrays.jl replacement that doesn't unroll anything, using loops for all implementations, to avoid the compile time and scalability problems that StaticArrays.jl has

How would this be different from a SizedArray?

I want the array passed as a pointer to memory, not a pointer to a pointer to memory. I had it closer to what you described before, but this caused a slight loss in performance.
I thought there was an associated issue, but it may have only been a Zulip discussion. Perhaps @cscherrer recalls.

Sorry, I don't remember. Could have been Zulip.

@MasonProtter
Copy link
Contributor Author

How would this be different from a SizedArray?

It wouldn't be backed by a Vector storage, but instead a type which can be stack allocated.

@cscherrer
Copy link

Ah right, of course. Thanks @MasonProtter

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants