Wrapping nfloat #202

Draft · Joel-Dahne wants to merge 8 commits into master
Conversation

Joel-Dahne (Collaborator)

This is still only a proof of concept, but I have managed to create a basic wrapper for the nfloat type. Note that for this to work you need to manually point FLINT_jll to a recent enough version of Flint. If you have a locally compiled version of Flint you can put

[FLINT_jll]
libflint_path = "path/to/flint/libflint.so"

in the file LocalPreferences.toml in the root directory of Arblib.jl, and Julia should pick it up. Note that along the way I have had to make a number of design decisions, some of which might have to be changed!
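As an aside on the FLINT_jll setup above: the same preference can presumably also be written programmatically with Preferences.jl (a sketch; the path is a placeholder):

using Preferences, FLINT_jll

# Writes the [FLINT_jll] libflint_path entry to LocalPreferences.toml in
# the active environment; restart Julia afterwards for it to take effect.
set_preferences!(FLINT_jll, "libflint_path" => "path/to/flint/libflint.so")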

Types

The low level types currently used are

const GR_CTX_STRUCT_DATA_BYTES = 6 * sizeof(UInt)
mutable struct nfloat_ctx_struct{P,F}
    data::NTuple{GR_CTX_STRUCT_DATA_BYTES,UInt8}
    which_ring::UInt
    sizeof_elem::Int
    methods::Ptr{Cvoid}
    size_limit::UInt

    function nfloat_ctx_struct{P,F}() where {P,F}
        @assert P isa Int && F isa Int
        ctx = new{P,F}()
        ret = init!(ctx, 64P, F)
        iszero(ret) || throw(DomainError(P, "cannot set precision to this value"))
        return ctx
    end
end
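For instance, `nfloat_ctx_struct{4,0}()` creates a context with a precision of 4 limbs (the constructor passes 64P = 256 bits to `init!`) and with no flags set.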

const NFLOAT_HEADER_LIMBS = 2
mutable struct nfloat_struct{P,F}
    head::NTuple{NFLOAT_HEADER_LIMBS,UInt}
    d::NTuple{P,UInt} # FIXME: Should be different for 32 bit systems

    function nfloat_struct{P,F}() where {P,F}
        @assert P isa Int && F isa Int
        res = new{P,F}()
        init!(res, nfloat_ctx_struct{P,F}())
        return res
    end
end

Both of these types depend on two type parameters, P and F, corresponding to the precision and the flags used. The nfloat_struct type must depend on P since its memory layout depends on the precision. The nfloat_ctx_struct doesn't have to depend on P, and neither of them has to depend on F.

The motivation for having both P and F as type parameters is that this allows us to check at the type level that two nfloat instances correspond to the same underlying context, and hence are allowed to be used together. It also means that an nfloat_struct doesn't have to carry around a reference to its underlying context, since the context is uniquely determined by P and F. Note that the flag is represented by an int in Flint, but for the type parameter I opted for the Julia type Int, which corresponds to slong (it is converted to a Cint before calling Flint).
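As a standalone illustration of the mechanism (the Ctx type below is a stand-in, not Arblib code): with P and F in the type, dispatch itself can enforce that two values share a context:

# Stand-in type to illustrate dispatch on value-type parameters;
# not part of Arblib.
struct Ctx{P,F} end

# This method only applies when both arguments share the same {P,F}:
compatible(::Ctx{P,F}, ::Ctx{P,F}) where {P,F} = true
compatible(::Ctx, ::Ctx) = false

@assert compatible(Ctx{4,0}(), Ctx{4,0}())
@assert !compatible(Ctx{4,0}(), Ctx{8,0}()) # different precision
@assert !compatible(Ctx{4,0}(), Ctx{4,1}()) # different flags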

The high level types are given by

struct NFloat{P,F} <: AbstractFloat
    nfloat::nfloat_struct{P,F}

    NFloat{P,F}() where {P,F} = new{P,F}(nfloat_struct{P,F}())
end

struct NFloatRef{P,F} <: AbstractFloat
    nfloat_ptr::Ptr{nfloat_struct{P,F}}
    parent::Union{Nothing} # currently always nothing; presumably to be extended with vector parents later
end

Basic usage

I have implemented some functionality, like basic arithmetic and elementary functions:

julia> using Arblib

julia> x = NFloat{4,0}(1)
1.0

julia> y = NFloat{4,0}(π)
3.1415926535897932384626433832795028841971693993751058209749445923078164062862

julia> x + y
4.1415926535897932384626433832795028841971693993751058209749445923078164062862

julia> 2x
2.0

julia> sin(x) + sqrt(y)
2.6139248357134125339506698049714441824201125169207581938865594998448216889823

Low level wrapper

The wrapping of the Flint functions is working, but is so far very basic. I have added the new types (which required some changes to handle type parameters) and special handling of the ctx as a keyword argument. The function

int nfloat_add(nfloat_ptr res, nfloat_srcptr x, nfloat_srcptr y, gr_ctx_t ctx)

is currently compiled to

function add!(res::NFloatLike, x::NFloatLike, y::NFloatLike, ctx::Arblib.nfloat_ctx_struct)
    __ret = ccall(
        Arblib.@libflint("nfloat_add"),
        Int32,
        (
            Ref{Arblib.nfloat_struct},
            Ref{Arblib.nfloat_struct},
            Ref{Arblib.nfloat_struct},
            Ref{Arblib.nfloat_ctx_struct},
        ),
        res,
        x,
        y,
        ctx,
    )
    __ret
end

(plus some other versions to handle keyword arguments). Note that

const NFloatLike{P,F} = Union{NFloat{P,F},NFloatRef{P,F},nfloat_struct{P,F}}

This works, but has two main issues:

  1. The type parameters of the arguments currently do not have to agree, so it is possible to call this with values that have different precisions or different flags.
  2. The types used in the ccall are not concrete, since the type parameters P and F are left out. It still works (in the end Flint gets a pointer to the right object), but it has a large performance impact: the type instability means that the Julia code has to make several allocations in the process of doing the ccall.

What we would like the code to look like is

function add!(res::NFloatLike{P,F}, x::NFloatLike{P,F}, y::NFloatLike{P,F}, ctx::Arblib.nfloat_ctx_struct{P,F}) where {P,F}
    __ret = ccall(
        Arblib.@libflint("nfloat_add"),
        Int32,
        (
            Ref{Arblib.nfloat_struct{P,F}},
            Ref{Arblib.nfloat_struct{P,F}},
            Ref{Arblib.nfloat_struct{P,F}},
            Ref{Arblib.nfloat_ctx_struct{P,F}},
        ),
        res,
        x,
        y,
        ctx,
    )
    __ret
end

This forces all arguments to be compatible and removes the type instability in the ccall. Generating this code does, however, seem to be slightly cumbersome. In principle I know how to get it done, but it involves more manual work on the Expr objects making up the code. I'll see if I can get a working prototype!
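For illustration, here is a standalone sketch (not the actual wrapper generator) of how the parametric method could be built as an Expr; the macro call inside the quote is only expanded once the expression is evaluated in the wrapping module:

# Sketch only: build the parametric method as an Expr. The function name
# and C symbol would come from the wrapper generator's parsing step.
jlname = :add!
cname = "nfloat_add"
ex = quote
    function $jlname(
        res::NFloatLike{P,F},
        x::NFloatLike{P,F},
        y::NFloatLike{P,F},
        ctx::nfloat_ctx_struct{P,F},
    ) where {P,F}
        ccall(
            @libflint($cname),
            Int32,
            (
                Ref{nfloat_struct{P,F}},
                Ref{nfloat_struct{P,F}},
                Ref{nfloat_struct{P,F}},
                Ref{nfloat_ctx_struct{P,F}},
            ),
            res, x, y, ctx,
        )
    end
end
# eval(ex) inside Arblib would then define the method.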

Design decisions

There are a number of fairly significant design decisions we have to make.

What should the default flags be? Since one of the goals of Arblib.jl is to make it easy to use Flint (Arb) types in generic Julia code, I think enabling all of NFLOAT_ALLOW_UNDERFLOW, NFLOAT_ALLOW_INF and NFLOAT_ALLOW_NAN would be the most reasonable choice. The Flint manual mentions that this gives up some performance; do you have an estimate for how much this is, Fredrik? My guess would be that the intermediate allocations in Julia make these differences more or less irrelevant. If you want top performance you should use the low level mutating functions, and then you can also manually specify a different flag.

How should we handle the return values? Most nfloat functions return 0 on success and non-zero on failure (corresponding to GR_SUCCESS, GR_DOMAIN and GR_UNABLE). For the high level interface I think the most reasonable approach is to just throw an error if it returns non-zero, and point people to the low level interface if they want more fine-grained control. As it is now, the low level interface just returns the return value, and it is up to the caller to make use of it. This gives maximum control but is slightly cumbersome to use. For example, it means we have to write code like

NFloat{P,F}(x) where {P,F} = (res = NFloat{P,F}(); set!(res, x); res)

(which still doesn't even check that the return value is zero) instead of

NFloat{P,F}(x) where {P,F} = set!(NFloat{P,F}(), x)

(which is how the corresponding version for Arb looks). One approach which I think might be reasonable is to add an argument to the low level wrappers that makes them either return the return value, or throw an error if it is non-zero and otherwise return the first argument of the function. A similar flag would actually be helpful for the arf functions as well; they return a flag with information about the rounding, which forces us to write a lot of functions like this

function Base.:+(x::ArfOrRef, y::Union{ArfOrRef,_BitInteger})
    z = Arf(prec = _precision(x, y))
    add!(z, x, y)
    return z
end

compared to the version for Arb which is

Base.:+(x::ArbOrRef, y::Union{ArbOrRef,ArfOrRef,_BitInteger}) =
        add!(Arb(prec = _precision(x, y)), x, y)
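As a sketch of that idea (the helper name is hypothetical): a small function that inspects the return code and passes the result through would let both the nfloat and arf methods collapse back into one-liners:

# Hypothetical helper, not part of Arblib: throw if the return code is
# non-zero, otherwise hand back the result argument.
function checked(ret::Integer, res)
    iszero(ret) || throw(DomainError(ret, "Flint call returned an error code"))
    return res
end

# With it, the constructor from above becomes
# NFloat{P,F}(x) where {P,F} = (res = NFloat{P,F}(); checked(set!(res, x), res))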

How should we handle promotion between NFloat values from different contexts? I guess taking the highest precision and combining the flags should be a reasonable approach? Something like this:

Base.promote_rule(
    ::Type{<:NFloatOrRef{P1,F1}},
    ::Type{<:NFloatOrRef{P2,F2}},
) where {P1,P2,F1,F2} = NFloat{max(P1, P2),F1 | F2}
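For instance, this rule would give `promote_type(NFloat{4,0}, NFloat{8,1}) == NFloat{8,1}`.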

This does actually seem to be type stable! Note that there is currently no way of constructing an NFloat value from another one with a different context; for this we would need to wrap

int nfloat_set_other(nfloat_ptr res, gr_srcptr x, gr_ctx_t x_ctx, gr_ctx_t ctx)

which I have not handled yet.
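For reference, a hand-written wrapper might look roughly like this, following the same ccall pattern as add! above (a sketch only: untested, and the contexts are constructed on the fly where a real implementation would presumably cache them):

# Sketch of a hand-written wrapper for nfloat_set_other; untested.
function set_other!(
    res::NFloatLike{P,F},
    x::NFloatLike{P2,F2},
) where {P,F,P2,F2}
    ccall(
        Arblib.@libflint("nfloat_set_other"),
        Int32,
        (
            Ref{Arblib.nfloat_struct{P,F}},
            Ref{Arblib.nfloat_struct{P2,F2}},
            Ref{Arblib.nfloat_ctx_struct{P2,F2}},
            Ref{Arblib.nfloat_ctx_struct{P,F}},
        ),
        res,
        x,
        Arblib.nfloat_ctx_struct{P2,F2}(),
        Arblib.nfloat_ctx_struct{P,F}(),
    )
end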

Left to do

Apart from the many things mentioned above there are also more things to handle:

  • Wrap nfloat_complex in the same way.
  • Handle vectors of NFloat using NFloatVector.
  • Put the code in the right places, currently most things are in src/nfloat.jl.
  • Add documentation.
  • Add tests.
  • Probably many more things...

Commit: The previous version only handled arb_ptr and acb_ptr; it can now handle any type ending in _ptr or _srcptr.

fredrik-johansson commented Jan 29, 2025

Nice work! By the way, would you mind if we list you as a remote participant on https://flintlib.github.io/workshop2025.html?

What should the default flags be?

The performance impact of NFLOAT_ALLOW_INF and NFLOAT_ALLOW_NAN is not negligible as soon as one works with vectors, as these flags will disable use of optimized vec methods which don't have checks for Inf/NaN. Since Inf/NaN are not fully implemented anyway, I would suggest disabling them by default.

NFLOAT_ALLOW_UNDERFLOW has virtually no impact on performance though.

For the high level interface I think the most reasonable approach is to just throw an error if it returns non-zero, and point people to the low level interface if they want more fine-grained control.

Sure.

Joel-Dahne (Collaborator, Author)

Feel free to add me as a remote participant!

Then enabling underflow and disabling infinities and NaN is maybe the best approach. It should at least be easy to adjust the default further down the line, so we don't have to commit to anything now.
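Spelled out, that default could look something like this (a sketch: the constant names follow the Flint manual, the numeric values are assumptions to be checked against nfloat.h, and NFloatDefault is a hypothetical alias):

# Flag bits; values assumed to match Flint's nfloat.h, check before use.
const NFLOAT_ALLOW_UNDERFLOW = 1
const NFLOAT_ALLOW_INF = 2
const NFLOAT_ALLOW_NAN = 4

# Proposed default: underflow allowed, Inf/NaN disabled.
const NFLOAT_DEFAULT_FLAGS = NFLOAT_ALLOW_UNDERFLOW
const NFloatDefault{P} = NFloat{P,NFLOAT_DEFAULT_FLAGS}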

I am currently trying to get the type parameters to play well with the low level wrapper; we'll see how it goes!

Commit: Previously it could return `Vector{<:Integer}`, but giving anything other than `Vector{Int}` as an argument would raise an error when trying to convert it to `Ref{Int}`. It now just returns `Vector{Int}`.

Commit: This reduces the number of warnings about a method being overwritten when running the tests. There are still some occurrences from the use of `fpwrap_error_on_failure_default`, but these seem harder to avoid (since we do need to test that the overwriting does work for that method).

Commit: Reorder types based on where they are coming from. Remove deprecated types from Arb that were removed in the transition to Flint.
Joel-Dahne (Collaborator, Author)

With a little bit of fiddling I managed to get the type parameters to work the way I wanted! The wrapper now ensures that the type parameters of the inputs all agree and also makes use of them in the ccall. It currently doesn't support different arguments having different type parameters; the only instances of this in nfloat are

int nfloat_set_other(nfloat_ptr res, gr_srcptr x, gr_ctx_t x_ctx, gr_ctx_t ctx)
int nfloat_complex_set_other(nfloat_complex_ptr res, gr_srcptr x, gr_ctx_t x_ctx, gr_ctx_t ctx)

which we probably want to wrap by hand either way. For the generic interface, handling different type parameters might be more important, but that is a problem for the future.

With this it is possible to do some performance comparisons between NFloat, Arf and Arb. I have looked at the performance of an in-place sum function as well as the regular, allocating sum function. If we let

using Arblib, BenchmarkTools

function sum!(res, xs)
    Arblib.zero!(res)
    @inbounds for x in xs
        Arblib.add!(res, res, x)
    end
    return res
end

N = 10000
xs_nfloat = [NFloat{4,0}(1 // 7) for _ = 1:N];
xs_arf = [Arf(1 // 7) for _ = 1:N];
xs_arb = [Arb(1 // 7) for _ = 1:N];
res_nfloat = zero(xs_nfloat[1])
res_arf = zero(xs_arf[1])
res_arb = zero(xs_arb[1])

then we get

julia> @benchmark sum!($res_nfloat, $xs_nfloat) samples = 10000

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  53.173 μs … 125.746 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     54.985 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   55.205 μs ±   2.340 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                ▃▄▅▅▅▆▆█▄▄▃▃▃▁▁                                 
  ▁▁▁▁▁▂▃▅▅▆▆▅▇████████████████▇█▆▅▆▅▄▄▃▃▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▄
  53.2 μs         Histogram: frequency by time         58.3 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark sum!($res_arf, $xs_arf) samples = 10000
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  202.616 μs … 515.372 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     213.931 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   215.039 μs ±  10.674 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                ▂▃▅▇█▇█▇▆▇▅▂▁                                    
  ▁▁▁▁▁▂▂▂▃▄▄▅▆▇█████████████▇▆▆▅▅▃▃▂▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  203 μs           Histogram: frequency by time          236 μs <

 Memory estimate: 240 bytes, allocs estimate: 5.

julia> @benchmark sum!($res_arb, $xs_arb) samples = 10000
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  242.106 μs … 829.997 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     252.684 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   254.678 μs ±  14.446 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

          ▁▂▄▄▅▅▇█▇▇█▆▅▄▂▁                                       
  ▂▂▂▃▃▄▆███████████████████▆▆▅▅▅▅▅▅▄▅▄▄▄▄▄▄▄▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▁▂▂ ▄
  242 μs           Histogram: frequency by time          278 μs <

 Memory estimate: 48 bytes, allocs estimate: 1.

julia> @benchmark sum($xs_nfloat) samples = 10000
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  108.621 μs …   7.196 ms  ┊ GC (min … max):  0.00% … 95.63%
 Time  (median):     129.644 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   199.633 μs ± 310.378 μs  ┊ GC (mean ± σ):  22.06% ± 13.73%

  █▄▄▁ ▅ ▃                                                      ▁
  ████████▇▅▃▃▃▃▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▅▄▆▆▇▇█▇ █
  109 μs        Histogram: log(frequency) by time        1.9 ms <

 Memory estimate: 624.94 KiB, allocs estimate: 9999.

julia> @benchmark sum($xs_arf) samples = 10000
BenchmarkTools.Trial: 4217 samples with 1 evaluation per sample.
 Range (min … max):  448.154 μs … 227.609 ms  ┊ GC (min … max):  0.00% … 39.51%
 Time  (median):     619.502 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):     1.184 ms ±   8.716 ms  ┊ GC (mean ± σ):  25.76% ±  3.64%

                    ▂  ▆█▇▅▅▅                                    
  ▃▃▂▂▃▅▅▅▅▄▄▃▃▂▃▃▃██▇████████▄▃▂▂▂▂▂▂▂▂▂▁▂▁▂▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  448 μs           Histogram: frequency by time          917 μs <

 Memory estimate: 935.91 KiB, allocs estimate: 19966.

julia> @benchmark sum($xs_arb) samples = 10000
BenchmarkTools.Trial: 4000 samples with 1 evaluation per sample.
 Range (min … max):  488.339 μs … 124.003 ms  ┊ GC (min … max):  0.00% … 44.99%
 Time  (median):     679.100 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):     1.277 ms ±   8.060 ms  ┊ GC (mean ± σ):  22.75% ±  3.64%

  ▃▇▇▅▅▄▃▁          ▁   ▁▇█▆▅▆▆▅▇█▇▆▃                           ▃
  ██████████▇▇▇██▇▇██▇▆▄███████████████▇██▆▇▇▆▆▆▅▇▁▆▆▆▅▅▄▁▁▅▅▄▅ █
  488 μs        Histogram: log(frequency) by time        932 μs <

 Memory estimate: 1.07 MiB, allocs estimate: 19998.
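Summarizing the median times from the runs above (256 bits of precision, N = 10000):

                     NFloat      Arf         Arb
 sum! (in-place)     55.0 μs     213.9 μs    252.7 μs
 sum  (allocating)   129.6 μs    619.5 μs    679.1 μs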

So for the in-place sum!, NFloat is about 4 times faster than Arf and between 4 and 5 times faster than Arb. For the allocating sum both the minimum time and the mean time are relevant. Looking at the mean time, NFloat is close to 6 times faster than Arf and a little more than 6 times faster than Arb. Note that all of these computations are done using 256 bits of precision. Of course, something like this would in practice be better done using the Flint vector functions. It would also be interesting to compare the numbers to similar code in C; I don't know how much overhead using Julia adds.

There are of course still plenty of things to do! For the wrapper, the main remaining things are the handling of return values and adding nfloat_complex and the vector types. For the high level interface there is even more to do.

But I think I'll declare success for the workshop week! It seems like we should be able to wrap the nfloat types in a performant way without requiring a huge amount of work. It will probably take a little while before I get the time to finish a full implementation though.
