Wrapping nfloat
#202
The previous version only handled arb_ptr and acb_ptr; it can now handle any type ending in _ptr or _srcptr.
Nice work! By the way, would you mind if we list you as a remote participant on https://flintlib.github.io/workshop2025.html?
The performance impact of
Sure.
Feel free to add me as a remote participant! Then enabling underflow and disabling infinities and NaN is maybe the best approach. It should be easy to make adjustments to the default further down the line at least, so we don't have to commit to anything now. Currently trying to get the type parameters to play well with the low level wrapper, we'll see how it goes!
Previously it could return `Vector{<:Integer}`, but giving anything other than `Vector{Int}` as an argument would return an error when trying to convert it to `Ref{Int}`. It now just returns `Vector{Int}`.
This reduces the number of warnings about a method being overwritten when running the tests. There are still some occurrences from the use of `fpwrap_error_on_failure_default`, but these seem harder to avoid (since we do need to test that the overwriting does work for that method).
Reorder types based on where they are coming from. Remove deprecated types from Arb that were removed in the transition to Flint.
With a little bit of fiddling I managed to get the type parameters to work as I wanted! The wrapper now ensures that the type parameters for the input all agree and also makes use of it in the
int nfloat_set_other(nfloat_ptr res, gr_srcptr x, gr_ctx_t x_ctx, gr_ctx_t ctx)
int nfloat_complex_set_other(nfloat_complex_ptr res, gr_srcptr x, gr_ctx_t x_ctx, gr_ctx_t ctx)
which we probably want to wrap by hand either way. For the generic interface handling different type parameters might be more important, but that is a problem for the future. With this it is possible to do some performance comparisons between
using Arblib, BenchmarkTools
function sum!(res, xs)
Arblib.zero!(res)
@inbounds for x in xs
Arblib.add!(res, res, x)
end
return res
end
N = 10000
xs_nfloat = [NFloat{4,0}(1 // 7) for _ = 1:N];
xs_arf = [Arf(1 // 7) for _ = 1:N];
xs_arb = [Arb(1 // 7) for _ = 1:N];
res_nfloat = zero(xs_nfloat[1])
res_arf = zero(xs_arf[1])
res_arb = zero(xs_arb[1])
then we get
julia> @benchmark sum!($res_nfloat, $xs_nfloat) samples = 10000
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 53.173 μs … 125.746 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 54.985 μs ┊ GC (median): 0.00%
Time (mean ± σ): 55.205 μs ± 2.340 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▃▄▅▅▅▆▆█▄▄▃▃▃▁▁
▁▁▁▁▁▂▃▅▅▆▆▅▇████████████████▇█▆▅▆▅▄▄▃▃▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▄
53.2 μs Histogram: frequency by time 58.3 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark sum!($res_arf, $xs_arf) samples = 10000
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 202.616 μs … 515.372 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 213.931 μs ┊ GC (median): 0.00%
Time (mean ± σ): 215.039 μs ± 10.674 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▂▃▅▇█▇█▇▆▇▅▂▁
▁▁▁▁▁▂▂▂▃▄▄▅▆▇█████████████▇▆▆▅▅▃▃▂▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
203 μs Histogram: frequency by time 236 μs <
Memory estimate: 240 bytes, allocs estimate: 5.
julia> @benchmark sum!($res_arb, $xs_arb) samples = 10000
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 242.106 μs … 829.997 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 252.684 μs ┊ GC (median): 0.00%
Time (mean ± σ): 254.678 μs ± 14.446 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▂▄▄▅▅▇█▇▇█▆▅▄▂▁
▂▂▂▃▃▄▆███████████████████▆▆▅▅▅▅▅▅▄▅▄▄▄▄▄▄▄▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▁▂▂ ▄
242 μs Histogram: frequency by time 278 μs <
Memory estimate: 48 bytes, allocs estimate: 1.
julia> @benchmark sum($xs_nfloat) samples = 10000
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 108.621 μs … 7.196 ms ┊ GC (min … max): 0.00% … 95.63%
Time (median): 129.644 μs ┊ GC (median): 0.00%
Time (mean ± σ): 199.633 μs ± 310.378 μs ┊ GC (mean ± σ): 22.06% ± 13.73%
█▄▄▁ ▅ ▃ ▁
████████▇▅▃▃▃▃▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▅▄▆▆▇▇█▇ █
109 μs Histogram: log(frequency) by time 1.9 ms <
Memory estimate: 624.94 KiB, allocs estimate: 9999.
julia> @benchmark sum($xs_arf) samples = 10000
BenchmarkTools.Trial: 4217 samples with 1 evaluation per sample.
Range (min … max): 448.154 μs … 227.609 ms ┊ GC (min … max): 0.00% … 39.51%
Time (median): 619.502 μs ┊ GC (median): 0.00%
Time (mean ± σ): 1.184 ms ± 8.716 ms ┊ GC (mean ± σ): 25.76% ± 3.64%
▂ ▆█▇▅▅▅
▃▃▂▂▃▅▅▅▅▄▄▃▃▂▃▃▃██▇████████▄▃▂▂▂▂▂▂▂▂▂▁▂▁▂▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁ ▃
448 μs Histogram: frequency by time 917 μs <
Memory estimate: 935.91 KiB, allocs estimate: 19966.
julia> @benchmark sum($xs_arb) samples = 10000
BenchmarkTools.Trial: 4000 samples with 1 evaluation per sample.
Range (min … max): 488.339 μs … 124.003 ms ┊ GC (min … max): 0.00% … 44.99%
Time (median): 679.100 μs ┊ GC (median): 0.00%
Time (mean ± σ): 1.277 ms ± 8.060 ms ┊ GC (mean ± σ): 22.75% ± 3.64%
▃▇▇▅▅▄▃▁ ▁ ▁▇█▆▅▆▆▅▇█▇▆▃ ▃
██████████▇▇▇██▇▇██▇▆▄███████████████▇██▆▇▇▆▆▆▅▇▁▆▆▆▅▅▄▁▁▅▅▄▅ █
488 μs Histogram: log(frequency) by time 932 μs <
Memory estimate: 1.07 MiB, allocs estimate: 19998.
So for the inplace
There are of course still plenty of things to do! For the wrapper the main remaining thing is the handling of return values as well as adding
But I think I'll declare success for the workshop week! It seems like we should be able to wrap the nfloat types in a performant way without requiring a huge amount of work. It will probably take a little while before I get the time to finish a full implementation though.
This is still only a proof of concept, but I have managed to create a basic wrapper of the `nfloat` type. Note that for this to work you need to manually specify `FLINT_jll` to use a version of Flint which is recent enough. If you have a locally compiled version of Flint you can put
in the file `LocalPreferences.toml` in the root directory of Arblib.jl and Julia should pick it up. Note that along the way I have had to make a number of design decisions, some of these might have to be changed!
Types
The low level types currently used are `nfloat_struct` and `nfloat_ctx_struct`.
Both of these types depend on two type parameters, `P` and `F`, corresponding to the precision and flags used. The `nfloat_struct` type must depend on `P` since the memory layout depends on the precision used. The `nfloat_ctx_struct` doesn't have to depend on `P` and neither of them has to depend on `F`.
The motivation for having both `P` and `F` as type parameters is that this allows us to check on the type level that two nfloat-instances correspond to the same underlying context, and hence are allowed to be used together. This also means that the `nfloat_struct` doesn't have to carry around any reference to its underlying context, since it is uniquely determined by `P` and `F`. Note that the flag is represented by an `int` in Flint, but for the type parameter I opted to go for the Julia type `Int` which corresponds to `slong` (this is then converted to a `Cint` before calling Flint).
The high level types are given by
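To illustrate the type-level check described above, here is a toy sketch in plain Julia. It is not the set of definitions used in this PR: `ToyNFloat`, `toy_add` and `context_parameters` are hypothetical names, and a `Float64` field stands in for the real limb data. The point is only that a method signature reusing the same `P` and `F` for every argument accepts values from one context and rejects mixed ones.

```julia
# Hypothetical stand-in for a value whose context is fixed by
# P (precision) and F (flags). The real struct stores limbs instead.
struct ToyNFloat{P,F}
    value::Float64
end

# Both arguments must share P and F, so mixing contexts fails at
# dispatch with a MethodError instead of reaching the C library.
function toy_add(x::ToyNFloat{P,F}, y::ToyNFloat{P,F}) where {P,F}
    return ToyNFloat{P,F}(x.value + y.value)
end

# The context can be recovered from the type alone, so the value
# does not need to store a reference to it.
context_parameters(::ToyNFloat{P,F}) where {P,F} = (precision = P, flags = F)
```

With these definitions, `toy_add(ToyNFloat{4,0}(1.0), ToyNFloat{4,0}(2.0))` works, while `applicable(toy_add, ToyNFloat{4,0}(1.0), ToyNFloat{8,0}(3.0))` is `false`.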
Basic usage
I have implemented some functionality, like basic arithmetic and elementary functions
Low level wrapper
The wrapping of the Flint functions is working, but so far very basic. I have added the new types (which required some changes to handle type parameters) and special handling of the `ctx` as a keyword argument. The function
is currently compiled to
(plus some other versions to handle keyword arguments). Note that
This works, but has two main issues
The types in the `ccall` are not concrete types since the type parameters `P` and `F` are left out. It still works, in the end Flint gets a pointer pointing to the right object, but has a large performance impact. The type instability means that Julia code has to make several allocations in the process of doing the `ccall`.
What we would like the code to look like is
This forces all arguments to be compatible, and removes the type instability in the `ccall`. Generating this code does however seem to be slightly cumbersome. In principle I know how to get it done, but it involves more manual work on the `Expr` objects making up the code. I'll see if I can get a working prototype for it!
Design decisions
There are a number of fairly significant design decisions we have to make.
What should the default flags be? Since one of the goals of Arblib.jl is to make it easy to use Flint (Arb) types in generic Julia code I think enabling all of `NFLOAT_ALLOW_UNDERFLOW`, `NFLOAT_ALLOW_INF` and `NFLOAT_ALLOW_NAN` would be the most reasonable choice. The Flint manual mentions that this gives up some performance, do you have an estimate for how much this is Fredrik? My guess would be that the intermediate allocations in Julia sort of make these differences irrelevant. If you want top performance you should use the low level mutating functions, and then you can also manually specify a different flag.
How should we handle the return values? Most `nfloat` functions return 0 on success and non-zero on failure (corresponding to `GR_SUCCESS`, `GR_DOMAIN` and `GR_UNABLE`). For the high level interface I think the most reasonable approach is to just throw an error if it returns non-zero, and point people to the low level interface if they want more fine grained control. As it is now the low level interface just returns the result, and it is up to the caller to make use of it. This gives maximum control but is slightly cumbersome to use. For example it means we have to write code like
(which still doesn't even check that the return value is zero) instead of
(which is what the corresponding version for `Arb` looks like). One approach which I think might be reasonable is to add an argument to the low level wrappers which makes it either return the return value, or throw an error in case it is non-zero and otherwise return the first argument to the function. A similar flag would actually be helpful for the `arf` functions as well, they return a flag with information about the rounding, which forces us to write a lot of functions like this
compared to the version for `Arb` which is
How should we handle promotion between `NFloat` values from different contexts? I guess taking the highest precision and combining their flags should be a reasonable approach? Something like this
Which does actually seem to be type stable! Note that there is currently no way of constructing a `NFloat` value from another which doesn't have the same context, for this we would need to wrap
which I have not handled yet.
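Two of the proposals above can be sketched in plain Julia: the switch between returning the status and throwing on failure, and promotion by highest precision with combined flags. Everything here is hypothetical and not Arblib.jl's or Flint's API: `ToyFloat`, `toy_sqrt_status!`, `toy_sqrt!`, the `return_status` keyword and the `TOY_*` constants are made-up names, and using bitwise OR to combine flags is an assumption about what "combine" should mean.

```julia
# Hypothetical stand-in for an nfloat value: P is the precision, F the flags.
struct ToyFloat{P,F}
    value::Float64
end

# Status codes in the style described above: zero is success, non-zero is failure.
const TOY_SUCCESS = 0
const TOY_DOMAIN = 1

# Low level mutating function that only reports a status.
function toy_sqrt_status!(res::Ref{Float64}, x::Float64)
    x < 0 && return TOY_DOMAIN
    res[] = sqrt(x)
    return TOY_SUCCESS
end

# Same operation with a switch: either return the raw status, or throw
# on failure and return the first argument so that calls can be chained.
function toy_sqrt!(res::Ref{Float64}, x::Float64; return_status::Bool = false)
    status = toy_sqrt_status!(res, x)
    return_status && return status
    status == TOY_SUCCESS || error("toy_sqrt! failed with status $status")
    return res
end

# Promotion: highest precision, flags combined with bitwise OR (an assumption).
function Base.promote_rule(
    ::Type{ToyFloat{P1,F1}},
    ::Type{ToyFloat{P2,F2}},
) where {P1,F1,P2,F2}
    return ToyFloat{max(P1, P2),F1 | F2}
end
```

For example, `toy_sqrt!(Ref(0.0), 4.0)` returns the `Ref`, which now holds `2.0`, `toy_sqrt!(Ref(0.0), -1.0; return_status = true)` returns `TOY_DOMAIN`, and `promote_type(ToyFloat{4,1}, ToyFloat{8,2})` gives `ToyFloat{8,3}`.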
Left to do
Apart from the many things mentioned above there are also more things to handle:
`nfloat_complex` in the same way.
`NFloat` using `NFloatVector`.
`src/nfloat.jl`.