Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve eltype where possible for moments #688

Open
wants to merge 9 commits into
base: master
Choose a base branch
from
Open

Conversation

palday
Copy link
Member

@palday palday commented Apr 23, 2021

Fixes #680.

@palday palday requested a review from ararslan April 23, 2021 17:10
src/moments.jl Outdated Show resolved Hide resolved
src/moments.jl Outdated Show resolved Hide resolved
src/moments.jl Outdated Show resolved Hide resolved
@palday palday requested a review from ararslan April 26, 2021 10:56
@palday
Copy link
Member Author

palday commented Apr 26, 2021

The failure on nightly is an upstream bug (JuliaLang/julia#40609).

src/moments.jl Outdated Show resolved Hide resolved
src/moments.jl Outdated Show resolved Hide resolved
src/moments.jl Outdated Show resolved Hide resolved
@ararslan
Copy link
Member

This looks good but the prior uses of Float64 make me worried that the code may have issues around under/overflow when using other types.

@palday
Copy link
Member Author

palday commented Apr 26, 2021

Ugh, I'm conflicted on that front. Because that would be breaking, but technically a user hitting overflow on their custom narrow types is expected behavior. @nalimilan do we have enough for another breaking pre-1.0 release?

@nalimilan
Copy link
Member

We can break things if we want to tag 1.0. Though ideally I'd rather move these to Statistics, but that work has stalled (JuliaLang/julia#31395).

src/moments.jl Outdated
Comment on lines 239 to 240
T = promote_type(eltype(v), eltype(wv), typeof(m))
s = zero(T)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To ensure type stability, how about following exactly what the loop does and using T = typeof(zero(eltype(v) - zero(m))^2 * zero(eltype(w)))? That's what I did in my first attempt to moving these functions to Statistics: https://github.com/JuliaLang/julia/pull/31395/files#diff-6b82a0c4e60cbe68048e35375987a86688afb4aea60dda8f2d2d4938dc845776R375

@palday
Copy link
Member Author

palday commented Apr 26, 2021

Okay, so on the breaking / non breaking front:

  1. Did the API ever specify the return type for mean and the various moments? If not, this is not breaking in terms of type behavior.
  2. In terms of numeric behavior, Julia generally keeps things at narrow types even at the risk of over-/underflow. As such, keeping the narrowest type that would be possible for the arithmetic operations in these computations seems fine. In other words, the new return type and potential numeric implications are expected in the Julia ecosystem.
  3. The potential numeric issues with all of the moments here are possible regardless of type width -- we do accumulate then divide without sorting first, so
    • we have issues with computations where things differ in scale and the smaller values accumulate enough to be relevant on the larger scale, but consistently round down when the initial values in the array are large (sorting solves this but is computationally expensive)
    • we can have overflow in the accumulation step (batching the accumulate+divide on subsets solves this but potentially causes performance and numeric issues from the extra divisions)
  4. The way for best numerical accuracy would be sort, do the math in a higher precision and cast down. But this also seems unacceptable -- if a user choose to keep everything in single or half precision, then we should assume there is a reason for that. If they really need that level of numerical stability, then they should probably just implement their own moments.

In other words: given that we're not going to change the algorithm used (which I think is reasonable), the potential numeric issues from the eltype preservation should not be considered breaking. Perhaps the best solution is to add to each of the docstrings:

!!! note
    The computation is done in the narrowest type for which the underlying 
    arithmetic operations are defined based on the element types of the 
    vectors and the mean. For higher numerical accuracy and to reduce the risk
    of overflow when dealing with values near the limits of the type (or many 
    values whose sum is near the limits of the type), we recommend casting to 
    a broader type / higher precision first.

@nalimilan
Copy link
Member

Yeah it's not super likely to break user code. Though we could collect the few changes which are breaking (some of them are marked with the breaking label) and take that occasion to tag 1.0.

src/moments.jl Outdated Show resolved Hide resolved
src/moments.jl Outdated Show resolved Hide resolved
src/moments.jl Outdated Show resolved Hide resolved
src/moments.jl Outdated Show resolved Hide resolved
src/moments.jl Outdated Show resolved Hide resolved
src/moments.jl Outdated Show resolved Hide resolved
src/moments.jl Outdated Show resolved Hide resolved
src/moments.jl Outdated Show resolved Hide resolved
src/moments.jl Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

std(::AbstractVector{Float32}, ::AnalyticWeights{Float32}) gives a Float64
5 participants