Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Julia implementations of normpdf etc. in StatsFuns #1487

Merged
merged 8 commits into from
Feb 2, 2022

Conversation

devmotion
Copy link
Member

This PR replaces the implementations of pdf(::Normal, ::Real) etc. with calls to StatsFuns.normpdf etc. which are basically the same Julia implementations.

This removes code duplication (which e.g. means that one would not have to define ChainRules derivatives in both packages but it is sufficient to do so in StatsFuns) and it actually fixes subtle bugs such as reported in #1067 (comment) and arising from undesired promotions in Normal(mu, sigma) calls in the current implementation in Distributions.

Tests require JuliaStats/StatsFuns.jl#132 which fixes a bug in the StatsFuns implementation covered by our tests.

Project.toml Outdated Show resolved Hide resolved
Copy link
Contributor

@sethaxen sethaxen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a good idea in general. Although it looks like StatsBase's normal quantile function promotes to Float64 unnecessarily (see failures).

src/univariate/continuous/normal.jl Outdated Show resolved Hide resolved
Project.toml Outdated Show resolved Hide resolved
@devmotion
Copy link
Member Author

Should we bump the Julia bound to 1.3 or 1.6? The test errors on Julia 1.0 are caused by an ancient version of SpecialFunctios (0.8 😮) that does not contain implementations of erfcinv(::Rational) (introduced in SpecialFunctions 1.2.0). All more recent versions of SpecialFunctions require Julia 1.3, hence for this specific issue it would be sufficient to drop Julia < 1.2.

@devmotion
Copy link
Member Author

Alternatively, we could just not run these failing tests on Julia < 1.3.

@codecov-commenter
Copy link

codecov-commenter commented Jan 25, 2022

Codecov Report

Merging #1487 (86113fd) into master (6f74846) will increase coverage by 0.12%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1487      +/-   ##
==========================================
+ Coverage   84.80%   84.92%   +0.12%     
==========================================
  Files         126      128       +2     
  Lines        7778     7735      -43     
==========================================
- Hits         6596     6569      -27     
+ Misses       1182     1166      -16     
Impacted Files Coverage Δ
src/univariate/continuous/normal.jl 98.23% <100.00%> (-0.73%) ⬇️
src/samplers.jl 100.00% <0.00%> (ø)
src/Distributions.jl 100.00% <0.00%> (ø)
src/quantilealgs.jl 82.60% <0.00%> (+0.19%) ⬆️
src/common.jl 79.79% <0.00%> (+0.20%) ⬆️
src/matrixvariates.jl 95.65% <0.00%> (+0.41%) ⬆️
src/univariates.jl 74.07% <0.00%> (+2.64%) ⬆️
src/multivariates.jl 42.85% <0.00%> (+4.39%) ⬆️
src/utils.jl 92.42% <0.00%> (+28.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6f74846...86113fd. Read the comment docs.

@devmotion devmotion requested a review from mschauer January 25, 2022 15:21
@devmotion
Copy link
Member Author

devmotion commented Feb 1, 2022

OK, this PR is ready for review @mschauer @sethaxen. I decided to change the Julia bound to 1.3 and SpecialFunctions to 1.2. These changes ensure that there are no regressions for environments that use older Julia or older SpecialFunctions versions.

Copy link
Contributor

@sethaxen sethaxen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@mschauer mschauer merged commit 620a73a into master Feb 2, 2022
@devmotion devmotion deleted the dw/normal_statsfuns branch February 2, 2022 14:38
@devmotion
Copy link
Member Author

Thanks for merging the PR @mschauer!

I guess Distributions doesn't follow the ColPrac guidelines officially but maybe we should adopt the policy that PR authors with merge rights merge their own PRs. Even after a PR is approved I sometimes notice an issue that has to be addressed or something minor that I'd like to change (wasn't the case here 🙂), and then it's nice if the PR is not merged yet and I can update it.

@alyst
Copy link
Member

alyst commented Feb 22, 2022

Is it intentional that this PR removed zval(), xval() and potentially something else?

@devmotion
Copy link
Member Author

Yes. xval and zval are not used anymore.

@alyst
Copy link
Member

alyst commented Feb 22, 2022

But are they still exported and defined for many other distributions?
Edit: Ok, not exported, but still defined.

@devmotion
Copy link
Member Author

They are just internal helpers that simplify the code a bit and make it more readable. BTW StatsFuns uses internal helpers of the same name: https://github.com/JuliaStats/StatsFuns.jl/blob/13e231a0a22e716426b73cb87ff3b8b24e33aaf1/src/distrs/norm.jl#L3

@alyst
Copy link
Member

alyst commented Feb 22, 2022

The polymorphic nature of the Distribution.jl ones is/was quite convenient.
If the ones from StatsFuns.jl are not exported, maybe it makes sense to restore zval/xval for Normal here?

@devmotion
Copy link
Member Author

I don't understand, why would you you want them to be imported? They are not exported by StatsFuns because they are an internal implementation detail. So if you want to use them you could just call StatsFuns.xval etc., with the risk of them being removed in a future non-breaking release. Additionally, the version in StatsFuns is defined for real arguments whereas the ones in Distributions are defined for Distributions. But also in Distributions they are only internal functions and not part of the API - they may be removed at any time and they are only defined for a handful of functions.

@alyst
Copy link
Member

alyst commented Feb 22, 2022

I don't want StatsFuns.zval to be imported.
In my code, I was using the ones from Distributions.jl that have the signature zval(d::Distribution, x::Number) (that's what I mean by "polymorphic").
So I was wondering if it makes sense to restore zval(d::Normal, x::Number) here.
I understand that non-exported things can break at any time, I just wonder what is your recommendation to get the z-value for a given distribution (symmetric continuous univariate)?
For me, zval(d::Distribution, x::Number) was a convenient interface for that. Obviously, for some distributions it doesn't make sense to define zval().

As for the ones in StatsFuns.jl, since they are not exported, the name clash with Distributions.jl is not critical, but to be consistent with the other normal-related functions there, they could be renamed into normzval/normxval().

@devmotion
Copy link
Member Author

Taking one step back, what exactly do you expect that zval(d::Distribution, x::Number) should return? In Distributions it is used as an auxiliary function and also for some asymmetric distributions in a way that allows us to simplify the logpdf etc. implementations. Many definitions are identical to zval(d, x) = (x - location(d)) / scale(d) but this is not satisfied in general.

@alyst
Copy link
Member

alyst commented Feb 23, 2022

Taking one step back, what exactly do you expect that zval(d::Distribution, x::Number) should return?
Many definitions are identical to zval(d, x) = (x - location(d)) / scale(d) but this is not satisfied in general.

That's what I expect (for the transformed distribution the mean is zero, the std is one), and that's why I specify that it might make sense to define it only for symmetric continuous univariate.

@devmotion
Copy link
Member Author

In that case, can't you define and use this function myzval(d, x) = (x - location(d)) / scale(d) in your code? scale and location are part of the API, so this seems a clean and safe solution.

@alyst
Copy link
Member

alyst commented Feb 23, 2022

I can, I was just wondering if it could be implemented by the package, given that zval() is already defined for a dozen of distributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants