Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rand_distr::TruncatedNormal #1523

Closed
wants to merge 3 commits into from
Closed

Conversation

ongchi
Copy link

@ongchi ongchi commented Nov 4, 2024

  • Added a CHANGELOG.md entry

Summary

Add truncated normal distribution to rand_distr, allowing users to specify the mean, standard deviation, left bound, and right bound of the distribution.

Motivation

As discussion in #1189.
This would be use in numerical simulations, statistics, games, etc.

Details

This implementation is ported from scipy.stats.truncnorm, which relies on Cephes Mathematical Library.
An extra dependency is added of the Rust implementation of the Cephes library (spec_math).

@dhardy
Copy link
Member

dhardy commented Nov 5, 2024

So this is a form of inverse CDF? Quick points:

  • The code presented would be best in a separate file. (This may not be the case with just using Normal with rejection.)
  • There may be licencing concerns from BSD to our licence.
  • How does this approach compare in performance to just using rejection sampling? This would depend on the bounds of course; what bounds may be likely?

@benjamin-lieser
Copy link
Collaborator

I would say the value of a truncated distribution implementation comes from it working also on the tails of the distribution where rejection sampling would be infeasible.

@dhardy dhardy added the D-review Do: needs review label Nov 8, 2024
@ongchi
Copy link
Author

ongchi commented Nov 11, 2024

Yes, the implementation is based on the inverse CDF method.
I can move the code into a separate file, as suggested. Would adding a BSD license header to the file be acceptable for this project?

And I just add a simple benchmark.

The x-axis represents the relative cumulative frequency between the left and right bounds of the normal distribution.

by_rej
by_ppf

The benchmark shows that the performance of the rejection sampling method significantly degrades when the relative cumulative frequency of the bounds is below 5%. In contrast, the inverse CDF (PPF) method maintains constant time performance regardless of the bounds.

@dhardy
Copy link
Member

dhardy commented Nov 12, 2024

Would adding a BSD license header to the file be acceptable for this project?

Unfortunately not. We distribute the project under the users' choice of MIT or Apache 2.0, which means that all contributions must satisfy both of these licences. It seems that scipy is distributed under the 3-clause BSD license. I'm not an expert on the topic, but from what I understand this licence is not compatible with distribution under the MIT licence.

(Note that we can add attributions as a comment in the source, so I don't believe that part is a problem at least.)

Hence, unfortunately, I don't believe we can accept this — unless you can obtain a statement of permission from the original authors, or remove all code which was ported from scipy.

@ongchi
Copy link
Author

ongchi commented Nov 18, 2024

Thank you for your comment. I will close this PR for now and take some time to decide what to do on the next steps.

@ongchi ongchi closed this Nov 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
D-review Do: needs review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants