Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error handling: Panics vs Result #41

Closed
boxtown opened this issue Mar 20, 2017 · 7 comments
Closed

Error handling: Panics vs Result #41

boxtown opened this issue Mar 20, 2017 · 7 comments

Comments

@boxtown
Copy link
Collaborator

boxtown commented Mar 20, 2017

Currently the responsibility for guarding against exceptional cases (e.g. input not in valid domain, mathematically invalid operations etc) is passed to the user. We panic when an operation does not make mathematical sense (e.g. calculating the cumulative distribution function for discrete distributions at a negative input) which forces users to double check to make sure their inputs are valid. While this results in technically correct and predictable behavior from the API, I'm not sure if it's ergonomic or idiomatic and have been mulling over possibly introducing a Result based API either replacing or in addition to the stricter panic based API. This however warrants some discussion and I would love feedback from the community

@cbreeden
Copy link

I think consistency is huge here. I did a cursory glance at other crates and noticed the following:

Quick Survey

STD

  • f64/f32 operations: No result or panics. NaN, Inf, and -Inf are used appropriately.
  • Arithmetic operations: panic! in debug on overflow.
  • Result is used for parsing.
  • Option is used for TryFrom conversion of numeric types (which can fail, otherwise From).

Rand crate

  • Distributions panic! for invalid input parameters (negative std, min > max, etc).

Num/Num-Traits

  • Ratios with zero denominator panic!s
  • Option is used for fallible type conversions (ie, Float -> BigInt if Float is not finite)
  • FromPrimitive and ToPrimitive traits use Option

It seems to me that a takeaway could be:

  • Option should be used for fallible conversions.
  • Result should be used for parsing from binary or utf8.
  • panic! for invalid inputs for mathematical operations, including with the initialization of such objects (like Ratio or distributions).
  • Use NaN, Inf and -Inf for floats as appropriate.
  • Provide "checked" variations when possible. These methods use an Option. (I suppose under the assumption that only one thing could go wrong; or it's usually assumed that the exact reason for failure isn't that important).

@boxtown
Copy link
Collaborator Author

boxtown commented May 1, 2017

@cbreeden Thanks for the feedback and yes, I agree consistency is key which I feel like overall the API is pretty consistent with the interface it provides. My main concern is whether or not the API as it stands is ergonomic for users. I've taken a page from the rand crate's book in panic!-ing on invalid mathematical inputs which I think is more idiomatic than generating NaNs, I could be wrong however as I have yet to really dogfood my own library. It would be interesting to see or hear if a Result-based API in addition to or in place of the panic-based one garners any interest. It would be a pretty big undertaking but I think one that should be hammered out way before a 1.0-release

@cbreeden
Copy link

cbreeden commented May 2, 2017

Yeah ergonomics is good question. Some complaints that I've heard (about rust in general) is how difficult it is to write expressive numerical code. When I was using statrs, I was using it to simulate some markov chains. Personally, I had no interest in checking for the validity of my inputs for my distributions (and would rather let it panic, though I do understand why someone would disagree for their use case).

Honestly, I think the default API should panic and having a new_checked(...) API which could use a Results would be ideal. But that's just me talking about distributions, perhaps it's worth discussing other API's as well; like if the beta function should return a Result? Personally, I would probably suggest against this, and would rather see a beta and beta_checked variant.

Provided that the naming scheme is hashed out now, it may not be necessary to do any work before a 1.0 release if the idea is to provide _checked variants on respective operations. Adding new variants wouldn't be backwards incompatible.

@boxtown
Copy link
Collaborator Author

boxtown commented May 10, 2017

@cbreeden I agree and personally am leaning towards a _checked API in addition to the current one as well. Hopefully it's something I can start looking into after 0.7.0 is released

@michiel-de-muynck
Copy link
Contributor

michiel-de-muynck commented Jun 1, 2017

We panic when an operation does not make mathematical sense (e.g. calculating the cumulative distribution function for discrete distributions at a negative input)

I agree that it's best to panic if an operation does not make mathematical sense. A beta-distribution is only defined for α > 0 and β > 0, so it's ok to panic if the user supplies α = -4.

However, the cumulative distribution function at the point x is defined (for all distributions, discrete or not) as the probability that a sample is less than (or equal to) x. This makes sense for all x, for all distributions. The cdf of a distribution that can only take positive values is 0 at all negative input values, and that's what statrs should return too. Similarly, the cdf of a Bernoulli distribution at any x > 1 is simply 1, and that's also what statrs should return. The function cdf(x) should never panic for any x and any distribution.

Similarly, the probability density function is also defined for any x and any distribution. If a distribution cannot produce the value x (or get arbitrarily close to it), then the probability density at x is 0. So pdf(x) should also never panic.

Even at infinity, the pdf and cdf are well-defined.: pdf(-∞) = 0, pdf(∞) = 0, cdf(-∞) = 0, cdf(∞) = 1 for all distributions.

The only exception is NaN. I think it's ok to panic when the input is NaN. However, standard mathematical functions like ln(x) don't panic on NaN either, instead they simply return NaN. So it might be better to follow that convention. Perhaps even best of all might be to do a bit of both: panic if the user supplies NaN for a parameter like α or β for the beta distribution, but return NaN if the user calculates cdf(NaN). That way is actually the easiest to implement, as you don't have to do anything. If there are any bounds checks, NaN will automatically fail those checks, but if there aren't and the function is composed of basic mathematical functions, then those will return NaN and so will you.

If you agree with these changes, I can implement them and submit a pull request.

@boxtown
Copy link
Collaborator Author

boxtown commented Jun 1, 2017

@migi Semantically you are correct and as long as we're consistent I have no issue accepting the changes.

michiel-de-muynck added a commit to michiel-de-muynck/statrs that referenced this issue Jun 2, 2017
See GitHub issue statrs-dev#41.

Don't panic when calculating a pdf, cdf or pmf whenever the input is a
value that the distribution cannot attain. The pdf, pmf and cdf are
still defined at those inputs and we should return the correct values.

This commit also adds a few tests to each distribution to test that
pdf(-inf)=0, pdf(inf)=0, cdf(-inf)=0 and cdf(inf)=1. It also uses simple
numerical integration to test for continuous distributions that the
integral of the pdf is approximately equal to the cdf, and for discrete
distributions that the sum of the pmf is equal to the cdf.
michiel-de-muynck added a commit to michiel-de-muynck/statrs that referenced this issue Jun 2, 2017
See GitHub issue statrs-dev#41.

Don't panic when calculating a pdf, cdf or pmf whenever the input is a
value that the distribution cannot attain. The pdf, pmf and cdf are
still defined at those inputs and we should return the correct values.

This commit also adds a few tests to each distribution to test that
pdf(-inf)=0, pdf(inf)=0, cdf(-inf)=0 and cdf(inf)=1. It also uses simple
numerical integration to test for continuous distributions that the
integral of the pdf is approximately equal to the cdf, and for discrete
distributions that the sum of the pmf is equal to the cdf.
@boxtown
Copy link
Collaborator Author

boxtown commented Jan 9, 2018

This has been resolved by #55 and #56

@boxtown boxtown closed this as completed Jan 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants