
Feature/neg binomial dist #627

Merged

Conversation

Contributor

@dawidkopczyk commented Apr 3, 2023

This PR introduces the Negative Binomial distribution, partially answering the request from #569.

Implementation details:

Member

@MarcAntoineSchmidtQC commented

Thank you for your contribution @dawidkopczyk. I'm reviewing the code. While doing that, I'm going to add a new benchmark problem so that we can confirm that we converge to a solution similar to that of existing software (h2o, or statsmodels for unregularized models). I'll keep you posted.

Member

@MarcAntoineSchmidtQC left a comment


This is great! I've been able to add the negative binomial case to our benchmarking suite and compare it against h2o. We are converging to the same solution. In terms of performance, we are still ahead by a nice margin.

Happy to merge this very soon.

upper_bound = np.Inf
include_upper_bound = False

def __init__(self, theta=1.0):


Is the default value of 1 sensible? h2o defaults to 1e-10, which is pretty different.

Contributor Author


If we consider the target y in the negative binomial distribution as the number of failures before the (1/theta)-th success, then:

  • for theta = 1 we model the number of failures before the 1st success;
  • for theta = 1e-10 we model the number of failures before the 10^10-th success, so it is essentially an unbounded count of failures.

On the other hand, for theta = 1e-10 the variance is just the mean (as in the Poisson distribution), whereas for theta = 1 the variance is mu + mu², as in the geometric distribution.
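A minimal numeric check of this variance behavior (a sketch assuming the NB2 variance Var(Y) = mu + theta * mu**2 implied above; nb2_variance is a hypothetical helper, not part of glum):

def nb2_variance(mu, theta):
    # NB2 variance: Var(Y) = mu + theta * mu**2
    return mu + theta * mu**2

mu = 5.0
print(nb2_variance(mu, 1e-10))  # ~5.0: variance equals the mean, as in Poisson
print(nb2_variance(mu, 1.0))    # 30.0: mu + mu**2, the geometric case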

I don't have a strong opinion about which default should be used; I took 1.0 since 1e-10 seemed too close to the disallowed value of 0.

Member

Let's stick with 1 then. I saw that h2o uses 0.5 in their example.

def theta(self, theta):
    if not isinstance(theta, (int, float)):
        raise TypeError(f"theta must be an int or float, input was {theta}")


I see that this is directly reused from the Tweedie distribution, so it's not a problem with this PR exactly. However, I think we should take a different approach to type checking here. The problem is that int and float are too restrictive. The best demonstration of this is that the following would not work:

dist = glum.NegativeBinomialDistribution()
dist.theta = dist.theta  # This fails.

The failure is due to the fact that the getter returns a numpy float32 while the setter checks that the input is of type float (i.e., a 64-bit Python float).

The simplest approach here would be to ask for forgiveness: wrap line 1168 in a try/except block and delegate the type-checking task to numpy.
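For illustration, a minimal sketch of such an "ask forgiveness" setter; this variant delegates the coercion to Python's float() rather than numpy, and the _theta attribute name is an assumption, not the code in this PR:

class NegativeBinomialDistribution:
    @property
    def theta(self):
        return self._theta

    @theta.setter
    def theta(self, theta):
        try:
            # float() accepts Python numbers and numpy scalars alike,
            # so round-tripping dist.theta = dist.theta now works.
            self._theta = float(theta)
        except (TypeError, ValueError):
            raise TypeError(f"theta must be numeric, input was {theta}")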

Contributor Author


Just let me know whether this should be solved in this PR, and what your suggested change is.

Member

Let's solve this in a different PR.

Member

Opened issue #631

y, mu, sample_weight = _as_float_arrays(y, mu, sample_weight)
sample_weight = np.ones_like(y) if sample_weight is None else sample_weight

return negative_binomial_deviance(y, sample_weight, mu, theta=float(theta))


With the setter decorator on theta, we are sure we are always getting a float32. I don't think the conversion to a float is necessary here.


To make sure it works with the Cython typing, the theta parameter can be declared as a C float (which is 32-bit) instead of floating.

Contributor Author


Since the p and dispersion arguments are passed as floating, would it be better to change that for all cases in a separate PR? I don't feel comfortable changing it only for the negative binomial distribution.

)

@property
def lower_bound(self) -> float:


Just to avoid confusion, can you store lower_bound with upper_bound above as a simple class attribute? As far as I know, we won't need to do any computation here.

Same comment for include_lower_bound.
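A sketch of what that might look like (the lower-bound values are my assumptions based on the distribution's support of non-negative counts, not taken from this PR):

import numpy as np

class NegativeBinomialDistribution:
    # Plain class attributes; no computation needed.
    lower_bound = 0.0            # assumption: counts are non-negative
    include_lower_bound = True   # assumption: y = 0 is a valid count
    upper_bound = np.Inf
    include_upper_bound = False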

Contributor Author


OK, good point.

)

def log_likelihood(self, y, mu, sample_weight=None, dispersion=1) -> float:
    r"""Compute the log likelihood.


Can you add a link to a source for the actual formula you used in negative_binomial_log_likelihood? The formula is different from what is listed in h2o, statsmodels, and Wikipedia (the sources you listed in the PR description).

I'm very confident that it is correct, but for future reference it would be easier not to have to redo the derivation.

Contributor Author


OK, the same formula (although theta is called alpha there) is presented in Eq. (3) here: https://content.wolfram.com/uploads/sites/19/2013/04/Zwilling.pdf, so I will add it to the References section of the class.
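For future reference, a sketch of that likelihood as I read Eq. (3), with alpha written as theta (my own transcription; it is assumed, not guaranteed, to match negative_binomial_log_likelihood up to implementation details):

import numpy as np
from scipy.special import gammaln

def nb2_log_likelihood(y, mu, theta):
    # NB2 log likelihood with r = 1/theta successes:
    #   sum[ lgamma(y + r) - lgamma(y + 1) - lgamma(r)
    #        + y*log(theta*mu / (1 + theta*mu)) - r*log(1 + theta*mu) ]
    r = 1.0 / theta
    return np.sum(
        gammaln(y + r) - gammaln(y + 1) - gammaln(r)
        + y * np.log(theta * mu / (1 + theta * mu))
        - r * np.log1p(theta * mu)
    )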

@@ -27,6 +28,7 @@
(GammaDistribution(), 0),
(InverseGaussianDistribution(), 0),
(TweedieDistribution(power=1.5), 0),
(NegativeBinomialDistribution(theta=1.0), 0),


Ideally we should test a value other than 1. theta = 1 leads to a lot of simplifications in the formulas that may hide bugs.
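A quick illustration of why theta = 1 is a weak test case, assuming the NB2 likelihood sketched above: with r = 1/theta = 1, the gamma-function terms cancel identically, so a bug in them would go unnoticed.

import numpy as np
from scipy.special import gammaln

y = np.arange(6, dtype=float)
r = 1.0  # theta = 1
# gammaln(y + 1) - gammaln(y + 1) - gammaln(1) == 0 for every y
print(gammaln(y + r) - gammaln(y + 1) - gammaln(r))  # all zeros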

Contributor Author


OK, I will put 1.5 here.

Member

@MarcAntoineSchmidtQC left a comment


Thanks a lot for your contribution @dawidkopczyk. This looks very good.

@MarcAntoineSchmidtQC merged commit 894669e into Quantco:main Apr 12, 2023