Values below xmin are silently trimmed when computing cdf or pdf #36

Open
anntzer opened this issue Jun 1, 2016 · 11 comments

@anntzer

anntzer commented Jun 1, 2016

import numpy as np
import powerlaw
data = np.array([1.7, 3.2, 5.4, 7.9, 10., 12.])
results = powerlaw.Fit(data)
print(results.power_law.alpha)
print(results.power_law.xmin)
print(results.power_law.cdf(np.arange(10)))
print(results.power_law.cdf(np.arange(3, 10)))

xmin is ~2.28, and the last two lines give identical results: the entries 0, 1 and 2 of np.arange(10) have been silently trimmed out, when I'd expect them to return 1 (and 0 if computing the pdf). Otherwise, it becomes a bit awkward to e.g. do plt.plot(xs, results.power_law.cdf(xs)) if I am not directly computing xs using xmin.
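The shape-preserving behavior asked for above can be sketched in plain NumPy. This is only an illustration, not the library's implementation: the function names `power_law_pdf` and `power_law_cdf` are hypothetical, the formulas are those of a continuous power law, and the value 0 below xmin is one possible choice of fill value. The `alpha=2.5` used in the example is made up; only `xmin=2.28` comes from the report.

```python
import numpy as np

def power_law_pdf(xs, alpha, xmin):
    """Continuous power-law pdf; returns 0 below xmin, same shape as xs."""
    xs = np.asarray(xs, dtype=float)
    out = np.zeros_like(xs)
    tail = xs >= xmin
    # p(x) = (alpha - 1) / xmin * (x / xmin) ** (-alpha) for x >= xmin
    out[tail] = (alpha - 1) / xmin * (xs[tail] / xmin) ** (-alpha)
    return out

def power_law_cdf(xs, alpha, xmin):
    """Continuous power-law cdf; returns 0 below xmin, same shape as xs."""
    xs = np.asarray(xs, dtype=float)
    out = np.zeros_like(xs)
    tail = xs >= xmin
    # P(X <= x) = 1 - (x / xmin) ** (1 - alpha) for x >= xmin
    out[tail] = 1 - (xs[tail] / xmin) ** (1 - alpha)
    return out

xs = np.arange(10)
ys = power_law_cdf(xs, alpha=2.5, xmin=2.28)
# xs and ys have the same length, so plt.plot(xs, ys) lines up in x.
assert ys.shape == xs.shape
```

Because no entry is dropped, the output can be plotted directly against the input array.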

@jeffalstott
Owner

Thanks for using powerlaw!

Is there a problem with using powerlaw.plot_pdf(data_you_want_to_plot) or powerlaw.pdf(data_you_want_to_use)?

data = ...
results = powerlaw.Fit(data)
powerlaw.plot_pdf(data, label='Original Data')
results.power_law.plot_pdf(linestyle='--', label='Power Law Fit')

(The second plot call draws a PDF that is normalized to the fitted data, i.e. the values at or above xmin, not to all the data, and thus it appears higher than the first PDF. This is awkward, and people have talked about adding an option to Fit.plot_pdf() and Fit.plot_cdf() to normalize to the whole dataset when the user wants that. If it's something you value, feel free to implement it and submit it as a pull request.)

@anntzer
Author

anntzer commented Jun 1, 2016

IIRC, your parenthesized note is exactly why I decided to directly call pdf myself (so that I can perform the normalization).

@jeffalstott
Owner

I'm confused. I don't understand what you're trying to do, and how the functions we just described won't let you do it. Can you be more explicit?


@anntzer
Author

anntzer commented Jun 1, 2016

I want to plot the fitted pdf and have it (more or less) overlaid on top of the empirical pdf, i.e. the fitted pdf should be plotted normalized to the whole dataset. This is exactly the "awkwardness" you mention above.
So I instead called pdf myself and multiplied the results by the correct factor before plotting.
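The thread does not spell out the "correct factor". One natural reading, taken here as an assumption, is the fraction of observations at or above xmin, since the fitted pdf integrates to 1 over the tail only. A minimal sketch with hypothetical helper names (`tail_fraction`, `rescale_to_full_data`):

```python
import numpy as np

def tail_fraction(data, xmin):
    """Fraction of observations that lie at or above xmin."""
    data = np.asarray(data, dtype=float)
    return np.count_nonzero(data >= xmin) / data.size

def rescale_to_full_data(tail_pdf_values, data, xmin):
    """Scale pdf values normalized over the tail so that they are
    normalized relative to the whole dataset instead."""
    return tail_fraction(data, xmin) * np.asarray(tail_pdf_values, dtype=float)

data = np.array([1.7, 3.2, 5.4, 7.9, 10., 12.])
# Five of the six points are >= 2.28, so the factor is 5/6.
frac = tail_fraction(data, xmin=2.28)
scaled = rescale_to_full_data([0.6, 0.3], data, xmin=2.28)
```

With this reading, the rescaled curve sits lower than the tail-normalized one by exactly the share of data that was excluded from the fit.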

@jeffalstott
Owner

This was discussed way back when here, and it looks like @psinger even created a solution for the CDF/CCDF here. If you sort out the right thing to do for the PDF and implement it, I'll happily include it.

@anntzer
Author

anntzer commented Jun 1, 2016

I'd rather fix the pdf/cdf/ccdf calculations themselves. Silently dropping data feels... wrong, doesn't it?

@jeffalstott
Owner

I think there's a conceptual issue: Fit.pdf() and Fit.plot_pdf() plot an equation. That equation has, as a very important component, a minimum value. The PDF for that equation is 0 below the minimum value. There is nothing coherent to plot for Fit.pdf() below xmin.


@anntzer
Author

anntzer commented Jun 2, 2016

I fully agree with you on this point, but in my opinion this means that pdf should return zero for arguments below xmin, not drop them.

@jeffalstott
Owner

Ah, I see! I suppose the question is whether it should return 0, or nan (or nothing, as it does now). What would be all the ways each of these could go wrong?

Again, though, this line of thinking seems to be for a use case (in the original comment) that's a bit weird: plt.plot(xs, results.power_law.cdf(xs)). Why not just results.power_law.plot_cdf()? Or plt.plot(results.power_law.cdf(xs))?

@anntzer
Author

anntzer commented Jun 2, 2016

How would the last option (plt.plot(results.power_law.cdf(xs))) work? It won't be correctly aligned in x. The first approach (plot_cdf) would need scaling, which is a separate issue.

I don't actually really care whether you return 0 or nan. In fact even raising an exception would be fine with me (well, it would be not as good, but still OK). Silently dropping values, not so much.

@jeffalstott
Owner

"How would the last option (plt.plot(results.power_law.cdf(xs))) work? It won't be correctly aligned in x. The first approach (plot_cdf) would need scaling, which is a separate issue."
Oh, I thought that Fit.power_law.cdf() (and Fit.power_law.pdf()) returned both y and x values, but in fact it only returns y values. Now I understand the source of your consternation!

"I don't actually really care whether you return 0 or nan. In fact even raising an exception would be fine with me (well, it would be not as good, but still OK). Silently dropping values, not so much."
I think nan is the more coherent option, but I'm not sure. Perhaps return nan by default, and give the user the option to return 0?

If you implement such functionality I'm happy to include it! The same goes for the scaling.
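The "nan by default, optionally 0" idea can be sketched as a wrapper around any function that trims values below xmin. Everything here is hypothetical: `evaluate_aligned` and its `fill` parameter are illustrative names, and `trimming_cdf` is a stand-in that only mimics the trimming behavior reported in this issue rather than calling the library.

```python
import numpy as np

def evaluate_aligned(fn, xs, xmin, fill=np.nan):
    """Evaluate fn on the entries of xs that are >= xmin and put `fill`
    everywhere else, so the result has the same shape as xs."""
    xs = np.asarray(xs, dtype=float)
    out = np.full(xs.shape, fill, dtype=float)
    tail = xs >= xmin
    if tail.any():
        # fn only ever sees values >= xmin, so it has nothing to trim.
        out[tail] = fn(xs[tail])
    return out

def trimming_cdf(xs, alpha=2.5, xmin=2.28):
    """Stand-in that drops values below xmin, as described in this issue."""
    xs = np.asarray(xs, dtype=float)
    xs = xs[xs >= xmin]
    return 1 - (xs / xmin) ** (1 - alpha)

xs = np.arange(10)
with_nan = evaluate_aligned(trimming_cdf, xs, xmin=2.28)             # nan below xmin
with_zero = evaluate_aligned(trimming_cdf, xs, xmin=2.28, fill=0.0)  # 0 below xmin
```

If the trimming behavior is as described above, the same wrapper could presumably be applied as `evaluate_aligned(results.power_law.cdf, xs, results.power_law.xmin)`. Matplotlib skips nan values when drawing a line, so the nan default leaves a visible gap below xmin instead of a misleading flat segment.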
