Values below xmin are silently trimmed when computing cdf or pdf #36

anntzer · 2016-06-01T07:46:56Z

import numpy as np
import powerlaw
data = np.array([1.7, 3.2, 5.4, 7.9, 10., 12.])
results = powerlaw.Fit(data)
print(results.power_law.alpha)
print(results.power_law.xmin)
print(results.power_law.cdf(np.arange(10)))
print(results.power_law.cdf(np.arange(3, 10)))

xmin is ~2.28, and the last two lines give identical results: the entries 0, 1 and 2 of np.arange(10) have been silently trimmed out, when I'd expect them to return 1 (and 0 if computing the pdf). Otherwise, it becomes a bit awkward to e.g. do plt.plot(xs, results.power_law.cdf(xs)) if I am not directly computing xs using xmin.
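
Until the behavior changes, one workaround is to apply the same cut to xs before plotting, so that the x and y arrays keep the same length. The following is a minimal sketch using NumPy only. `trimmed_cdf` is a hypothetical stand-in that mimics the trimming described above for a continuous power law (it is not the library's code), and the alpha and xmin values are chosen for illustration rather than taken from a fit:

```python
import numpy as np

def trimmed_cdf(xs, alpha, xmin):
    """Hypothetical stand-in: continuous power-law CDF that drops xs < xmin."""
    xs = np.asarray(xs, dtype=float)
    kept = xs[xs >= xmin]                      # values below xmin vanish here
    return 1.0 - (kept / xmin) ** (1.0 - alpha)

alpha, xmin = 2.5, 2.28                        # illustrative values only
xs = np.arange(10)
ys = trimmed_cdf(xs, alpha, xmin)
print(len(xs), len(ys))                        # the two lengths differ

xs_kept = xs[xs >= xmin]                       # apply the same cut to x
assert len(xs_kept) == len(ys)                 # now x and y line up
```

With the cut applied on both sides, plt.plot(xs_kept, ys) lines up, at the cost of repeating the xmin logic at every call site.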


jeffalstott · 2016-06-01T08:04:39Z

Thanks for using powerlaw!

Is there a problem with using powerlaw.plot_pdf(data_you_want_to_plot) or powerlaw.pdf(data_you_want_to_use)?

data = ...
results = powerlaw.Fit(data)
powerlaw.plot_pdf(data, label='Original Data')
results.power_law.plot_pdf(linestyle='--', label='Power Law Fit')

(The second line will plot a PDF that is normalized to the fitted data, not all the data, and thus will appear higher than the first PDF. This is awkward and people have talked about implementing an option in Fit.plot_pdf() and Fit.plot_cdf() to normalize the PDF to the value of the data, if it's something the user wants. If it's something you value, feel free to implement it and submit it as a pull request.)

anntzer · 2016-06-01T08:58:17Z

IIRC, your parenthesized note is exactly why I decided to directly call pdf myself (so that I can perform the normalization).

jeffalstott · 2016-06-01T09:06:19Z

I'm confused. I don't understand what you're trying to do, and how the functions we just described won't let you do it. Can you be more explicit?


anntzer · 2016-06-01T09:10:12Z

I want to plot the fitted pdf and have it (more or less) overlaid on top of the empirical pdf, i.e. the fitted pdf should be plotted normalized to the whole dataset. This is exactly the "awkwardness" you mention above.
So I instead call pdf myself and multiply the results by the correct factor before plotting.
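
A rough sketch of that rescaling, assuming the factor is the fraction of observations at or above xmin: the fitted pdf integrates to 1 over x >= xmin, while the empirical pdf integrates to 1 over the whole dataset. This uses NumPy only. `tail_pdf` is a hypothetical helper written from the continuous power-law formula, not a powerlaw function, and alpha and xmin are illustrative values rather than fit results:

```python
import numpy as np

def tail_pdf(xs, alpha, xmin):
    """Continuous power-law pdf, valid for xs >= xmin (hypothetical helper)."""
    xs = np.asarray(xs, dtype=float)
    return (alpha - 1.0) / xmin * (xs / xmin) ** (-alpha)

data = np.array([1.7, 3.2, 5.4, 7.9, 10., 12.])
alpha, xmin = 2.5, 2.28                        # illustrative, not fitted values

tail_fraction = np.mean(data >= xmin)          # share of the data the fit covers
xs = np.linspace(xmin, data.max(), 50)         # only evaluate where the pdf is defined
scaled = tail_fraction * tail_pdf(xs, alpha, xmin)
```

Plotting `scaled` against `xs` should then sit roughly on top of the empirical pdf of the full dataset, rather than above it.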

jeffalstott · 2016-06-01T09:26:01Z

This was discussed way back when here, and it looks like @psinger even created a solution for the CDF/CCDF here. If you sort out the right thing to do for the PDF and implement it, I'll happily include it.

anntzer · 2016-06-01T17:48:32Z

I'd rather fix the pdf/cdf/ccdf calculations themselves. Silently dropping data feels... wrong, doesn't it?

jeffalstott · 2016-06-02T01:24:26Z

I think there's a conceptual issue: Fit.pdf() and Fit.plot_pdf() plot an equation. That equation has, as a very important component, a minimum value. The PDF for that equation is 0 below the minimum value. There is nothing coherent to plot for Fit.pdf() below xmin.


anntzer · 2016-06-02T01:52:41Z

I fully agree with you on this point, but in my opinion this means that pdf should return zero for arguments below xmin, not drop them.

jeffalstott · 2016-06-02T02:00:26Z

Ah, I see! I suppose the question is whether it should return 0, or nan (or nothing, as it does now). What would be all the ways each of these could go wrong?

Again, though, this line of thinking seems to be for a use case (in the original comment) that's a bit weird: plt.plot(xs, results.power_law.cdf(xs)). Why not just results.power_law.plot_cdf()? Or plt.plot(results.power_law.cdf(xs))?

anntzer · 2016-06-02T02:57:02Z

How would the last option (plt.plot(results.power_law.cdf(xs))) work? It won't be correctly aligned in x. The first approach (plot_cdf) would need scaling, which is a separate issue.

I don't actually really care whether you return 0 or nan. In fact even raising an exception would be fine with me (well, it would be not as good, but still OK). Silently dropping values, not so much.

jeffalstott · 2016-06-02T03:14:31Z

"How would the last option (plt.plot(results.power_law.cdf(xs))) work? It won't be correctly aligned in x. The first approach (plot_cdf) would need scaling, which is a separate issue."
Oh, I thought that Fit.power_law.cdf() (and Fit.power_law.pdf()) returned both y and x values, but in fact it only returns y values. Now I understand the source of your consternation!

"I don't actually really care whether you return 0 or nan. In fact even raising an exception would be fine with me (well, it would be not as good, but still OK). Silently dropping values, not so much."
I think nan is the more coherent option, but I'm not sure. Perhaps return nan by default, and give the user the option to return 0?

If you implement such functionality I'm happy to include it! The same goes for the scaling.
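
A minimal sketch of what such an option could look like, using NumPy only. The function name `cdf_full` and the `below_xmin` parameter are hypothetical and not part of powerlaw, and the formula is the continuous power-law CDF, so this illustrates the idea rather than a patch:

```python
import numpy as np

def cdf_full(xs, alpha, xmin, below_xmin=np.nan):
    """Continuous power-law CDF with one output per input.

    Entries with xs < xmin get `below_xmin` (nan by default, or e.g. 0.0)
    instead of being dropped. Hypothetical helper, not the powerlaw API.
    """
    xs = np.asarray(xs, dtype=float)
    out = np.full(xs.shape, below_xmin, dtype=float)
    in_range = xs >= xmin                      # where the power law is defined
    out[in_range] = 1.0 - (xs[in_range] / xmin) ** (1.0 - alpha)
    return out

xs = np.arange(10)
ys_nan = cdf_full(xs, alpha=2.5, xmin=2.28)                   # nan below xmin
ys_zero = cdf_full(xs, alpha=2.5, xmin=2.28, below_xmin=0.0)  # 0 below xmin
```

Because the output always has the same shape as xs, plt.plot(xs, cdf_full(xs, ...)) stays aligned in x; with the nan default, matplotlib leaves a gap below xmin instead of drawing a value there.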
