
Contour plot levels corresponding to probability or std #3

Open
hageldave opened this issue Feb 6, 2024 · 3 comments

@hageldave
Collaborator

Currently, the levels of the contour plots that show PDFs are chosen automatically, but it is unclear what these levels correspond to in terms of probability.


I suggest selecting the isovalues (densities) of the isolines so that they show highest density regions (see https://doi.org/10.2307/2684423). Then regions like "75% of samples fall into this region" can be shown.

Here's a 1D illustration of highest density regions (HDR), borrowed from a Stack Overflow thread.

To find the corresponding density values, a Monte Carlo approach was proposed here, which does the following:

  1. input: random variable X with pdf f(x)
  2. S ↤ draw n samples from X
  3. D ↤ f(S) (compute the densities of the samples)
  4. sort D in descending order so that D_i is the i-th largest density
  5. use D_i as an approximation for the isovalue such that a fraction i/n of the samples has density of at least D_i (i.e., an estimate of the corresponding quantile of Y = f(X))

So, for a region where "75% of samples are in here", we use D_j with j = int(0.75*n).

I'm proposing this approach because it works with any distribution that allows sampling and has a pdf (we need a pdf anyway to draw contours). However, it requires drawing many samples to get good estimates. For example, 99.7% (the 3-standard-deviation radius of a normal distribution) needs at least 1,000 samples for even a rough estimate, and more like 10,000 for an okay-ish one. I'm not sure if this is the best we can do, but it's a simple algorithm; a small sketch follows.
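For illustration, a minimal Python sketch of steps 1–5 (the function name and the use of a frozen scipy.stats distribution are assumptions for the example, not this repository's API):

```python
import numpy as np
from scipy import stats

def hdr_isovalues(dist, quantiles=(0.25, 0.75, 0.95), n_samples=10_000):
    """Estimate density isovalues for highest density regions via Monte Carlo.
    `dist` is anything exposing `rvs` (sampling) and `pdf` (density),
    e.g. a frozen scipy.stats distribution."""
    samples = dist.rvs(size=n_samples)            # step 2: draw n samples from X
    densities = np.sort(dist.pdf(samples))[::-1]  # steps 3-4: densities, sorted descending
    # step 5: the j-th largest density approximates the isovalue whose region
    # {x : f(x) >= D_j} contains roughly a fraction j/n of the samples
    return {q: densities[min(int(q * n_samples), n_samples - 1)] for q in quantiles}

# example: isovalues for a correlated 2D normal
dist = stats.multivariate_normal(mean=[0, 0], cov=[[1.0, 0.3], [0.3, 0.5]])
levels = hdr_isovalues(dist, quantiles=(0.25, 0.75, 0.95))
```

Any distribution object with `rvs` and `pdf` would work the same way; the returned values could be passed as contour levels (sorted ascending if the plotting library requires increasing levels).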

@marinaevers
Collaborator

Is this issue still open?

@hageldave
Collaborator Author

hageldave commented Jul 25, 2024

It was implemented in 79f5d02.
However, there is a little detail missing:
The number of samples required to estimate the isovalues depends on the largest requested quantile. E.g., 99.7% (currently the largest default) requires at least 997 samples (in which case the least dense sample is picked), but for a good estimate, 10,000 samples would be more reasonable.

I think we should add a mechanism to determine a reasonable number of samples depending on the requested quantiles (a possible heuristic is sketched below).
Also, I think the default quantiles (68%, 95%, 99.7%) should be changed to smaller ones, e.g., 25%, 75%, 95%, or maybe 50%, 90%, 99%, to require fewer samples and density evaluations. But I'm not sure which quantiles to choose and why.
The current choice is motivated by the 1, 2, 3 standard deviation levels of the normal distribution.
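One possible heuristic for such a mechanism (just a suggestion, not what 79f5d02 does): choose n so that a minimum number of samples is expected to fall outside the largest requested region, since that tail determines how stable the estimate of the largest isovalue is. The threshold of 30 tail samples below is an arbitrary placeholder.

```python
import math

def required_samples(quantiles, min_tail_samples=30):
    """Pick a sample count so that at least `min_tail_samples` samples are
    expected to fall outside the largest requested quantile region."""
    q_max = max(quantiles)
    return math.ceil(min_tail_samples / (1.0 - q_max))

required_samples((0.68, 0.95, 0.997))  # -> 10000
required_samples((0.25, 0.75, 0.95))   # -> 600
```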

@marinaevers
Collaborator

I agree for general distributions. For normal distributions, which are the most common case, one could discuss avoiding sampling and using the CDF directly.
As the quantiles converge very slowly with the number of Monte Carlo samples (the pdf value enters as a prefactor in the convergence), I would prefer 25%, 75%, and 95% to keep the largest quantile as small as possible.
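For the normal case the isovalues can indeed be computed without sampling: the highest density region of a (multivariate) normal is an ellipsoid whose squared Mahalanobis radius comes from the chi-squared quantile function, and the isovalue is the density on its boundary. A sketch, assuming scipy is available (not part of the current code):

```python
import numpy as np
from scipy import stats

def normal_hdr_isovalue(mean, cov, p):
    """Exact density isovalue for a d-dimensional normal: the region
    {x : pdf(x) >= isovalue} contains probability mass p."""
    mean = np.atleast_1d(mean)
    cov = np.atleast_2d(cov)
    d = mean.shape[0]
    r2 = stats.chi2.ppf(p, df=d)  # squared Mahalanobis radius of the p-region
    norm_const = 1.0 / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return norm_const * np.exp(-0.5 * r2)  # density on the boundary ellipsoid

# e.g. levels for the 25%, 75%, 95% regions of a 2D normal
levels = [normal_hdr_isovalue([0, 0], [[1.0, 0.3], [0.3, 0.5]], p)
          for p in (0.25, 0.75, 0.95)]
```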
