
Contour plot levels corresponding to probability or std #3

Open
hageldave opened this issue Feb 6, 2024 · 3 comments

@hageldave
Collaborator

Currently, the levels of the contour plots that show PDFs are chosen automatically, but it is unclear what these levels correspond to in terms of probability.


I suggest selecting the isovalues (densities) of the isolines so that they show highest density regions (see https://doi.org/10.2307/2684423). Then regions like "75% of samples fall into this region" can be shown.

Here's a 1D illustration of highest density regions (HDR), borrowed from a Stack Overflow thread.

To find the corresponding density values, a Monte Carlo approach was proposed here, which does the following:

  1. input: random variable X with pdf f(x)
  2. S ↤ draw n samples from X
  3. D ↤ f(S) (compute the densities of the samples)
  4. sort D in descending order so that D_i is the i-th largest density
  5. use D_i as an approximation for the isovalue such that a fraction i/n of the samples has density of at least D_i (i.e., an estimate of the corresponding quantile of Y = f(X))

So, for a region where "75% of samples are in here", we use D_j with j = int(0.75*n).

I'm proposing this approach because it works with any distribution that allows sampling and has a pdf (we need a pdf anyway to draw contours). However, it requires drawing many samples to get good estimates. For example, 99.7% (the 3-standard-deviation radius of a normal distribution) needs at least 1,000 samples for even a rough estimate, and more like 10,000 for an okay-ish one. I'm not sure if this is the best we can do, but it's a simple algorithm; a small sketch follows.
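For illustration, a minimal Python sketch of steps 1–5 (the function name and the use of a frozen scipy.stats distribution are assumptions for the example, not this repository's API):

```python
import numpy as np
from scipy import stats

def hdr_isovalues(dist, quantiles=(0.25, 0.75, 0.95), n_samples=10_000):
    """Estimate density isovalues for highest density regions via Monte Carlo.
    `dist` is anything exposing `rvs` (sampling) and `pdf` (density),
    e.g. a frozen scipy.stats distribution."""
    samples = dist.rvs(size=n_samples)            # step 2: draw n samples from X
    densities = np.sort(dist.pdf(samples))[::-1]  # steps 3-4: densities, sorted descending
    # step 5: the j-th largest density approximates the isovalue whose region
    # {x : f(x) >= D_j} contains roughly a fraction j/n of the samples
    return {q: densities[min(int(q * n_samples), n_samples - 1)] for q in quantiles}

# example: isovalues for a correlated 2D normal
dist = stats.multivariate_normal(mean=[0, 0], cov=[[1.0, 0.3], [0.3, 0.5]])
levels = hdr_isovalues(dist, quantiles=(0.25, 0.75, 0.95))
```

Any distribution object with `rvs` and `pdf` would work the same way; the returned values could be passed as contour levels (sorted ascending if the plotting library requires increasing levels).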

@marinaevers
Collaborator

Is this issue still open?

@hageldave
Collaborator Author

hageldave commented Jul 25, 2024

It was implemented in 79f5d02.
However, there is a little detail missing:
The number of samples required to estimate the isovalues depends on the largest requested quantile. E.g., 99.7% (currently the largest default) requires at least 997 samples (in which case the least dense sample is picked), but for a good estimate, 10,000 samples would be more reasonable.

I think we should add a mechanism to determine a reasonable number of samples depending on the requested quantiles (a possible heuristic is sketched below).
Also, I think the default quantiles (68%, 95%, 99.7%) should be changed to smaller ones, e.g., 25%, 75%, 95%, or maybe 50%, 90%, 99%, to require fewer samples and density evaluations. But I'm not sure which quantiles to choose and why.
The current choice is motivated by the 1, 2, 3 standard deviation levels of the normal distribution.
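One possible heuristic for such a mechanism (just a suggestion, not what 79f5d02 does): choose n so that a minimum number of samples is expected to fall outside the largest requested region, since that tail determines how stable the estimate of the largest isovalue is. The threshold of 30 tail samples below is an arbitrary placeholder.

```python
import math

def required_samples(quantiles, min_tail_samples=30):
    """Pick a sample count so that at least `min_tail_samples` samples are
    expected to fall outside the largest requested quantile region."""
    q_max = max(quantiles)
    return math.ceil(min_tail_samples / (1.0 - q_max))

required_samples((0.68, 0.95, 0.997))  # -> 10000
required_samples((0.25, 0.75, 0.95))   # -> 600
```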

@marinaevers
Collaborator

I agree for general distributions. For normal distributions, which are the most common case, one could discuss avoiding sampling and using the CDF directly.
As the quantiles converge very slowly with the number of Monte Carlo samples (the pdf value enters as a prefactor in the convergence), I would prefer 25%, 75%, and 95% to keep the largest quantile as small as possible.
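For the normal case the isovalues can indeed be computed without sampling: the highest density region of a (multivariate) normal is an ellipsoid whose squared Mahalanobis radius comes from the chi-squared quantile function, and the isovalue is the density on its boundary. A sketch, assuming scipy is available (not part of the current code):

```python
import numpy as np
from scipy import stats

def normal_hdr_isovalue(mean, cov, p):
    """Exact density isovalue for a d-dimensional normal: the region
    {x : pdf(x) >= isovalue} contains probability mass p."""
    mean = np.atleast_1d(mean)
    cov = np.atleast_2d(cov)
    d = mean.shape[0]
    r2 = stats.chi2.ppf(p, df=d)  # squared Mahalanobis radius of the p-region
    norm_const = 1.0 / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return norm_const * np.exp(-0.5 * r2)  # density on the boundary ellipsoid

# e.g. levels for the 25%, 75%, 95% regions of a 2D normal
levels = [normal_hdr_isovalue([0, 0], [[1.0, 0.3], [0.3, 0.5]], p)
          for p in (0.25, 0.75, 0.95)]
```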
