Variable bandwidth for 3 dimensional data. #26

Open
ytarricq opened this issue Jul 19, 2019 · 6 comments
Labels
good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments

@ytarricq

Hello,

First of all, thanks for the great package.
I'm trying to compute density maps of a 3-dimensional point distribution. I understood from the documentation that a variable bandwidth method is available, but I couldn't figure out how to set up this option.
Additionally, for a fixed-bandwidth KDE on multidimensional data, I would have expected to be able to use one bandwidth per dimension, as in the stats_models_multivariateKDE implementation. It seems that we can only use either a single bandwidth value or one bandwidth per data point. Did you implement it this way in order to take the weight of each data point into account?

Thanks in advance.

Cheers
Yoann

@tommyod
Owner

tommyod commented Jul 19, 2019

Thanks for the kind words, and for raising this issue @ytarricq .

  • Variable bandwidth (i.e. a unique bandwidth per data point) is only available in the NaiveKDE and TreeKDE implementations. You have to supply an array as the bw parameter, see the docs here.
  • If you want to use a bandwidth matrix that depends on the dimensions, e.g. bandwidth 2 in the x direction and bandwidth 3 in the y direction, that's not supported directly. The reason is that every kernel is implemented as a radial basis function. Scipy supports arbitrary bandwidth matrices, which is easy for Gaussian kernels. KDEpy supports arbitrary kernels, which makes this tougher. There's an elegant workaround to this problem: use the SVD to transform the data (scale, rotate) instead, see my recipe here.
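
To make the first bullet concrete, here is a minimal NumPy sketch of what "one bandwidth per data point" means for an isotropic (radial) Gaussian kernel. It is an illustration only, not KDEpy's implementation; the function name `naive_gaussian_kde` is made up for this example. In KDEpy itself, as stated above, you would pass an array as the bw parameter of NaiveKDE or TreeKDE.

```python
import numpy as np


def naive_gaussian_kde(data, grid, bw):
    """Evaluate an isotropic Gaussian KDE (hypothetical helper, not KDEpy code).

    data : (n, d) array of observations
    grid : (m, d) array of evaluation points
    bw   : scalar, or (n,) array holding one bandwidth per data point
    """
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n, d = data.shape
    bw = np.broadcast_to(np.asarray(bw, dtype=float), (n,))

    # Squared Euclidean distance from every grid point to every data point: (m, n)
    sq_dist = ((grid[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)

    # Every data point contributes a radial Gaussian with its own bandwidth.
    norm = (2.0 * np.pi * bw ** 2) ** (-d / 2.0)
    kernels = norm * np.exp(-0.5 * sq_dist / bw ** 2)
    return kernels.mean(axis=1)


rng = np.random.default_rng(0)
data = rng.normal(size=(50, 3))   # 50 points in 3 dimensions
grid = rng.normal(size=(5, 3))    # 5 evaluation points

fixed = naive_gaussian_kde(data, grid, bw=0.5)
variable = naive_gaussian_kde(data, grid, bw=np.linspace(0.3, 0.9, 50))
```

Note that each bandwidth is still a single number applied in all directions, which is why this does not give a bandwidth per dimension.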

Hope this helps you. 👍

Making that recipe idiot-proof and implementing it in the main library would be a good task. If you (or anyone reading this) is up for it, that's a PR I would merge.
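
For anyone picking this up, the general idea of the SVD workaround can be sketched as below. This is not the linked recipe (its contents are not reproduced here) but a minimal NumPy illustration under my own assumptions: a Gaussian kernel, full-rank data, and made-up function names. The data is rotated and scaled to unit variance, an isotropic KDE is computed in the transformed space, and the result is multiplied by the Jacobian of the transform to get a density in the original coordinates.

```python
import numpy as np


def isotropic_gaussian_kde(data, points, bw=1.0):
    """Plain radial Gaussian KDE; stands in for any isotropic KDE."""
    n, d = data.shape
    sq_dist = ((points[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
    norm = (2.0 * np.pi * bw ** 2) ** (-d / 2.0)
    return (norm * np.exp(-0.5 * sq_dist / bw ** 2)).mean(axis=1)


def kde_with_linear_transform(data, points, bw=1.0):
    """Anisotropic KDE via rotate + scale (hypothetical helper)."""
    mean = data.mean(axis=0)
    centered = data - mean

    # SVD of the centered data: centered = U @ diag(s) @ Vt
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scales = s / np.sqrt(len(data) - 1)  # std. dev. along each principal axis

    # Rotate onto the principal axes, then scale each axis to unit variance.
    transform = vt.T / scales
    z_data = centered @ transform
    z_points = (points - mean) @ transform

    density_z = isotropic_gaussian_kde(z_data, z_points, bw)

    # Change of variables: p_x(x) = p_z(z) * |det(transform)|
    return density_z * abs(np.linalg.det(transform))


rng = np.random.default_rng(1)
mixing = np.array([[3.0, 0.0], [1.0, 0.2]])
data = rng.normal(size=(200, 2)) @ mixing    # correlated, badly scaled data
points = rng.normal(size=(7, 2)) @ mixing
density = kde_with_linear_transform(data, points, bw=0.4)
```

For a Gaussian kernel this is equivalent to using the bandwidth matrix bw² times the sample covariance. A library version would also need to handle degenerate data (zero singular values), which this sketch ignores.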

@ytarricq
Author

Thanks for the quick answer! That made the difference between bandwidth matrices and variable bandwidths clearer.
I will work on the best way to handle my data and will get back to you if I'm successful.

@philippeller

I'm having the same issue with the fixed bandwidth for all dimensions.
In my case one dimension has a radically different scale than the others and hence the resulting KDEs don't look good. Just scaling the data in that dimension works fine (and rescaling after KDE), since I don't need any rotation/covariance.
Wouldn't implementing a bw per dimension, if given as an iterable, get us a long way without complicating things too much?

@tommyod
Owner

tommyod commented Mar 24, 2020

@philippeller : Since the kernel functions are radial basis functions, I suppose your suggestion would amount to scaling the input data in each dimension, computing the KDE, then scaling back. However, it would hide how the data is scaled from the user. Some options are: min/max scaling, standardizing with the standard deviation and the mean, quantile transformations, etc.
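
To illustrate why the choice of scheme matters, here is a small NumPy sketch (function names are made up for this example). An isotropic bandwidth h applied to scaled data corresponds to a bandwidth of h times the scale factor in the original units, so two common schemes give two different effective bandwidths for the same h.

```python
import numpy as np


def minmax_scale(data):
    """Scale each dimension to [0, 1]; returns scaled data and scale factors."""
    low = data.min(axis=0)
    width = data.max(axis=0) - low
    return (data - low) / width, width


def standardize(data):
    """Scale each dimension to zero mean, unit variance."""
    mean = data.mean(axis=0)
    std = data.std(axis=0, ddof=1)
    return (data - mean) / std, std


rng = np.random.default_rng(0)
# Second dimension has a scale 100 times larger than the first.
data = rng.normal(scale=[1.0, 100.0], size=(500, 2))

h = 0.2  # isotropic bandwidth used in the scaled space
effective = {}
for name, scaler in [("min/max", minmax_scale), ("standardize", standardize)]:
    _, scale = scaler(data)
    effective[name] = h * scale  # bandwidth per dimension in original units
    print(name, "effective bandwidth per dimension:", effective[name])
```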

I feel that "simple is better than complex" and the "principle of least surprise" applies here. Doing some implicit scaling scheme might confuse users more than it helps them. Stating that "the multidimensional KDE is isotropic" and letting users handle scaling seems simpler to understand and less likely to produce unexpected results.

I'm open to suggestions of course. But I would need some details. A high-level wrapper function, or a ScalingTransformer class might be sensible.

@philippeller

philippeller commented Mar 24, 2020

Maybe I'm missing some important point, but I was thinking of an explicit scaling, not an implicit one.

Let's say the user supplies two dimensional data (x and y) and the fixed bandwidths as (bw_x, bw_y).
Now internally you compute scaled_x = x / bw_x (and scaled_y = y / bw_y), then proceed with the KDE on the scaled data using bandwidth = 1, and in the end just undo the scaling. Wouldn't that work?
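
A minimal NumPy sketch of this idea, assuming a Gaussian kernel and using made-up function names (this is not KDEpy code). One detail worth spelling out: "undoing the scaling" for the density values means dividing by the product of the bandwidths, so that the result still integrates to 1 in the original coordinates.

```python
import numpy as np


def unit_bw_gaussian_kde(data, points):
    """Isotropic Gaussian KDE with bandwidth 1."""
    d = data.shape[1]
    sq_dist = ((points[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
    return ((2.0 * np.pi) ** (-d / 2.0) * np.exp(-0.5 * sq_dist)).mean(axis=1)


def kde_per_dimension_bw(data, points, bws):
    """KDE with one fixed bandwidth per dimension (hypothetical helper)."""
    bws = np.asarray(bws, dtype=float)

    # Explicit scaling: divide each dimension by its bandwidth.
    scaled_data = data / bws
    scaled_points = points / bws

    # Isotropic KDE with bandwidth 1 on the scaled data.
    density_scaled = unit_bw_gaussian_kde(scaled_data, scaled_points)

    # Undo the scaling: convert to density per unit volume of the original space.
    return density_scaled / np.prod(bws)


rng = np.random.default_rng(2)
data = rng.normal(size=(30, 2)) * [1.0, 50.0]    # y has a much larger scale
points = rng.normal(size=(4, 2)) * [1.0, 50.0]
density = kde_per_dimension_bw(data, points, bws=(0.3, 12.0))
```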

@tommyod
Owner

tommyod commented Mar 24, 2020

That would work. 👍 Thanks for clarifying. I got a little ahead of myself.

What you're sketching might be worth implementing. In a different issue #6 we had some discussions about a more general case. It's really an issue of API design. The way I see it:

Pros:

  • Useful in some use cases, saves the users some time (but they can do it themselves too)

Cons:

  • Extends the current API a little (but backwards compatible, so no big deal)
  • Doesn't implement the more general case (general anisotropic KDEs via rotations)

In conclusion I would merge a PR that implements this. 👍 No promises about when/if I'll find time to do it myself though.

@tommyod added the help wanted and good first issue labels on Mar 24, 2020
3 participants