-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Anisotropic KDEs via PCA #6
Comments
After applying the whitening transformation to the grid points, the grid is no longer axis aligned. I haven't checked |
@blasern : However, I believe that it does not matter, since the whitening transform is performed before a grid is indued. An axis-aligned grid is induced on the rotated space Here's a more verbose diagram, which commutes (up to approximations of data, etc):
(1) We have raw data, and no grid. Thoughts on this? PS. The diagrams are made using Textik. |
So I guess it depends on the If you use the second call to From my point of view, the purpose of a density estimator is to get density estimates in points of my choosing (not necessarily an axis aligned grid). Plotting these density estimates (your point 5) is secondary. Therefore I would prefer a solution that does not rely too heavily on the chosen grid. |
Very good points. I don't have all the answers, but some immediate thoughts:
By the way: it's great to discuss this with you. I see some potential and possibilities which would've eluded me if I was working alone. |
I have a feeling that we are doing this wrong. Do we have to rotate and rescale the data? It may be easier to rotate and rescale the kernels. When initializing a kernel we already provide a variance, why not a rotation? |
Here's an illustration of what I mean:
|
@blasern. Sorry for a very late reply. Here's what I am thinking. Starting from randomly generated data. #!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Wed Dec 12 18:45:28 2018
@author: tommy
"""
import numpy as np
from numpy import linalg as la
import matplotlib.pyplot as plt
from KDEpy import FFTKDE
# --------- GENERATE DATA ---------
np.random.seed(123)
num_obs, num_dims = 40, 2
X = np.random.randn(num_obs, num_dims) + np.array([0.5, -0.8])
# Scale and rotate the data to make it more interesting
theta = np.pi/4
rotation = np.array([[np.cos(theta), np.sin(theta)],
[-np.sin(theta), np.cos(theta)]])
scaling = np.diag([3, 1])
X = X @ scaling @ rotation
# --------- COMPUTE THE SVD ---------
plt.figure(figsize=(5, 5)) # Prepare a figure
plt.scatter(X[:, 0], X[:, 1], label='Original data')
# we now perform singular value decomposition of X
# see https://stats.stackexchange.com/questions/134282/relationship-between-svd-and-pca-how-to-use-svd-to-perform-pca
U, s, Vt = la.svd(X, full_matrices=False)
V = Vt.T
# Rotate the data to principal component space
rotated_X = X @ V @ np.diag(1 / s) * np.sqrt(num_obs)
plt.scatter(rotated_X[:, 0], rotated_X[:, 1], color='red', s=3, label='Rotated')
# Rotate back, check that it works
rotated_back = rotated_X / np.sqrt(num_obs) @ np.diag(s) @ V.T
plt.scatter(rotated_back[:, 0], rotated_back[:, 1], color='black',
s=3, label='Rotated back')
plt.xlim([-5, 5]); plt.ylim([-5, 5]); plt.legend(); plt.show()
# --------- COMPUTE KERNEL DENSITY ESTIMATION ON ROTATED DATA ---------
grid_points = 2**7 # Grid points in each dimension
N, bw =10, 0.2 # Bandwidth in rotated space
# Compute the kernel density estimate
kde = FFTKDE(kernel='gaussian', norm=2, bw=bw)
grid, points = kde.fit(rotated_X).evaluate(grid_points)
# Prepare for plotting
fig, ax = plt.subplots(figsize=(5,5))
x, y = np.unique(grid[:, 0]), np.unique(grid[:, 1])
z = points.reshape(grid_points, grid_points).T
# Plot the kernel density estimate
ax.contour(x, y, z, N, linewidths=0.8, colors='k')
ax.contourf(x, y, z, N, cmap="RdBu_r")
ax.plot(rotated_X[:, 0].T, rotated_X[:, 1].T, 'ok', ms=3)
plt.xlim([-5, 5]); plt.ylim([-5, 5]) ;plt.show()
# --------- ROTATE THE GRID BACK ---------
# After rotation, the grid will not be alix-aligned
grid_rot = grid / np.sqrt(num_obs) @ np.linalg.inv(V @ np.diag(1/s))
# --------- RESAMPLE THE GRID ---------
# We pretend the data is the rotated grid, and the f(x, y) values are weights
# This is a re-sampling of the KDE onto an axis-aligned grid, and is needed
# since matplotlib requires axis-aligned grid for plotting.
# (The problem of plotting on an arbitrary grid is similar to density estimation)
kde = FFTKDE(kernel='gaussian', norm=2, bw=bw)
grid, points = kde.fit(grid_rot, weights=points).evaluate(grid_points)
# Create another figure
fig, ax = plt.subplots(figsize=(5, 5))
x, y = np.unique(grid[:, 0]), np.unique(grid[:, 1])
z = points.reshape(grid_points, grid_points).T
# Plot the kernel density estimate and the original data, for checking correctness
ax.contour(x, y, z, N, linewidths=0.8, colors='k')
ax.contourf(x, y, z, N, cmap="RdBu_r")
ax.plot(X[:, 0].T, X[:, 1].T, 'ok', ms=3)
plt.xlim([-5, 5]); plt.ylim([-5, 5]); plt.show() |
I approve of applying KDE on the output of a KDE. But I guess there are still some issues that need to be addressed:
So far I still have the feeling that rotating and rescaling the kernels may be the easier solution. |
@blasern These are valid issues. I have some thoughts, but it's easier to explain with a blackboard. I suggest we discuss this after Christmas :) |
Sounds good. Happy holidays! |
Alternatively, one can just evaluate the initial KDE on a non-equidistant grid -- a grid being transformed using the same PCA transformation applied on the data. Then a much more dense grid will help a lot. The authentic way to do this is, however, to carry out a bandwidth matrix computation, see https://en.wikipedia.org/wiki/Multivariate_kernel_density_estimation#Density_estimation_with_a_full_bandwidth_matrix, they mentioned a "data-driven" approach, which might well be slow and not exactly what we want here... I am more thinking of carrying out the same approach Botev did, but for 2d data -- the proof shouldn't be all that difficult, considering that we are blessed with the fact that all norms are equivalent in finite-dimensional space. |
Currently, the bandwidth in
d
dimensions isbw * np.eye(d)
---the covariance matrix is a multiple of the identity. As a result, the KDE works best if anisotropic data is shifted, rotatated and scaled before sent into the algorithm.Having a routine for this built into the library would be nice.
@blasern suggested implementing anisotropic KDE. We agree that the following might work:
f
.g
.f^{-1}(g^{-1}(D))
, make sure to scale the volume to preserve an integral of unity (determinant of the transformation).The above would implement very general KDEs in arbitrary dimensions.
The text was updated successfully, but these errors were encountered: