Passing cluster_labels broken #49

Closed
mdruiter opened this issue Mar 2, 2023 · 4 comments · Fixed by #63
Labels
bug: Something isn't working
review: This issue is under review and will (hopefully) be closed soon

Comments

mdruiter commented Mar 2, 2023

I think I have found a bug that occurs when passing some cluster_labels.

When I reverse the order of all input (data and cluster_labels) and then reverse the result (local_outlier_probabilities), I would expect to get the same numbers. This holds as long as all cluster_labels values are equal. Once I have two (clearly separate) clusters, the results change when flipped!
An extra indication that things go wrong (IMHO): the neighbor indices listed for points in the second cluster refer to points in the first cluster!

A small reproduction example:

import matplotlib.pyplot as plt
import numpy as np
from PyNomaly import loop

np.random.seed(1)
n = 9
data = np.append(np.random.normal(2, 1, [n, 2]), np.random.normal(8, 1, [n, 2]), axis=0)
clus = np.append(np.ones(n),                     2 * np.ones(n)).tolist()  # 2 cluster numbers!
model = loop.LocalOutlierProbability(data, n_neighbors=5, cluster_labels=clus)
fit = model.fit()
res = fit.local_outlier_probabilities
print(res)
print(fit.neighbor_matrix)

data_flipped = np.flipud(data)
clus_flipped = np.flipud(clus).tolist()
model2 = loop.LocalOutlierProbability(data_flipped, n_neighbors=5, cluster_labels=clus_flipped)
fit2 = model2.fit()
res2 = np.flipud(fit2.local_outlier_probabilities)
print(res2)
print(np.flipud(fit2.neighbor_matrix))

s  = 1 + 100 * res.astype(float)
s2 = 1 + 100 * res2.astype(float)
plt.scatter(data[:, 0], data[:, 1], c=clus, s=s,  marker='+')
plt.scatter(data[:, 0], data[:, 1], c=clus, s=s2, marker='x')
plt.show()
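
The visual comparison above can also be expressed as a pass/fail check. The following is a minimal sketch, not part of PyNomaly: `is_flip_invariant`, `centroid_distance`, and `distance_to_first_row` are hypothetical names introduced here, and the two toy scorers only stand in for a real model so that the block runs on its own.

```python
import numpy as np

def is_flip_invariant(score_fn, data, labels, atol=1e-8):
    """Return True if scoring reversed input gives the reversed scores.

    score_fn is any callable (data, labels) -> one score per row.
    This helper is hypothetical and only illustrates the check.
    """
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    forward = np.asarray(score_fn(data, labels), dtype=float)
    backward = np.asarray(score_fn(data[::-1], labels[::-1]), dtype=float)
    # Undo the reversal before comparing row by row.
    return bool(np.allclose(forward, backward[::-1], atol=atol))

def centroid_distance(data, labels):
    """Toy scorer: distance of each row to its own cluster centroid."""
    scores = np.empty(len(data))
    for label in np.unique(labels):
        mask = labels == label
        scores[mask] = np.linalg.norm(data[mask] - data[mask].mean(axis=0), axis=1)
    return scores

def distance_to_first_row(data, labels):
    """Toy scorer that wrongly depends on row order."""
    return np.linalg.norm(data - data[0], axis=1)

points = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]])
clusters = np.array([1, 1, 2, 2])
order_independent = is_flip_invariant(centroid_distance, points, clusters)
order_dependent = is_flip_invariant(distance_to_first_row, points, clusters)
print(order_independent, order_dependent)
```

To apply it to the reproduction, `score_fn` could wrap the same calls used above, for example a function returning `loop.LocalOutlierProbability(d, n_neighbors=5, cluster_labels=l.tolist()).fit().local_outlier_probabilities`.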

Author

mdruiter commented Mar 6, 2023

The problem is in the 'definition' of neighbor_matrix: _compute_distance_and_neighbor_matrix returns indexes within the cluster, but _prob_distances_ev treats them as global indexes.
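
To illustrate the mismatch described above, here is a minimal sketch of how within-cluster indexes could be translated to global row indexes. It is not the fix that was merged into PyNomaly; `local_to_global` is a hypothetical helper, and the layout it assumes (row i holds the neighbor positions of sample i counted inside sample i's own cluster) is an assumption based on this comment.

```python
import numpy as np

def local_to_global(neighbor_matrix, cluster_labels):
    """Translate per-cluster neighbor indexes into global row indexes.

    Assumption: row i of neighbor_matrix lists the neighbors of sample i
    as positions within sample i's own cluster, in input order.
    """
    labels = np.asarray(cluster_labels)
    local = np.asarray(neighbor_matrix, dtype=int)
    global_idx = np.empty_like(local)
    for label in np.unique(labels):
        # Global row numbers of the samples that belong to this cluster.
        members = np.flatnonzero(labels == label)
        # Look up each local position in the cluster's member list.
        global_idx[members] = members[local[members]]
    return global_idx

labels = [1, 1, 1, 2, 2, 2]
local = [[1, 2], [0, 2], [0, 1], [1, 2], [0, 2], [0, 1]]
result = local_to_global(local, labels)
print(result)
```

In this toy input the second cluster occupies rows 3 to 5, so its local indexes 0, 1, 2 become 3, 4, 5. Read as global indexes without translation, they would point at rows of the first cluster, which matches the symptom reported above.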

Owner

vc1492a commented Mar 20, 2023

Hey @mdruiter - thanks for noting the issue and where it is occurring.

Are you able to submit a fix in a pull request?

vc1492a self-assigned this Mar 20, 2023
vc1492a added the bug label Mar 20, 2023
vc1492a added this to the Address Existing Bug Fixes milestone Aug 19, 2024
vc1492a assigned IroNEDR and unassigned vc1492a Aug 19, 2024
IroNEDR added the in progress label Aug 25, 2024
vc1492a assigned vc1492a and unassigned IroNEDR Sep 30, 2024
Owner

vc1492a commented Oct 11, 2024

Will be covered in branch 49-passing-cluster_labels-broken.

vc1492a linked a pull request Oct 11, 2024 that will close this issue
vc1492a added the review label and removed the in progress label Oct 11, 2024
Owner

vc1492a commented Oct 18, 2024

This was covered in #63 and will be pushed in a subsequent release.

3 participants