Unsupervised Algorithms and Metrics #414

Open
brifordwylie opened this issue Feb 6, 2024 · 0 comments
๐ญ ๐ก๐จ๐ฐ ๐๐จ ๐ฐ๐ž ๐ค๐ง๐จ๐ฐ ๐ฐ๐ก๐ข๐œ๐ก ๐ฆ๐ž๐ญ๐ก๐จ๐ ๐ข๐ฌ ๐›๐ž๐ญ๐ญ๐ž๐ซ? ๐–๐ž ๐๐จ๐งโ€™๐ญ ๐ก๐š๐ฏ๐ž ๐ฅ๐š๐›๐ž๐ฅ๐ฌ ๐ข๐ง ๐”๐ง๐ฌ๐ฎ๐ฉ๐ž๐ซ๐ฏ๐ข๐ฌ๐ž๐ ๐‹๐ž๐š๐ซ๐ง๐ข๐ง๐ , ๐๐จ ๐ ๐ซ๐จ๐ฎ๐ง๐ ๐ญ๐ซ๐ฎ๐ญ๐ก.

The answer lies in using evaluation metrics that can help us determine the quality of our algorithm.

๐”ผ๐•ง๐•’๐•๐•ฆ๐•’๐•ฅ๐•š๐• ๐•Ÿ ๐•„๐•–๐•ฅ๐•™๐• ๐••๐•ค:

➊ Silhouette score:

The Silhouette score ranges from -1 to 1. A high score (close to 1) indicates that data points within a cluster are similar to one another, and that the normal data points are well separated from the anomalous ones. Scores near 0 indicate overlapping clusters.
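As a rough illustration of how this score can rank candidate clusterings, the sketch below uses scikit-learn's `silhouette_score` on synthetic data. The blob centers, the use of KMeans, and the helper name `silhouette_for_k` are assumptions made for the example, not part of SageWorks.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data (assumption): three well-separated 2-D blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)


def silhouette_for_k(X, k):
    """Cluster X into k groups with KMeans and return the mean Silhouette score."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    return silhouette_score(X, labels)


# Score several candidate cluster counts; no ground-truth labels are used.
scores = {k: silhouette_for_k(X, k) for k in (2, 3, 4, 5)}
best_k = max(scores, key=scores.get)

for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print("best k:", best_k)
```

The same pattern works for comparing different algorithms: compute the score for each algorithm's labels on the same data and prefer the higher value.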

➋ Calinski-Harabasz index:

The Calinski-Harabasz index is the ratio of between-cluster dispersion to within-cluster dispersion. A higher score signifies better-defined clusters.
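A minimal sketch of the "higher is better" behavior, assuming scikit-learn's `calinski_harabasz_score` and synthetic data: labels produced by KMeans are compared against the same labels randomly shuffled, which destroys the cluster structure. The data and variable names are illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

# Synthetic data (assumption): three separated 2-D blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [8, 8], [-8, 8]],
    cluster_std=1.0,
    random_state=1,
)

# Labels from an actual clustering run.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X)

# Baseline: the same labels assigned to random points.
rng = np.random.default_rng(1)
shuffled_labels = rng.permutation(kmeans_labels)

ch_kmeans = calinski_harabasz_score(X, kmeans_labels)
ch_shuffled = calinski_harabasz_score(X, shuffled_labels)

print(f"KMeans labels:   {ch_kmeans:.1f}")
print(f"Shuffled labels: {ch_shuffled:.1f}")
```

The index has no fixed upper bound, so it is mainly useful for comparing clusterings of the same dataset rather than as an absolute quality number.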

➌ Davies-Bouldin index:

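The Davies-Bouldin index compares the spread of points within each cluster to the distance between cluster centroids, averaging each cluster's similarity to its most similar neighbor. A lower score signifies better-defined clusters, with 0 as the minimum.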
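To show the "lower is better" direction, the sketch below computes scikit-learn's `davies_bouldin_score` for tight and for loose clusters around the same centers. The centers, spreads, and the helper name `db_index` are assumptions for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

# Fixed cluster centers (assumption); only the spread changes between runs.
CENTERS = [[0, 0], [6, 0], [0, 6]]


def db_index(cluster_std):
    """Generate blobs with the given spread, cluster them, and return the DB index."""
    X, _ = make_blobs(
        n_samples=300, centers=CENTERS, cluster_std=cluster_std, random_state=2
    )
    labels = KMeans(n_clusters=3, n_init=10, random_state=2).fit_predict(X)
    return davies_bouldin_score(X, labels)


db_tight = db_index(0.5)  # compact, well-separated clusters
db_loose = db_index(2.5)  # wide, overlapping clusters

print(f"tight clusters: {db_tight:.3f}")
print(f"loose clusters: {db_loose:.3f}")
```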

โž Kolmogorov-Smirnov statistic:

It measures the maximum difference between the empirical cumulative distribution functions of the normal and anomalous data points. The statistic ranges from 0 (identical distributions) to 1 (no overlap).
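One way to compute this is SciPy's two-sample test, `scipy.stats.ks_2samp`. The sketch below assumes a single numeric quantity (a feature value or an anomaly score) is available for each group; the simulated values and the way the groups are formed are hypothetical.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(3)

# Hypothetical one-dimensional values for each group.
normal_values = rng.normal(loc=0.0, scale=1.0, size=500)
anomalous_values = rng.normal(loc=3.0, scale=1.0, size=50)

# A second sample from the normal distribution, for contrast.
more_normal_values = rng.normal(loc=0.0, scale=1.0, size=500)

# The statistic is the largest gap between the two empirical CDFs.
ks_separated = ks_2samp(normal_values, anomalous_values).statistic
ks_overlapping = ks_2samp(normal_values, more_normal_values).statistic

print(f"normal vs anomalous: {ks_separated:.3f}")
print(f"normal vs normal:    {ks_overlapping:.3f}")
```

A statistic near 1 suggests the detector is splitting the data into groups with clearly different distributions, while a value near 0 suggests the two groups look alike on that quantity.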

➎ Precision at top-k:

The metric is the fraction of the k highest-ranked (most anomalous) data points that are confirmed as true anomalies using expert domain knowledge.
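A small sketch of the calculation in plain Python. The function name `precision_at_k`, the anomaly scores, and the set of expert-confirmed indices are all made up for illustration; in practice the confirmations would come from a domain expert reviewing the top-ranked points.

```python
def precision_at_k(scores, expert_confirmed, k):
    """Fraction of the k highest-scoring points that an expert confirmed as anomalous.

    scores: anomaly score per data point (higher means more anomalous).
    expert_confirmed: set of indices the expert judged to be real anomalies.
    """
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of data points")
    # Rank point indices from most to least anomalous.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(1 for i in ranked[:k] if i in expert_confirmed)
    return hits / k


# Hypothetical detector output and expert review.
scores = [0.91, 0.15, 0.78, 0.40, 0.88, 0.05]
confirmed = {0, 2, 3}

p_at_3 = precision_at_k(scores, confirmed, k=3)
print(f"precision@3 = {p_at_3:.2f}")
```

Here the top three points by score are indices 0, 4, and 2, of which two were confirmed, giving a precision of 2/3.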

https://towardsdatascience.com/7-evaluation-metrics-for-clustering-algorithms-bdc537ff54d2

https://towardsdatascience.com/three-performance-evaluation-metrics-of-clustering-when-ground-truth-labels-are-not-available-ee08cb3ff4fb

https://medium.datadriveninvestor.com/evaluation-metrics-for-clustering-96dcdbea437d

https://towardsdatascience.com/a-comprehensive-beginners-guide-to-the-diverse-field-of-anomaly-detection-8c818d153995

@brifordwylie brifordwylie added algorithm SageWorks Algorithms research Research and Development labels Feb 6, 2024
@brifordwylie brifordwylie added this to the SageWorks: 0.7.0 milestone Feb 6, 2024
@brifordwylie brifordwylie self-assigned this Feb 6, 2024
@brifordwylie brifordwylie modified the milestones: SageWorks: 0.9.0, Future Aug 24, 2024
Projects
Status: Backlog