Features/1674 optimization of k means initialization #1754

Hakdag97 · 2024-12-17T13:03:27Z

Description

The runtime bottleneck of k-means clustering is the initialization of centroids, which was previously based on a costly serial algorithm. This pull request replaces that algorithm with the more sophisticated k-means|| initialization of centroids.
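For readers unfamiliar with k-means||, the idea can be sketched in plain NumPy as follows. This is a single-process illustration of the general scheme (oversample candidates over a few rounds, weight them, then reduce them to k centers with a weighted k-means++ step), not the code added in this pull request. The function names, the default oversampling factor of 2k, and the fixed number of rounds are assumptions made for the example.

```python
import numpy as np


def sq_dist_to_nearest(X, C):
    """Return, for every row of X, the squared Euclidean distance to its
    nearest row of C and the index of that nearest row."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1), d2.argmin(axis=1)


def kmeans_parallel_init(X, k, oversampling=None, rounds=5, seed=0):
    """Illustrative k-means|| seeding (hypothetical helper, not a library API)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    ell = 2 * k if oversampling is None else oversampling  # assumed default

    # Step 1: start with one point chosen uniformly at random.
    C = X[rng.integers(n)][None, :]

    # Step 2: in each round, sample every point independently with
    # probability proportional to its squared distance to the current set.
    for _ in range(rounds):
        d2, _ = sq_dist_to_nearest(X, C)
        cost = d2.sum()
        if cost == 0:  # every point already coincides with a candidate
            break
        picked = rng.random(n) < np.minimum(1.0, ell * d2 / cost)
        C = np.vstack([C, X[picked]])

    # Step 3: weight each candidate by the number of points closest to it.
    _, nearest = sq_dist_to_nearest(X, C)
    w = np.bincount(nearest, minlength=C.shape[0]).astype(float)

    # Step 4: reduce the small candidate set to k centers with a
    # weighted k-means++ pass (cheap, because C is much smaller than X).
    centers = [C[rng.choice(len(C), p=w / w.sum())]]
    for _ in range(1, k):
        d2, _ = sq_dist_to_nearest(C, np.asarray(centers))
        p = w * d2
        if p.sum() == 0:  # fewer distinct candidates than k: fall back to weights
            p = w
        centers.append(C[rng.choice(len(C), p=p / p.sum())])
    return np.asarray(centers)
```

The sampling in step 2 is independent per point, which is what makes the scheme attractive for distributed data: each round needs only the current candidate set and a global cost, rather than one sequential pass over the data per centroid as in the serial k-means++ seeding.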

Issue/s resolved:

Changes proposed:

Appropriate handling of an additional edge case in the function nonzero
Completely new implementation of the centroid initialization used for k-means, k-medians, and k-medoids
Adjustment of classes (like KMeans) to match the new implementation

Type of change

Bug fix
New feature

Performance

Reduces the runtime of the clustering initialization by at least an order of magnitude, in both distributed and non-distributed mode, and for both split=None and split not None. The exact speedup depends on the setting, e.g. the size of the data and the chosen parameters.

Does this change modify the behaviour of other functions? If so, which?

Yes: the classes KMeans, KMedoids, and KMedians, and the function where, are affected.

github-actions · 2024-12-17T13:09:26Z

Thank you for the PR!

Hakdag97 and others added 3 commits November 6, 2024 14:00

Created a test file mytest.py

ae10038

Implementation of parallel initialization

3b62d4f

Refined comments for better readability

0889330
