All notable changes to this project will be documented in this file. This project adheres to Semantic Versioning.
- k-means refinement now reports inertia computed for all k values
- The parameters used to configure k-means runs have been reworked, providing more flexibility
- k-means refinement now supports user-specified features
- minor naming changes in classes/parameters (kmeans vs kmean, and consistent capitalization)
- result objects are now copied on return, so running more iterations does not modify previous results
- initial cluster assignment can now be provided as a Dask array
- Fix mem_estimate_coclustering_numpy on Windows: the default 32-bit integer type could easily overflow (#82)
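The Windows issue in #82 can be illustrated in a few lines: NumPy's default integer has historically mapped to a 32-bit C long on Windows, so a memory estimate that multiplies large dimension sizes silently wraps around. A minimal sketch with hypothetical sizes, forcing 32-bit integers to reproduce the effect on any platform:

```python
import numpy as np

# Hypothetical dimension sizes; a memory estimate multiplies them together.
nrows, ncols = np.int32(100_000), np.int32(50_000)

# In 32-bit arithmetic the product exceeds 2**31 - 1 and wraps around.
n_elements_32 = np.multiply(nrows, ncols)          # wraps to 705032704
n_elements_64 = np.int64(nrows) * np.int64(ncols)  # correct: 5000000000

print(n_elements_32, n_elements_64)
```

Casting to a 64-bit integer before multiplying, as in the fix, avoids the wraparound.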
- Test instructions have been updated, dropping the deprecated use of setuptools' test (#80)
- Docs improvements (#78 and #79)
- Fixed the README so it can be used as the long_description on PyPI
- k-means refinement also returns refined cluster labels
- Fixed a bug in calculate_cluster_features affecting kmeans and the calculation of tri-cluster averages for particular orderings of the dimensions
- Number of converged runs in tri-clustering is now updated
- Removed the numerical parameter epsilon, which should lead to some improvement of the algorithm when empty clusters are present
- Refined cluster averages are no longer computed over co-/tri-cluster averages but over all corresponding elements
- Dropped non-Numba powered low-mem version of co-clustering
- k-means implementation for tri-clustering
- utility functions to calculate cluster-based averages for tri-clustering
- Best k value in k-means is now selected automatically using the Silhouette score
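The selection strategy can be pictured with a generic sketch (this is not CGC's internal code): fit k-means for a range of k values and keep the k that maximizes the Silhouette score, here using scikit-learn on synthetic data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Synthetic data with three well-separated blobs (illustrative only).
data = np.vstack([rng.normal(loc=c, scale=0.1, size=(50, 2))
                  for c in ((0, 0), (3, 3), (0, 3))])

# Score each candidate k by the Silhouette of the resulting labels.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
    scores[k] = silhouette_score(data, labels)

best_k = max(scores, key=scores.get)
print(best_k)  # 3 for this synthetic dataset
```

The Silhouette score rewards tight, well-separated clusters, so it peaks at the natural number of groups without requiring a user-supplied threshold.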
- utility function to estimate the peak memory usage of numpy-based co-clustering
- utility function to calculate cluster-based averages
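To illustrate what such a utility computes (the function below is a hypothetical sketch, not CGC's API): co-cluster averages can be obtained by accumulating sums and counts into one cell per (row-cluster, column-cluster) pair.

```python
import numpy as np

def cocluster_averages(Z, row_labels, col_labels, n_row_cl, n_col_cl):
    """Average of matrix entries within each (row-cluster, col-cluster) cell."""
    sums = np.zeros((n_row_cl, n_col_cl))
    counts = np.zeros((n_row_cl, n_col_cl))
    # Broadcast the label arrays to one (row-cluster, col-cluster) index
    # pair per matrix entry, then accumulate in place.
    idx = (row_labels[:, None], col_labels[None, :])
    np.add.at(sums, idx, Z)
    np.add.at(counts, idx, 1)
    return sums / counts

Z = np.array([[1., 2.], [3., 4.]])
avgs = cocluster_averages(Z, np.array([0, 1]), np.array([0, 0]), 2, 1)
print(avgs)  # [[1.5], [3.5]]
```

With both columns assigned to the same column cluster, each row cluster's average is simply the mean of its row.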
- added Dask-based tri-clustering implementation
- k-means setup is more robust with respect to setting the range of k values and the threshold on the variance
- calculation of k-means statistics is faster
- new version of tri-clustering algorithm implemented, old version moved to legacy folder
- Reduced memory footprint of low-memory Dask-based implementation
- Fixed error handling in high-performance Dask implementation
- Dropped tests on Python 3.6, added tests for Python 3.9 (following Dask)
- Solved a dependency issue: requirements failed to install with pip
- Low-memory version of numpy-based co-clustering, significantly reducing the memory footprint of the code
- Numba-accelerated variant of the low-memory numpy-based co-clustering
- Result objects include an input_parameters dictionary and other metadata
- Solved an issue with the Dask graph growing increasingly large with the number of iterations
- Main calculator classes store results in a dedicated object
- Cluster results of co-/tri-clustering are now serialized to a file
- Improved output
- Bug fix in selecting minimum error run in co- and tri-clustering
- K-means now loops over multiple k values
- First version of the CGC package, including minimal docs and tests