ml4py/speclus4py

speclus4py

Spectral clustering implementation for distributed machine learning

This package provides unsupervised learning using the spectral clustering technique. It is written in the Python programming language on top of the SLEPc and PETSc/TAO frameworks, and it uses several additional packages such as OpenCV, SciPy, NumPy, and Numba. It takes advantage of distributed memory management, which it inherits from PETSc. Computations can therefore run natively on parallel computing architectures such as Beowulf clusters, other small clusters, and supercomputers. This does not mean that it cannot be used on laptops and desktop computers: there, too, users can make effective use of the available computational cores through the Message Passing Interface (MPI). The MPI-based approach brings distributed data parallelism to this conventional machine learning technique and enables it to process large-scale data.
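
To make the idea of distributed data parallelism more concrete, the following sketch splits the rows of a global matrix into contiguous blocks, one per process, which is the kind of row-wise layout distributed matrices commonly use. It is plain Python and does not call MPI or PETSc. The helper `ownership_range` is hypothetical and not part of speclus4py, and giving the leftover rows to the lowest ranks is an assumption made for illustration.

```python
def ownership_range(n_rows, n_procs, rank):
    """Return the half-open row range [start, end) owned by `rank`.

    Hypothetical helper for illustration only (not a speclus4py function).
    Rows are split into contiguous blocks; the first `n_rows % n_procs`
    ranks receive one extra row, so block sizes differ by at most one.
    """
    base, extra = divmod(n_rows, n_procs)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


if __name__ == "__main__":
    # 10 rows of an affinity matrix shared among 4 processes
    for rank in range(4):
        print(rank, ownership_range(10, 4, rank))
```

Each process then assembles and stores only its own block of rows, so the memory needed per process shrinks as more processes are added.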

This repository contains the alpha pre-release of the speclus4py package. Do not hesitate to open a pull request if you find a bug or have an idea for extending the package's functionality.

Installation

You can install speclus4py using the Anaconda or Conda package management systems.

conda create -n env-speclus4py python=3.7
conda activate env-speclus4py
conda config --add channels conda-forge
conda config --append channels conda-forge/label/gcc7
conda install -c ml4py speclus4py

Getting Started

For a first look at this package, see Usage and the examples in the demo/ folder; they may help you start using speclus4py in your research sooner. You can run the 2-phase segmentation example for the volumetric image data/vol_imgs/ball.vti by typing the following commands into a system console, which run the computation on two CPU cores:

export WORK_DIR=$PWD
mpirun -np 2 python demo/ex1.py 

If the package was installed successfully, the result is displayed as a green ball in a visualization window.

Usage

The best way to see how speclus4py can help your research is to look at the demo folder.

from mpi4py import MPI

from speclus4py import SPECLUS as clustering
from speclus4py.tools import hdist

comm = MPI.COMM_WORLD
solver = clustering.solver(comm=comm, verbose=True)

filename = 'data/vol_imgs/ball.vti'

solver.filename_input = filename
# if no similarity function is defined, an RBF-based similarity is used and
# the similarity parameter is the standard deviation of the RBF
solver.fn_similarity_params = 0.05
# number of nearest neighbours for which similarities are computed
solver.connectivity = 6

solver.setFromOptions()

# Determine the Hamming distance between solution and ground truth
if comm.Get_rank() == 0:
    labels = solver.getLabels()
    if labels is not None:
        hdist.hdist_2phase_vol_img(filename, labels, verbose=True)
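
The settings used above can be illustrated with a small pure-Python sketch: an RBF similarity between two voxel intensities, the six face-adjacent neighbours of a voxel, and a Hamming distance between a 2-phase labelling and a ground truth. All three functions are hypothetical and are not part of the speclus4py API. The Gaussian form exp(-(a - b)^2 / (2 sigma^2)), the reading of connectivity 6 as face adjacency, and the normalisation of the distance to a fraction are assumptions made for illustration and may differ from what the package computes.

```python
import math


def rbf_similarity(a, b, sigma):
    """Gaussian (RBF) similarity of two intensities; sigma is the standard deviation."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def neighbours_6(shape, x, y, z):
    """Yield the face-adjacent (6-connected) neighbours of voxel (x, y, z)."""
    nx, ny, nz = shape
    offsets = ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
               (0, -1, 0), (0, 0, 1), (0, 0, -1))
    for dx, dy, dz in offsets:
        i, j, k = x + dx, y + dy, z + dz
        # skip neighbours that fall outside the image
        if 0 <= i < nx and 0 <= j < ny and 0 <= k < nz:
            yield i, j, k


def hamming_2phase(labels, truth):
    """Fraction of mismatched binary labels, minimised over the two label namings.

    Cluster labels are arbitrary, so swapping 0 and 1 must not change the score.
    """
    mismatches = sum(1 for l, t in zip(labels, truth) if l != t)
    mismatches = min(mismatches, len(truth) - mismatches)
    return mismatches / len(truth)


if __name__ == "__main__":
    print(rbf_similarity(0.50, 0.55, 0.05))
    print(list(neighbours_6((3, 3, 3), 0, 0, 0)))
    print(hamming_2phase([1, 1, 0, 0], [0, 0, 1, 0]))
```

With a small standard deviation such as 0.05, the similarity drops quickly as intensities diverge, so edges across a phase boundary receive weights close to zero.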

Publications

  • Pecha, Marek (2021): General Technique for Estimating Number of Groups for Spectral Clustering. TechRxiv. Preprint. 10.36227/techrxiv.13553705

Acknowledgements

The development of this software was made possible by the financial support of the Ministry of Education, Youth and Sports from the National Programme of Sustainability (NPU II) project IT4Innovations excellence in science no. LQ1602, the programme for supporting science and research in the Moravia–Silesia Region 2017 no. RRC/10/2017, the institutional development plan project RPP2020/138, and Grants of SGS (VSB-TUO) no. SP2020/84 and SP2020/114.

The volumetric images included in the distribution of this package were provided by colleagues from the Institute for Parallel Processing, Bulgarian Academy of Sciences. The main functionality of this package was developed in cooperation with VSB - Technical University of Ostrava and the Czech Academy of Sciences (Institute of Geonics).
