Bachelor Thesis Simulations

This repository contains the project files I wrote to perform the simulations for my bachelor thesis "Initialization of the k-means algorithm - A comparison of three methods". The thesis was evaluated and approved in January 2023 and is available through the Department of Mathematics at Stockholm University, as well as here in this repository.

Summary of thesis

The topic of my thesis may seem complicated and hard to grasp, so here is a simple explanation. There is an algorithm called the k-means algorithm, whose purpose is to find groups of data points, like those in the image below, which shows length and width measurements of potatoes and carrots. The algorithm starts with a (usually) really bad guess of where the groups are and which points belong to each of them. It then refines the guess over and over again until it is finally good enough. This initial guess can be made in many different ways. In my thesis, I compare three different ways of making it and try to figure out which one is best, taking both accuracy and computational cost into account.
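The refine-until-good-enough loop described above is usually called Lloyd's algorithm. Below is a minimal sketch of it in plain Python for 2D points, assuming Euclidean distance and an initial guess made by picking k distinct data points uniformly at random. The function names (`assign`, `update`, `kmeans`) are hypothetical and not taken from this repository's `kmeans.py`, and the uniform choice here is an assumption that is not necessarily identical to the UnifRandom method studied in the thesis.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 2D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def assign(points: Sequence[Point], centers: Sequence[Point]) -> List[int]:
    """Label each point with the index of its nearest center (first index wins ties)."""
    return [min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points]


def update(points: Sequence[Point], labels: Sequence[int],
           centers: Sequence[Point]) -> List[Point]:
    """Move each center to the mean of its assigned points.
    A center with no assigned points keeps its previous position."""
    new_centers = []
    for j, c in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            new_centers.append((sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members)))
        else:
            new_centers.append(c)
    return new_centers


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100,
           seed: Optional[int] = None) -> Tuple[List[Point], List[int]]:
    rng = random.Random(seed)
    # Initial guess (assumption): k distinct data points chosen uniformly at random
    centers = rng.sample(list(points), k)
    labels = assign(points, centers)
    for _ in range(max_iter):
        centers = update(points, labels, centers)
        new_labels = assign(points, centers)
        if new_labels == labels:  # no point changed group, so the loop has converged
            break
        labels = new_labels
    return centers, labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeans(pts, 2, seed=0))
```

Every step can only lower (or keep) the total squared distance from the points to their centers, which is why the loop stops, but the result it stops at depends on the initial guess. That dependence is what makes the choice of initialization method worth comparing.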

If you are a bit more interested, here is the abstract:

"k-means is a simple and flexible clustering algorithm that has remained in common use for 50+ years. In this thesis, we discuss the algorithm in general, its advantages, weaknesses and how its ability to locate clusters can be enhanced with a suitable initialization method. We formulate appropriate requirements for the (batched) UnifRandom, k-means++ and Kaufman initialization methods and compare their performance on real and generated data through simulations. We find that all three methods (followed by the k-means procedure) are able to accurately locate at least up to nine well-separated clusters, but the appropriately batched UnifRandom and the Kaufman methods are both significantly more computationally expensive than the k-means++ method already for K = 5 clusters in a dataset of N = 1000 points."
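To make two of the compared methods more concrete, here is a hedged sketch in plain Python. It assumes the common textbook descriptions: k-means++ picks the first center uniformly at random and draws each further center with probability proportional to its squared distance from the nearest center already chosen, while Kaufman-style seeding starts from the most central point and greedily adds the point that most reduces the other points' distances to their nearest center. The thesis's exact formulations (including how UnifRandom is batched) may differ, and the function names `kmeanspp_init` and `kaufman_init` are hypothetical, not taken from this repository.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kmeanspp_init(points: Sequence[Point], k: int, rng: random.Random) -> List[Point]:
    """k-means++ seeding: each new center is drawn with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [rng.choice(points)]
    # Squared distance from every point to its nearest chosen center so far
    d2 = [math.dist(p, centers[0]) ** 2 for p in points]
    while len(centers) < k:
        # Already chosen points have weight 0, so they cannot be drawn again
        (nxt,) = rng.choices(points, weights=d2, k=1)
        centers.append(nxt)
        # Only the new center can make a point's nearest distance smaller
        d2 = [min(old, math.dist(p, nxt) ** 2) for old, p in zip(d2, points)]
    return centers


def kaufman_init(points: Sequence[Point], k: int) -> List[Point]:
    """Greedy Kaufman-style seeding (deterministic apart from ties)."""
    n = len(points)
    # Full N x N distance matrix: O(N^2) time and memory
    d = [[math.dist(p, q) for q in points] for p in points]
    # First center: the most central point (smallest total distance to all others)
    first = min(range(n), key=lambda i: sum(d[i]))
    chosen = [first]
    chosen_set = {first}
    # nearest[j]: distance from point j to its nearest chosen center
    nearest = list(d[first])
    while len(chosen) < k:
        best_i, best_gain = -1, -1.0
        for i in range(n):
            if i in chosen_set:
                continue
            # Total reduction in nearest-center distance that the remaining
            # points would get if point i became a center
            gain = sum(max(nearest[j] - d[j][i], 0.0)
                       for j in range(n) if j not in chosen_set and j != i)
            if gain > best_gain:
                best_i, best_gain = i, gain
        chosen.append(best_i)
        chosen_set.add(best_i)
        nearest = [min(nearest[j], d[j][best_i]) for j in range(n)]
    return [points[i] for i in chosen]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeanspp_init(pts, 2, random.Random(1)))
    print(kaufman_init(pts, 2))
```

In this sketch, k-means++ needs about N·K distance evaluations, while the Kaufman variant builds an N×N distance matrix and evaluates O(N²) gains for every added center, roughly O(N²K) work in total. That gives one intuition for why Kaufman seeding becomes expensive as the dataset grows.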

Structure of files

The file structure will be described later.

For now, the repository contains the following folders and files:

Documentation
Images
.gitattributes
.gitignore
LICENSE
README.md
Table.py
Testing.py
Testing2.py
delegation_testing.py
figure_creation.py
kmeans.py
main.py
old_functions.py
random_data.py
random_data_testing_specific.py
seeds.py