Welcome! This repo contains Python notebooks (.ipynb) that explore the Iris flower dataset using the machine learning methods and tools listed below, mostly written from scratch. The aim is to walk you through these ML methods and demonstrate data analysis from the ground up.
- Minkowski Distance
- Distance Matrix
- Reservoir Sampling
- Stratified Sampling
- Linear Regression with Gradient Descent
- K-Nearest Neighbors
- K-Means Clustering
- Visualizing all of the above :)
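To give a flavor of what "from scratch" means here, the following is a minimal sketch of three of the methods above (Minkowski distance, a distance matrix, and reservoir sampling) in plain Python. It is not the code from the notebooks: the function names, signatures, and the choice of Algorithm R for reservoir sampling are assumptions made for illustration.

```python
import random


def minkowski_distance(a, b, p=2):
    """Minkowski distance of order p between two equal-length vectors.

    p=1 gives the Manhattan distance, p=2 the Euclidean distance.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def distance_matrix(points, p=2):
    """Symmetric matrix of pairwise Minkowski distances between points."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = minkowski_distance(points[i], points[j], p)
            # Distance is symmetric, so fill both halves at once.
            matrix[i][j] = matrix[j][i] = d
    return matrix


def reservoir_sample(stream, k, rng=random):
    """Uniformly sample k items from an iterable of unknown length.

    Hypothetical helper using Algorithm R: keep the first k items, then
    replace a random slot with item i with probability k / (i + 1).
    """
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Three rows of Iris measurements (sepal length/width, petal length/width).
    rows = [
        [5.1, 3.5, 1.4, 0.2],
        [4.9, 3.0, 1.4, 0.2],
        [7.0, 3.2, 4.7, 1.4],
    ]
    for row in distance_matrix(rows, p=2):
        print([round(d, 3) for d in row])
    print(reservoir_sample(range(150), k=5))
```

The notebooks themselves may rely on numpy and pandas for the same computations; the pure-Python version is used here only so the sketch runs without any dependencies.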
A PDF version of each notebook is included, so you can read the code and its output without running anything yourself.
To run the notebooks, you need Jupyter Notebook, Google Colab, or any other notebook environment. I used JupyterLab installed through Anaconda.
You need the following libraries:
- pandas
- numpy
- matplotlib
- seaborn
- random (standard library)
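If you are unsure whether your environment has everything, a quick check along these lines can be run in a notebook cell or as a script. This is a small sketch, not part of the repo: `missing_packages` is a hypothetical helper name, and it only tests whether each package can be found, not which version is installed.

```python
import importlib.util

# Packages listed above; `random` ships with Python itself.
REQUIRED = ("pandas", "numpy", "matplotlib", "seaborn", "random")


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are available.")
```

Anything reported as missing can then be installed with your package manager of choice, for example through Anaconda.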
The Iris dataset from the UCI Machine Learning Repository is included in src/.
All prompts in the notebooks were kindly written and provided by Professor Vagelis Papalexakis of the University of California, Riverside.
All code and answers to the objectives were written by me, Amirsadra Mohseni.
Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
University students beware: if you have arrived at this repo and found my code, so can your professor. Professors have seen hundreds of projects and their source code. It doesn't take them long to recognize that what you are presenting as your own was actually made by someone else. If you wish to cite my source code, please ask your supervisor about the policy on doing so.