Development of practical assignments (TP, *trabajos prácticos*) in the field of Machine Learning.

💬 Description | 📁 Data | 👨🏻💻 Code |
---|---|---|
**TP-1: Anscombe's quartet.** Analysis of the effect of outliers and of the importance of data visualization. | Anscombe's datasets (source). | Jupyter Notebook |
**TP-2.1: Data visualization.** General exploratory analysis to identify data showing abnormal behavior. | Health and epidemiological situation of the municipality of Bahía Blanca, Argentina (source). | Jupyter Notebook |
**TP-2.2: Parametric classifier.** Design of a minimum-error classifier and analysis of its performance as the mean and standard deviation of the generated data vary. | "Randomly" generated, Gaussian-distributed data. | Jupyter Notebook |
**TP-3.1: KNN Overview.** Construction of k-nearest neighbors (KNN) classifiers and evaluation of their performance with respect to several training parameters. | Random samples from a normal (Gaussian) distribution. | Jupyter Notebook |
**TP-3.2: KNN GridSearch.** Evaluation of hyperparameters and their combinations for a KNN classifier. K-fold cross-validation is used to assess how the choice of training data influences the model. | Random samples from a normal (Gaussian) distribution. | Jupyter Notebook |
**TP-3.3: Spotify songs.** Development and tuning of a KNN classifier to predict whether a given song will be liked or not. Feature engineering is used to select the features that contribute the most information to the model. | More than 2,000 Spotify songs from a single user, each labeled as liked or disliked (source). | Jupyter Notebook |
**Fog event forecasting.** Comparison of ensemble methods to predict whether a fog event will occur in the next hour. Bagging and boosting algorithms are implemented for this purpose, along with basic hyperparameter tuning. | Meteorological data from the Ezeiza (Buenos Aires, Argentina) weather station, with hourly measurements from 1979 to 2011 (source). | Jupyter Notebook |
**TP-5: Customer segmentation.** Construction of clustering algorithms to segment customers based on their annual consumption patterns across product categories. The silhouette coefficient is used to evaluate the performance of each model. | Clients of a wholesale distributor, including their annual spending in monetary units (m.u.) on diverse product categories (source). | Jupyter Notebook |
**TP-6: Boston housing prices.** Construction of regression algorithms to predict property sale prices in the city of Boston. Feature selection techniques are used to reduce data dimensionality. | Boston Housing dataset, with 506 observations and 14 attributes related to housing prices (source). | Jupyter Notebook |
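
TP-2.2 designs a minimum-error classifier. When the class-conditional densities are known, the rule that minimizes the probability of error assigns x to the class with the largest P(c)·p(x | c) (the Bayes decision rule). The sketch below assumes one-dimensional Gaussian classes whose true parameters are known, and its function names are hypothetical; it illustrates the idea rather than reproducing the notebook.

```python
import random
from statistics import NormalDist


def min_error_classify(x, classes):
    """Bayes minimum-error rule: return the index of the class maximizing prior * pdf(x).

    `classes` is a list of (prior, NormalDist) pairs; knowing the true
    generating distributions is an assumption of this sketch.
    """
    scores = [prior * dist.pdf(x) for prior, dist in classes]
    return max(range(len(classes)), key=scores.__getitem__)


def empirical_error(classes, n=20000, seed=0):
    """Estimate the error rate on freshly generated Gaussian samples."""
    rng = random.Random(seed)
    priors = [prior for prior, _ in classes]
    errors = 0
    for _ in range(n):
        # Draw the true label according to the priors, then a sample from that class.
        label = rng.choices(range(len(classes)), weights=priors)[0]
        dist = classes[label][1]
        x = rng.gauss(dist.mean, dist.stdev)
        errors += min_error_classify(x, classes) != label
    return errors / n


if __name__ == "__main__":
    # The error grows as the classes overlap more (larger standard deviation).
    for sigma in (0.5, 1.0, 2.0):
        classes = [(0.5, NormalDist(0, sigma)), (0.5, NormalDist(3, sigma))]
        print(sigma, round(empirical_error(classes), 3))
```

With equal priors and equal standard deviations the decision boundary falls halfway between the means and the theoretical error is Φ(-|μ₁ - μ₀| / 2σ); widening σ or moving the means closer raises it, which is the kind of sensitivity TP-2.2 studies.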
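
TP-3.1 and TP-3.2 revolve around k-nearest neighbors and a grid search validated with K-fold cross-validation. The dependency-free sketch below only illustrates the mechanics and is not the notebooks' code: every (k, distance metric) combination is scored by its mean accuracy across folds, and the best one is kept. The two-cluster Gaussian data, the choice of Euclidean and Manhattan metrics, and all function names are assumptions made for the example.

```python
import itertools
import math
import random
from collections import Counter


def euclidean(a, b):
    return math.dist(a, b)


def manhattan(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))


METRICS = {"euclidean": euclidean, "manhattan": manhattan}


def knn_predict(train_X, train_y, x, k, metric):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(range(len(train_X)), key=lambda i: metric(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def k_fold_indices(n, n_folds, seed=0):
    """Shuffle 0..n-1 and deal the indices into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def cv_accuracy(X, y, k, metric, n_folds=5):
    """Mean validation accuracy; each fold is held out exactly once."""
    scores = []
    for fold in k_fold_indices(len(X), n_folds):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        tX, ty = [X[i] for i in train], [y[i] for i in train]
        hits = sum(knn_predict(tX, ty, X[j], k, metric) == y[j] for j in fold)
        scores.append(hits / len(fold))
    return sum(scores) / len(scores)


def grid_search(X, y, ks, metric_names, n_folds=5):
    """Score every (k, metric) combination; return the best one and all scores."""
    results = {
        (k, name): cv_accuracy(X, y, k, METRICS[name], n_folds)
        for k, name in itertools.product(ks, metric_names)
    }
    return max(results, key=results.get), results


def make_gaussian_data(n_per_class=50, seed=0):
    """Two 2-D Gaussian classes (unit variance) centred at (0, 0) and (3, 3)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (3.0, 3.0)]):
        for _ in range(n_per_class):
            X.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_gaussian_data()
    best, scores = grid_search(X, y, ks=[1, 3, 5, 7], metric_names=["euclidean", "manhattan"])
    print("best (k, metric):", best, "accuracy:", round(scores[best], 3))
```

Averaging over folds, rather than trusting a single train/validation split, is what lets the grid search tell a genuinely better hyperparameter combination apart from one that merely got a lucky split.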
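
TP-5 scores each clustering with the silhouette coefficient. For a point i, a(i) is its mean distance to the rest of its own cluster and b(i) is its smallest mean distance to the members of any other cluster; s(i) = (b - a) / max(a, b) lies between -1 and 1, and the overall score is the mean of s(i). A from-scratch sketch follows; the function name `silhouette` and the convention s(i) = 0 for points in singleton clusters are assumptions of this example.

```python
import math
from statistics import mean


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (needs at least two clusters)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        same = [j for j in clusters[label] if j != i]
        if not same:  # singleton cluster: assumed convention s(i) = 0
            scores.append(0.0)
            continue
        # a: cohesion, mean distance to the other members of i's cluster.
        a = mean(math.dist(points[i], points[j]) for j in same)
        # b: separation, mean distance to the nearest other cluster.
        b = min(
            mean(math.dist(points[i], points[j]) for j in members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return mean(scores)


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
    print(round(silhouette(pts, [0, 0, 1, 1]), 3))  # well-separated clusters
    print(round(silhouette(pts, [0, 1, 0, 1]), 3))  # interleaved, poor clustering
```

On the points 0, 1, 10 and 11, grouping {0, 1} and {10, 11} scores about 0.90, while the interleaved labeling scores -0.45; comparing this value across candidate models is how a better segmentation can be chosen.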