I am working on deploying a machine learning algorithm that helps predict whether a breast tumor is malignant or benign.
- I have the data set
- I have prepared the notebook
- I have done data pre-processing
Link: commit
Today I worked on data processing and feature selection in my cancer project. The aim is to find the best features I can use in the machine learning algorithm for the best predictive results.
Some of the things I want to do are:
- Deploy SVM, KNN and KMeans
- Use PCA to reduce dimensions
- Normalize the data I have before I deploy the models (a sketch of the whole pipeline follows below)
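To make the plan concrete, here is a minimal sketch of that pipeline, using scikit-learn's built-in breast cancer dataset as a stand-in for my own data (my notebook's actual file paths and features differ):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Scale first: PCA, SVM, and KNN are all sensitive to feature scale.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Keep the principal components that explain 95% of the variance.
pca = PCA(n_components=0.95).fit(X_train_s)
X_train_p, X_test_p = pca.transform(X_train_s), pca.transform(X_test_s)

for name, model in [("SVM", SVC()), ("KNN", KNeighborsClassifier())]:
    model.fit(X_train_p, y_train)
    print(name, "test accuracy:", round(model.score(X_test_p, y_test), 3))

# KMeans is unsupervised: it groups rows into 2 clusters without the labels.
km = KMeans(n_clusters=2, n_init=10, random_state=42)
print("first KMeans labels:", km.fit_predict(X_train_p)[:10])
```

Normalization comes before PCA on purpose: without scaling, the components just chase whichever raw features have the largest numeric range.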
Link: commit
Today I am going to look at deploying decision tree algorithms on a dataset that I got from Kaggle. This dataset was provided by a bank to help predict which applicants are able to repay their loans.
- Deploy Decision Tree
- Make my first Kaggle submission with the results I get (a sketch of both steps follows below)
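A minimal sketch of those two steps; the file names and column names below are placeholders I made up, not the real schema of the bank dataset:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical file names: the real Kaggle download will differ.
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

# "id" and "target" are stand-ins for the real ID and label columns.
X_train = train.drop(columns=["id", "target"])
y_train = train["target"]
X_test = test.drop(columns=["id"])

clf = DecisionTreeClassifier(max_depth=5, random_state=42)
clf.fit(X_train, y_train)

# Kaggle submissions are usually a two-column CSV: row id plus prediction.
submission = pd.DataFrame({"id": test["id"], "target": clf.predict(X_test)})
submission.to_csv("submission.csv", index=False)
```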
I am a little stuck though: how do I extract the KMeans output into a CSV so I can work with it? I feel somewhat confused.
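One possible approach, sketched with pandas (using the sklearn breast cancer data as a stand-in): attach the fitted model's cluster labels as a new column and write the whole frame out.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)

km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)

out = X.copy()
out["cluster"] = km.labels_            # one cluster label per row
out.to_csv("kmeans_clusters.csv", index=False)
```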
Today I watched a tutorial from Raj on implementing linear regression from scratch.
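The core of the from-scratch approach, as I understand it (a sketch, not the tutorial's exact code): fit y = wx + b by gradient descent on the mean squared error.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 3.0 * x + 2.0 + rng.normal(0, 1, 100)   # true line y = 3x + 2, plus noise

w, b, lr = 0.0, 0.0, 0.01
for _ in range(5000):
    error = (w * x + b) - y
    # Gradients of MSE = mean(error^2) with respect to w and b.
    w -= lr * 2 * np.mean(error * x)
    b -= lr * 2 * np.mean(error)

print(f"learned w={w:.2f}, b={b:.2f}")  # should come out close to 3 and 2
```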
I am doing some studying on different algorithms: link
I forked some GitHub repositories to help me learn more about these algorithm implementations: link
Reviewed peers' machine learning implementations on edx.org
Today I am taking time to look through as many algorithms as possible and also to practice my programming. I had a question about how I could produce a resulting dataset, but now I have the answer to it.
I expect that in the coming days I will be able to do much more, seeing that I have more or less made it through the novice stage.
I have looked at these today:
- Linear Regression
- Logistic Regression
- Got an sklearn cheat sheet, which drew a great picture of how everything fits together.
- Decision tree algorithm
- Support Vector Machines
- Read a chapter of collective intelligence.pdf
What is amazing is that the decision tree algorithm classified the data 100% correctly, which stands out compared to everything I have seen so far, though a perfect score can also be a sign of overfitting.
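A quick way to sanity-check a perfect score is to hold out a test set, since an unpruned tree can simply memorize its training data; a sketch using sklearn's breast cancer data as a stand-in:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
print("train accuracy:", tree.score(X_train, y_train))  # typically 1.0 for an unpruned tree
print("test accuracy: ", tree.score(X_test, y_test))    # usually noticeably lower
```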
Today is a continuation of ML algorithms:
- Naive Bayes
- KNN
- KMeans
- Random Forest
- Tuning algorithms: tuning hyperparameters, starting with Random Forest (a sketch follows below)
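To pin down what tuning hyperparameters actually means in code, here is a minimal sketch using GridSearchCV on a random forest; the grid values are my own illustrative picks, not recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

# Illustrative grid; a real search would cover more values.
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print("best params:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
```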
I am learning so much, and I am glad. Today I landed on the notion of tuning algorithms; I am getting a clearer picture day by day.
Studied:
- How to read a research paper (by rajraval)
- How to read math equations
- Signed up for Introduction to Mathematical Thinking by Stanford Uni
Hyper-parameter tuning:
Today I want to look at parameter tuning and hyper-parameter tuning:
- Dimension reduction algorithms
- Gradient boosting algorithm hyper-parameter tuning (XGBoost); see the sketch after this list
- Light Gradient Boosting algorithm parameter tuning
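As a sketch of what XGBoost parameter tuning might look like (assuming the xgboost package with its scikit-learn wrapper; the search ranges are illustrative, not taken from the forked repos):

```python
from scipy.stats import randint, uniform
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)

# Illustrative ranges; scipy's uniform(loc, scale) samples from [loc, loc + scale).
param_dist = {
    "n_estimators": randint(100, 500),
    "max_depth": randint(2, 8),
    "learning_rate": uniform(0.01, 0.3),
    "subsample": uniform(0.6, 0.4),
}
search = RandomizedSearchCV(XGBClassifier(eval_metric="logloss"),
                            param_dist, n_iter=20, cv=3, random_state=42)
search.fit(X, y)
print("best params:", search.best_params_)
print("best CV score:", round(search.best_score_, 3))
```

Randomized search is a common alternative to a full grid when, as with XGBoost, there are too many parameters to enumerate every combination.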
Today I forked some work on parameter tuning for XGBoost algorithms and got some GitHub repos, which I am going through a cell at a time.
Studied various articles on handling data:
1- Cool Article
2- 2
3- 3
4- 4
5- Hyper parameter tuning
6- CatBoost
Today I concentrated on going through my course, Data Science with Python from edX. I also began a new course, Mathematical Thinking, which will help me catch up with statistics when I begin it. I want to take these two courses at the same time, one on Coursera and another on edX, both involving math.
TODAY: I studied this article on the machine learning process and documented many of the steps it takes to get good results in ML.
THOUGHTS: I had no idea this process was so cumbersome, with its stages of feature engineering and all; it is quite hectic but a great learning process. Commit