- Michael Arabian
- Thomas Le
- Andre Saad
For this assignment, we used Python with the scikit-learn machine learning framework to experiment with two machine learning algorithms on a provided sentiment data set. The focus of the assignment was to gain experience with experimentation and analysis. See http://scikit-learn.org/stable/ for the official documentation.
- Download our GitHub repository as a ZIP file, or use git clone to copy it to your computer:
git clone https://github.com/aramich100/COMP472_A1
- Make sure Python 3.9.2 is installed on your computer. If it is not, you can download it from https://www.python.org/downloads/
- Run the following command in your terminal to install scikit-learn and the necessary libraries used by main.py:
pip install scikit-learn
- To run the script, type the following command into the terminal from the directory that contains main.py. The py launcher is specific to Windows; on other systems, use python3 main.py instead.
py main.py
- The program will now run the tasks in sequential order. Task 2 displays a plot and pauses the program. Once the plot window is closed, the remaining tasks continue.
As instructed, we used the scikit-learn framework to train and evaluate three models and obtained the following results. Accuracy values are percentages.
- Accuracy: 80.65463701216954
- Confusion Matrix: [[1006, 224], [237, 916]]
- Accuracy: 72.2198908938313
- Confusion Matrix: [[870, 360], [302, 851]]
- Accuracy: 71.46454049517415
- Confusion Matrix: [[868, 362], [318, 835]]
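Each accuracy value can be cross-checked against its confusion matrix, since accuracy is the sum of the diagonal (correct predictions) divided by the total number of test samples. The snippet below is a minimal sketch of that check in plain Python. The helper name accuracy_from_confusion is our own for illustration and is not taken from main.py.

```python
def accuracy_from_confusion(matrix):
    """Return accuracy in percent: diagonal sum divided by total count."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


# Confusion matrices copied from the results above, in the order listed.
matrices = [
    [[1006, 224], [237, 916]],
    [[870, 360], [302, 851]],
    [[868, 362], [318, 835]],
]

for matrix in matrices:
    # Print the accuracy implied by each matrix, rounded for readability.
    print(round(accuracy_from_confusion(matrix), 2))
```

All three matrices sum to the same 2383 test samples, so the accuracies are directly comparable.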
We can see that the Naive Bayes algorithm achieved the highest accuracy (about 80.7%) when compared to the Decision Tree. We attribute this to the difference in model type: Naive Bayes is a generative model, whereas the Decision Tree is a discriminative model. For our data set, Naive Bayes was the better fit.
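A comparison of this kind could be set up roughly as follows. This is only a sketch under stated assumptions, not the code in main.py: the toy documents and labels are made up, and the choice of CountVectorizer for features, MultinomialNB as the Naive Bayes variant, and default Decision Tree settings are our assumptions for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data standing in for the provided sentiment data set.
train_docs = ["great movie", "loved it", "terrible plot", "awful acting"]
train_labels = ["pos", "pos", "neg", "neg"]
test_docs = ["great acting", "terrible movie"]
test_labels = ["pos", "neg"]

# Turn raw text into word-count features (an assumed preprocessing step).
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)
X_test = vectorizer.transform(test_docs)

models = {
    "Naive Bayes": MultinomialNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, train_labels)
    predicted = model.predict(X_test)
    # Store accuracy in percent and the confusion matrix for each model.
    results[name] = (
        100 * accuracy_score(test_labels, predicted),
        confusion_matrix(test_labels, predicted, labels=["neg", "pos"]),
    )
    print(name, results[name][0])
    print(results[name][1])
```

With scikit-learn's confusion_matrix, rows correspond to true labels and columns to predicted labels, in the order given by the labels argument.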