Malware Detection Using AI and Machine Learning

Problem Statement

Malware, meaning programs designed to damage, disrupt, or gain unauthorized access to computer systems, is a significant and constant threat to cybersecurity. Malware creation techniques evolve so quickly that traditional detection approaches have become insufficient. Artificial Intelligence (AI) offers a promising solution: machine learning and deep learning models can automate malware detection and improve its accuracy.

This project explores the application of AI models to classify and detect malware, offering a modern approach to bolster cybersecurity defenses.

Objectives

Understand the Dataset: Analyze a technical dataset containing features of executable files to identify patterns and relevant insights.
Build and Evaluate AI Models: Design and evaluate several models, including:
- Decision Trees (supervised classification)
- Random Forests (supervised classification)
- An unsupervised clustering model (KMeans)
Performance Comparison: Compare the performance of models using evaluation metrics such as:
- Accuracy
- Confusion Matrix
- Precision, Recall, and F1-Score
Deep Learning Integration: Train and test a simple deep learning model to assess its effectiveness in malware detection.

Solution Approach

Dataset Analysis:
- Preprocessed the dataset by scaling the features so that they share a comparable range, which improves model performance.
- Conducted feature selection to improve the clustering quality of the unsupervised model.
Model Development:
- Implemented supervised learning models (Decision Tree and Random Forest) to classify malware vs. legitimate files.
- Built an unsupervised clustering model (KMeans with 2 clusters) to group data points without prior labels.
- Trained a deep learning model using TensorFlow/Keras with a fully connected neural network architecture.
Model Evaluation:
- Evaluated models on a test set using metrics like accuracy, confusion matrix, precision, recall, and F1-score.
- Performed 5-fold cross-validation to check that the results are robust and generalize beyond a single train/test split.
Performance Comparison:
- Compared supervised models to determine the most effective approach for malware detection.
- Assessed the clustering effectiveness of KMeans using Adjusted Rand Index (ARI).
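The supervised part of the workflow above can be sketched as follows. This is a minimal illustration rather than the project's actual code: it assumes scikit-learn, uses synthetic data from `make_classification` as a stand-in for the executable-file features, and assumes a label encoding of 0 = legitimate and 1 = malware. The hyperparameters are illustrative defaults.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real feature table (assumption: not the project's data).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=8, random_state=42
)

# Hold out 20% of the samples as a test set, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train_scaled, y_train)
    y_pred = model.predict(X_test_scaled)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }
    print(f"{name}: accuracy = {results[name]['accuracy']:.4f}")
    print(results[name]["confusion_matrix"])
    # Assumed label encoding: 0 = legitimate, 1 = malware.
    print(classification_report(y_test, y_pred, target_names=["legitimate", "malware"]))
```

Tree-based models do not strictly need scaled inputs, but scaling once up front keeps the same preprocessed features usable for KMeans and the neural network.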

Results

Supervised Models

Decision Tree:
- Accuracy: 98.79%
- Cross-Validation Average Accuracy: 98.65%
Random Forest:
- Accuracy: 99.19%
- Cross-Validation Average Accuracy: 99.04%
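A cross-validation average accuracy like the figures above is the mean of the per-fold accuracies. The sketch below shows one way to compute it with scikit-learn. It assumes synthetic data and a pipeline that re-fits the scaler inside each fold; the project's own code may organize this differently.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real feature table (assumption).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, random_state=0
)

# Putting the scaler in the pipeline means it is fitted on each training fold only.
pipeline = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=50, random_state=0),
)

# 5 stratified folds: each sample is used for validation exactly once.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")

mean_accuracy = fold_scores.mean()
print("Per-fold accuracy:", [round(score, 4) for score in fold_scores])
print(f"Cross-validation average accuracy: {mean_accuracy:.4f}")
```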

Unsupervised Model

KMeans Clustering:
- Adjusted Rand Index (ARI): 0.4811
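The Adjusted Rand Index compares cluster assignments with the true labels: 1.0 means perfect agreement, and values near 0 mean agreement no better than chance. The sketch below shows how such a score can be computed for a 2-cluster KMeans model. It assumes scikit-learn and synthetic data from `make_blobs`, so the value it prints is unrelated to the 0.4811 reported above.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Synthetic two-group data standing in for malware vs. legitimate files (assumption).
X, y_true = make_blobs(
    n_samples=500, centers=2, n_features=5, cluster_std=3.0, random_state=1
)

# KMeans relies on Euclidean distance, so features are scaled first.
X_scaled = StandardScaler().fit_transform(X)

# The labels are not used for fitting; they are only used afterwards for scoring.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=1)
cluster_ids = kmeans.fit_predict(X_scaled)

ari = adjusted_rand_score(y_true, cluster_ids)

# Cluster numbering is arbitrary: swapping cluster 0 and 1 leaves the ARI unchanged.
ari_flipped = adjusted_rand_score(y_true, 1 - cluster_ids)

print(f"ARI: {ari:.4f}")
print(f"ARI with cluster ids swapped: {ari_flipped:.4f}")
```

Because ARI ignores how clusters are numbered, no step is needed to match cluster ids to the malware and legitimate classes before scoring.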

Deep Learning Model

The fully connected neural network achieved high accuracy, with room for further optimization of its architecture.
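For orientation, a small fully connected binary classifier in TensorFlow/Keras might look like the sketch below. The layer sizes, epoch count, and synthetic data are all assumptions made for illustration; the architecture actually used in this project is not specified here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow import keras

# Synthetic stand-in for the real feature table (assumption).
X, y = make_classification(
    n_samples=800, n_features=20, n_informative=8, random_state=7
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=7
)

# Neural networks train more reliably on scaled inputs.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Hypothetical architecture: two hidden layers and one sigmoid output unit.
model = keras.Sequential([
    keras.Input(shape=(X_train.shape[1],)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# A few epochs are enough for a demonstration; real training would need tuning.
model.fit(X_train, y_train, epochs=5, batch_size=32, validation_split=0.1, verbose=0)

loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {accuracy:.4f}")

# The sigmoid output is a probability; threshold at 0.5 to get class labels.
probabilities = model.predict(X_test, verbose=0)
predictions = (probabilities.ravel() >= 0.5).astype(int)
```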

Conclusion

Best Performing Model: Random Forest achieved the best overall performance, with 99.19% test accuracy and 99.04% average cross-validation accuracy, compared with 98.79% and 98.65% for Decision Tree.
Consistency: For both Decision Tree and Random Forest, test accuracy on a single data split and average cross-validation accuracy differ by about 0.15 percentage points or less, indicating robustness.
KMeans Effectiveness: KMeans clustering improved with feature selection, but with an ARI of 0.4811 it was less effective at differentiating malware from legitimate files than the supervised models.

How to Use

Clone this repository:
```
git clone https://github.com/yourusername/malware-detection-ai.git
cd malware-detection-ai
```

Ensure the dataset is available: MalwareDataset.csv is included in the repository. Keep it in the same directory as the script (or in a data folder, if the code specifies one).
Install required Python packages:
```
pip install -r requirements.txt
```
Run the models:
```
python main.py
```
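To work with the dataset directly rather than through main.py, a loader along the following lines may be a useful starting point. The column layout of MalwareDataset.csv is not described here, so `load_dataset` and the `label` column name are hypothetical, and the demonstration uses a tiny in-memory CSV instead of the real file.

```python
import io

import pandas as pd


def load_dataset(csv_source, label_column):
    """Split a CSV into numeric features and labels (hypothetical helper)."""
    frame = pd.read_csv(csv_source)
    if label_column not in frame.columns:
        raise KeyError(f"Column {label_column!r} not found in the CSV")
    labels = frame[label_column]
    # Keep numeric columns only; text identifiers such as file names are dropped.
    features = frame.drop(columns=[label_column]).select_dtypes(include="number")
    return features, labels


# Tiny in-memory example; the column names here are invented for illustration.
sample_csv = io.StringIO(
    "name,size,entropy,label\n"
    "a.exe,1024,6.1,1\n"
    "b.exe,2048,3.2,0\n"
)
features, labels = load_dataset(sample_csv, "label")
print(list(features.columns))  # ['size', 'entropy']
print(labels.tolist())         # [1, 0]
```

With the real file, the call would take the path to MalwareDataset.csv and whichever column holds the malware/legitimate label.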

Future Improvements

Optimize the deep learning model architecture for better performance.
Experiment with other clustering algorithms to improve unsupervised model results.
Incorporate additional features to enhance model accuracy and robustness.

LMeriem/Malware-Detection-Using-AI