Commit

Update readme
tamnvhust1 committed Mar 12, 2020
1 parent e6b6cab commit f9d37f6
Showing 3 changed files with 53 additions and 34 deletions.
37 changes: 15 additions & 22 deletions README.md

```diff
@@ -1,35 +1,28 @@
 # MalNet - Detect malware using Convolutional Neural Networks
 By Tam Nguyen Van
 # Introduction
-The repository contains all source code for training and evaluating malware detection with **MalNet**.
+Malware detection using Convolutional Neural Networks.
 # Requirements
-1. Python 3.6
-2. Keras (2.0.8)
-3. Tensorflow (1.2.0)
+1. Python >=3
+2. Keras (>=2.0.8)
+3. Tensorflow (>=1.15)
 # Installation
 1. Clone the repository to your local machine.
 `git clone https://github.com/tamnguyenvan/malnet`
-2. Install all requirements (using **virtualenv** is recommended). Note: it only works on Python 3.6 (maybe 3.5, but I haven't tested).
-   - `pip install tensorflow==1.2.0` (CPU only) or `pip install tensorflow-gpu==1.2.0` (GPU)
+2. Install all requirements (using **virtualenv** is recommended).
+   - `pip install tensorflow==1.15` (CPU only) or `pip install tensorflow-gpu==1.15` (GPU)
    - `pip install -r requirements.txt`
-3. Make a data directory. For example, make a directory called **data** in the project root. The data can be found [here](https://drive.google.com/drive/folders/1zUXAb7JnwOiBtfBheQI6LDFu4EG_XZ-_). After downloading, extract it and put all data files into the data directory.
-# Training
-If you have completed the installation step correctly, you are almost done. We just need to run `python train.py` to train with the default parameters.
-Some options:
-- `--model` Use a specific model. For now, only `malnet`, `et` and `rt` are available.
-- `--batch-size` Set the batch size to fit your memory. Default is 32.
-- `--epochs` Number of epochs to train. Default is 5.
+3. Download the Ember dataset [here](https://pubdata.endgame.com/ember/ember_dataset_2018_2.tar.bz2). You can go to their [home page](https://github.com/endgameinc/ember) for more details. Extract it wherever you like.
+4. Extract features by running `python create_data.py --data_dir PATH_TO_DATA_DIR`. See `create_data.py` for the details. After that, some `.dat` files should be created in the same directory.
+# Training model
+Almost done: just run `python train.py --data_dir PATH_TO_DATA_DIR` for training. Show the help to see additional options.
+
 Please see the source code for more details.
-# Evaluate
-The training script already includes an evaluation step, but we also provide another script for evaluating independently. After training, the model is saved in **result/checkpoint**. We can evaluate this, or use my pretrained model, which can be found [here](https://drive.google.com/file/d/1zD99s0L9l1eVPmSo9o6c3WgkZrpa2e2o). The directory must contain 3 files:
-- `model.h5` Model weights.
-- `model.json` Model graph.
-- `scaler.pkl` Pickled binary file containing the preprocessing scaler object.
+# Evaluate model
+In case you want to regenerate the validation result, run `python eval.py --data_dir PATH_TO_DATA_DIR --model_path MODEL_PATH --scaler_path SCALER_PATH`. Again, show the help to see the options.
 
-In order to evaluate, just run `python src/eval.py`.
-# Deployment
-In this section, we will try to use the model to predict real-world samples. We provide a script for this in **src/test.py**, so all we need to do is run `python src/test.py --input [path/to/sample/file]`.
+# Deploy
+Let's have some fun. We will try the pretrained model on real PE files. Download a PE file, then run `python test.py --input_file INPUT_FILE --model_path MODEL_PATH`.
 
 # Contact
 Tam Nguyen Van ([email protected])
 Any questions can be left as issues in this repository. You are all welcome.
```
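The removed Evaluate section above names the three files the model directory must contain (`model.h5`, `model.json`, `scaler.pkl`). As an illustrative aside, a stdlib-only check of such a directory could look like the sketch below; `missing_model_files` is a hypothetical helper, not part of the repository:

```python
from pathlib import Path

# Files the README's Evaluate section says the model directory must contain.
REQUIRED_FILES = ("model.h5", "model.json", "scaler.pkl")

def missing_model_files(model_dir):
    """Return the required files absent from model_dir (empty list means OK)."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

For example, `missing_model_files("result/checkpoint")` would return an empty list once training has saved all three files there.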
12 changes: 6 additions & 6 deletions malnet/create_data.py

```diff
@@ -1,8 +1,8 @@
-"""
-Author: tamnv
-Description: This script will extract raw data from EMBER
-json files, then write into 4 files: X_train.dat, X_test.dat,
-y_train.dat and y_test.dat
-"""
+"""This script helps to extract raw data from EMBER json files.
+It will store the features into 4 files: X_train.dat, X_test.dat, y_train.dat
+and y_test.dat. You can limit the number of samples using the option `scale`.
+Usage: python create_data.py --data_dir DATA_DIR --scale SCALE
+"""
 
 import argparse
@@ -14,7 +14,7 @@
 def parse_arguments(argv):
     """Parse command line arguments."""
     parser = argparse.ArgumentParser()
-    parser.add_argument('--data-dir', dest='data_dir', type=str, default='data',
+    parser.add_argument('--data_dir', dest='data_dir', type=str, default='data',
                         help='Path to data directory.')
     parser.add_argument('--scale', dest='scale', type=float, default=1.,
                         help='Scale of training/test dataset.')
```
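The hunk above renames the flag from `--data-dir` to `--data_dir`; because `dest='data_dir'` was already set, the parsed attribute name stays the same and only the command line changes. A self-contained sketch of that parser, assumed to mirror the diff and trimmed to the two arguments shown:

```python
import argparse

def parse_arguments(argv):
    """Parse command line arguments (mirrors the diffed parse_arguments)."""
    parser = argparse.ArgumentParser()
    # dest='data_dir' keeps args.data_dir working regardless of the flag spelling.
    parser.add_argument('--data_dir', dest='data_dir', type=str, default='data',
                        help='Path to data directory.')
    parser.add_argument('--scale', dest='scale', type=float, default=1.,
                        help='Scale of training/test dataset.')
    return parser.parse_args(argv)

args = parse_arguments(['--data_dir', '/tmp/ember', '--scale', '0.5'])
# args.data_dir == '/tmp/ember' and args.scale == 0.5
```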
38 changes: 32 additions & 6 deletions requirements.txt

```diff
@@ -1,8 +1,34 @@
-keras==2.0.8
+absl-py==0.9.0
+astor==0.8.1
+cycler==0.10.0
+gast==0.2.2
+google-pasta==0.1.8
+grpcio==1.27.2
+h5py==2.10.0
+joblib==0.14.1
+Keras==2.0.8
+Keras-Applications==1.0.8
+Keras-Preprocessing==1.1.0
+kiwisolver==1.1.0
+lief==0.10.1
+Markdown==3.2.1
 matplotlib==2.2.2
-scipy
-scikit-learn
-pandas
+numpy==1.18.1
+opt-einsum==3.2.0
+pandas==1.0.1
+pkg-resources==0.0.0
+protobuf==3.11.3
+pyparsing==2.4.6
+python-dateutil==2.8.1
+pytz==2019.3
+PyYAML==5.3
+scikit-learn==0.22.2.post1
+scipy==1.4.1
+six==1.14.0
+tensorboard==1.15.0
+tensorflow-estimator==1.15.1
+tensorflow-gpu==1.15.0
+termcolor==1.1.0
 tqdm==4.23.4
-h5py
-lief
+Werkzeug==1.0.0
+wrapt==1.12.1
```
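The updated requirements file pins every new dependency with `==`, whereas the removed lines (`scipy`, `scikit-learn`, `pandas`, `h5py`, `lief`) were unpinned. As an illustrative aside, not part of the repository, splitting such pins takes only stdlib string handling; real requirements syntax (extras, markers, version ranges) is richer than this sketch:

```python
def parse_pins(lines):
    """Split 'pkg==version' pins into (name, version) pairs.

    Unpinned lines get version None; blanks and comments are skipped.
    """
    pins = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        name, _, version = line.partition('==')
        pins.append((name, version or None))
    return pins

print(parse_pins(["Keras==2.0.8", "tensorflow-gpu==1.15.0", "scipy"]))
# [('Keras', '2.0.8'), ('tensorflow-gpu', '1.15.0'), ('scipy', None)]
```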
