Fraud detection in electricity and gas consumption powered by machine learning

eburakova/fraud-detection-energy

Detection of fraud in electricity and gas consumption for Société tunisienne de l'électricité et du gaz (STEG)

Results summary

Model summary

[Figure: model summary]

The best model: performance

The best model is an XGBoost classifier that uses all engineered features.

[Figures: ROC curve, confusion matrix]

| Recall | Precision | ROC AUC | F-beta |
|---|---|---|---|
| 0.34 | 0.59 | 0.9 | 0.58 |

                Classification Report:
--------------------------------------------------
              precision    recall  f1-score   support

         0.0       0.96      0.99      0.97     95945
         1.0       0.59      0.34      0.43      5674

    accuracy                           0.95    101619
   macro avg       0.77      0.66      0.70    101619
weighted avg       0.94      0.95      0.94    101619
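
The macro and weighted averages in the report follow directly from the two per-class rows. The sketch below recomputes them from the rounded values printed above, so the last digit can differ slightly from the report, which presumably averages unrounded scores. The `report` dictionary and helper names are illustrative, not part of the repository.

```python
# Per-class values copied from the classification report (rounded to 2 decimals).
report = {
    "0.0": {"precision": 0.96, "recall": 0.99, "f1": 0.97, "support": 95945},
    "1.0": {"precision": 0.59, "recall": 0.34, "f1": 0.43, "support": 5674},
}

def macro_avg(metric):
    """Unweighted mean over classes: every class counts equally."""
    return sum(row[metric] for row in report.values()) / len(report)

def weighted_avg(metric):
    """Mean over classes weighted by support (true samples per class)."""
    total = sum(row["support"] for row in report.values())
    return sum(row[metric] * row["support"] for row in report.values()) / total

for metric in ("precision", "recall", "f1"):
    print(f"{metric:9s} macro={macro_avg(metric):.3f} weighted={weighted_avg(metric):.3f}")
```

Because the non-fraud class makes up about 94% of the samples, the weighted averages are dominated by it. The fraud-class row (1.0) is the one that reflects how well fraud is actually detected.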

Project Workflow

Selected metric

  • F1 is the metric of choice as a balance between precision and recall, with the emphasis on recall: the model should detect as many fraud cases as possible while keeping the number of false alerts low.
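
To make the trade-off concrete, here is a minimal sketch of the F-beta family, of which F1 is the special case beta = 1. The function `f_beta` is a hypothetical helper written for illustration. The value beta = 2.0 is only an example of weighting recall more heavily; the beta behind the reported F-beta score is not stated in this README.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: weighted harmonic mean of precision and recall.

    beta = 1 gives F1 (equal weight); beta > 1 favours recall,
    beta < 1 favours precision.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Fraud-class precision and recall from the results summary.
precision, recall = 0.59, 0.34

f1 = f_beta(precision, recall)            # equal weight
f2 = f_beta(precision, recall, beta=2.0)  # recall counts more (illustrative beta)
print(round(f1, 2), round(f2, 2))
```

With these inputs, F1 reproduces the 0.43 shown for the fraud class in the classification report. Since recall is lower than precision here, any beta above 1 pulls the score down, which is the intended pressure toward catching more fraud cases.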

Hypotheses generated based on data exploration

  1. Households in regions where fraud occurs more frequently than average are more likely to commit fraud themselves.

  2. Households with large variations in electricity or gas consumption are more likely to commit fraud.
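
Both hypotheses can be turned into simple features. The sketch below uses a tiny made-up sample and only the standard library. The column names follow the data description, but the records, the choice of `Consommation_level_1` as the consumption signal, and the coefficient of variation as the variability measure are assumptions for illustration, not the repository's actual feature engineering.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Made-up sample records; column names follow the data description.
clients = [
    {"Client_id": "c1", "Region": 101, "Target": 1},
    {"Client_id": "c2", "Region": 101, "Target": 0},
    {"Client_id": "c3", "Region": 107, "Target": 0},
    {"Client_id": "c4", "Region": 107, "Target": 0},
]
invoices = [
    {"Client_id": "c1", "Consommation_level_1": 50},
    {"Client_id": "c1", "Consommation_level_1": 400},
    {"Client_id": "c2", "Consommation_level_1": 200},
    {"Client_id": "c2", "Consommation_level_1": 210},
]

# Hypothesis 1: share of fraudulent clients per region vs. the overall share.
by_region = defaultdict(list)
for c in clients:
    by_region[c["Region"]].append(c["Target"])
overall_rate = mean(c["Target"] for c in clients)
region_rate = {region: mean(targets) for region, targets in by_region.items()}
high_risk_regions = {r for r, rate in region_rate.items() if rate > overall_rate}

# Hypothesis 2: consumption variability per client (coefficient of variation).
by_client = defaultdict(list)
for inv in invoices:
    by_client[inv["Client_id"]].append(inv["Consommation_level_1"])
consumption_cv = {
    cid: (pstdev(vals) / mean(vals) if mean(vals) else 0.0)
    for cid, vals in by_client.items()
}

print(high_risk_regions, consumption_cv)
```

Note that a region fraud rate is derived from the target itself, so in a real pipeline it should be computed on the training split only to avoid leaking labels into the test set.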

Data source

Data description

Client dataset

| Feature name | Description |
|---|---|
| Client_id | Unique identifier for the client |
| District | District number associated with the client's location |
| Client_catg | Class the client belongs to; class groups: 11, 12, 51 |
| Region | Region number associated with the client's location |
| Creation_date | Date when the client became a customer of STEG |
| Target | Fraud label: 1 for a fraudulent client, 0 for a non-fraudulent client |

Invoice dataset

| Feature name | Description |
|---|---|
| Client_id | Unique identifier for the client |
| Invoice_date | Issue date of the invoice relating to a given quarterly due date for the client |
| Tarif_type | How the client is charged by STEG for electricity and gas consumption; each type is associated with a unique number |
| Counter_number | Serial number identifying the counter, written as a series of digits on the metering device and unique for each client |
| Counter_statue | Working status of the device measuring the amount of energy consumed by the client, ranging from 1 to 5 |
| Counter_code | Three-digit registration number identifying the device, located within the serial number of the meter |
| Counter_coefficient | Coefficient used to convert the raw meter readings into actual consumption values |
| Consommation_level_1 | Consumption level 1: less than 2,400 kWh per year, at a cost of 181 millimes per kWh (source: https://kapitalis.com/tunisie/2022/05/12/tunisie-les-nouveaux-tarifs-de-la-steg/) |
| Consommation_level_2 | Consumption level 2: between 2,401 and 3,600 kWh per year, at a cost of 223 millimes per kWh (source: https://kapitalis.com/tunisie/2022/05/12/tunisie-les-nouveaux-tarifs-de-la-steg/) |
| Consommation_level_3 | Consumption level 3: between 3,601 and 6,000 kWh per year, at a cost of 338 millimes per kWh (source: https://kapitalis.com/tunisie/2022/05/12/tunisie-les-nouveaux-tarifs-de-la-steg/) |
| Consommation_level_4 | Consumption level 4: more than 6,000 kWh per year, at a cost of 419 millimes per kWh (source: https://kapitalis.com/tunisie/2022/05/12/tunisie-les-nouveaux-tarifs-de-la-steg/) |
| Old_index | Old meter reading |
| New_index | New meter reading |
| Months_number | Number of the month (possibly the month in which the meter reading was taken) |
| Counter_type | Type of device measuring the amount of energy consumed; ELEC = electricity consumption; GAZ = gas consumption |
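
The four Consommation_level columns can be read as a split of consumption across tariff tiers. The sketch below assumes a progressive split, where each tier is filled before the next one starts, and uses the thresholds and prices quoted in the table. Whether STEG bills exactly this way, and how the yearly thresholds map onto individual invoices, is an assumption. `TIERS`, `split_into_levels` and `cost_millimes` are hypothetical names.

```python
# Upper bound of each tier (kWh per year) and price (millimes per kWh),
# taken from the table above. The last tier has no upper bound.
TIERS = [
    (2400, 181),
    (3600, 223),
    (6000, 338),
    (float("inf"), 419),
]

def split_into_levels(annual_kwh):
    """Split annual consumption into per-tier amounts (assumed progressive)."""
    levels, lower = [], 0
    for upper, _price in TIERS:
        # Portion of the consumption that falls between `lower` and `upper`.
        levels.append(max(0, min(annual_kwh, upper) - lower))
        lower = upper
    return levels

def cost_millimes(annual_kwh):
    """Total cost under the assumed progressive pricing."""
    levels = split_into_levels(annual_kwh)
    return sum(kwh * price for kwh, (_upper, price) in zip(levels, TIERS))

levels = split_into_levels(4000)
print(levels, cost_millimes(4000))
```

For a client consuming 4,000 kWh in a year, this fills level 1 and level 2 completely, puts the remaining 400 kWh in level 3, and leaves level 4 at zero.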


Set up

On macOS, type the following commands:

  • To set up the virtual environment, you can either use the Makefile or install it manually. With the Makefile, run:

    make setup

    Then activate the environment with the following command:

    source .venv/bin/activate

Alternatively:

  • Create the virtual environment and install the required packages manually with the following commands:

    pyenv local 3.11.3
    python -m venv .venv
    source .venv/bin/activate
    pip install --upgrade pip
    pip install -r requirements.txt

On Windows, type the following commands:

  • Create the virtual environment and install the required packages with the following commands.

    For PowerShell:

    pyenv local 3.11.3
    python -m venv .venv
    .venv\Scripts\Activate.ps1
    pip install --upgrade pip
    pip install -r requirements.txt

    For Git Bash:

    pyenv local 3.11.3
    python -m venv .venv
    source .venv/Scripts/activate
    pip install --upgrade pip
    pip install -r requirements.txt

    Note: If you encounter an error when trying to run pip install --upgrade pip, try using the following command:

    python.exe -m pip install --upgrade pip

Usage

Note: Make sure your environment is activated.

To train the model, store the test data in the data folder, and save the model in the models folder, run:

python example_files/train.py  

To check that prediction works on the test set you created, run:

python example_files/predict.py models/linear_regression_model.sav data/X_test.csv data/y_test.csv
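
The predict command takes three positional arguments: the saved model, the test features, and the test labels. As a rough sketch of how such a command line could be parsed, assuming `argparse` and hypothetical argument names (the actual `predict.py` may read its arguments differently):

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the three positional arguments above."""
    parser = argparse.ArgumentParser(description="Run predictions with a saved model.")
    parser.add_argument("model_path", help="path to the saved model file")
    parser.add_argument("x_test_path", help="CSV file with the test features")
    parser.add_argument("y_test_path", help="CSV file with the test labels")
    return parser

# Parse the same arguments as in the command shown above.
args = build_parser().parse_args([
    "models/linear_regression_model.sav",
    "data/X_test.csv",
    "data/y_test.csv",
])
print(args.model_path, args.x_test_path, args.y_test_path)
```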
