Read this before doing anything.
For any questions, contact me via the socials listed on my GitHub profile (please first state where you are from and why you are interested).
- About the project
- Requirements
- Research files
- Estimation Framework
- (Bonus) Measurement Tool Combination Framework
This is an MSc thesis project that explored estimating the energy consumption of Privacy-Enhancing Technologies (PETs), as well as the privacy risks and utility of a dataset after PET treatment. The estimation is based on the dataset properties, which are used as input to a Gradient Boosting model.
The datasets and their sources are listed in the CSV file all_datasets_and_their_sources.csv.
Make sure your Python version is the following:
Python 3.10.12 - 3.10.14 (pip 22.0 - 24.0)
Run the following line from the terminal:
pip install -r requirements.txt
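Before installing, it may help to confirm that the interpreter falls inside the version range stated above. The following is a minimal sketch, not part of the project; the helper name `is_supported` is hypothetical, and the bounds are copied from the requirement above.

```python
import sys

# Supported range taken from the requirement above (inclusive on both ends).
MIN_VERSION = (3, 10, 12)
MAX_VERSION = (3, 10, 14)


def is_supported(version=None):
    """Return True if a (major, minor, micro) tuple lies in the supported range."""
    if version is None:
        version = sys.version_info[:3]
    return MIN_VERSION <= tuple(version) <= MAX_VERSION


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    status = "supported" if is_supported() else "outside the tested range"
    print(f"Python {current} is {status}")
```

If the check reports a version outside the range, creating a virtual environment with a matching interpreter is usually simpler than changing the system Python.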
All research conducted can be found in the folder research_files.
It includes:
- Cleaned datasets folder
- Figures folder
- 8 notebooks for feature selection
- 2 dataset properties and measurement datasets
- Measurement files (the main file is called data_collection_main.py)
Before launching the framework:
- It's necessary to clean your file.
- The target attribute must also be preprocessed for classification purposes.
- Please do not modify the other attributes in any way that would change their values.
- After synthetic data has been generated, the framework will apply MinMaxScaler and One-Hot encoding for ML tasks.
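To make the last point concrete, here is a rough, dependency-free sketch of what min-max scaling and one-hot encoding do to a column. It only illustrates the two transforms and is not the framework's code, which presumably relies on a library implementation such as scikit-learn's MinMaxScaler. The function names and sample values are hypothetical.

```python
def min_max_scale(values):
    """Rescale numeric values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Return the sorted category list and one 0/1 row per input value."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


if __name__ == "__main__":
    # Hypothetical columns, for illustration only.
    print(min_max_scale([10, 20, 30]))
    print(one_hot(["a", "b", "a"]))
```

Because the framework applies these steps itself, scaling or encoding the non-target attributes beforehand would change their values, which is what the list above asks you to avoid.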
Running on Terminal:
Linux (or WSL):
python3 main_project.py
Windows:
python main_project.py
macOS:
python3 main_project.py
Running on Jupyter Notebook:
jupyter notebook
Then open the notebook and call the function as you normally would:
launch_estimation(filename = None, continuous_to_categorical = None, target = None, epsilon = None)
- filename: Name of the CSV file located in the put_your_dataset_here folder
- continuous_to_categorical: Attributes that are categorical but, because of their numerical format, could be mistaken for continuous
- target: Target attribute
For terminal:
Follow the instructions in the terminal!
For Jupyter Notebook:
- The filename needs to be just a filename. No additional path is necessary.
- The continuous_to_categorical must be a list of strings.
- Input for epsilon can be either 0 for No Differential Privacy or 1 for Differential Privacy with an epsilon value of 0.1.
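The input rules above can be checked before calling launch_estimation. The sketch below is a hypothetical helper and not part of the project: the name `validate_estimation_args`, the assumption that the target is passed as a non-empty string, and the example dataset and column names are all illustrative.

```python
def validate_estimation_args(filename, continuous_to_categorical, target, epsilon):
    """Return a list of problems; an empty list means the arguments look fine."""
    problems = []

    # The filename must be a bare name, without any directory component.
    if not isinstance(filename, str) or not filename:
        problems.append("filename must be a non-empty string")
    elif "/" in filename or "\\" in filename:
        problems.append("filename must not contain a path")

    # continuous_to_categorical must be a list whose items are all strings.
    if not isinstance(continuous_to_categorical, list) or not all(
        isinstance(name, str) for name in continuous_to_categorical
    ):
        problems.append("continuous_to_categorical must be a list of strings")

    # Assumption: the target attribute is given as a column name.
    if not isinstance(target, str) or not target:
        problems.append("target must be a non-empty string")

    # 0 = no Differential Privacy, 1 = Differential Privacy (epsilon 0.1).
    if epsilon not in (0, 1):
        problems.append("epsilon must be 0 or 1")

    return problems


if __name__ == "__main__":
    # Hypothetical dataset and column names, for illustration only.
    print(validate_estimation_args("adult.csv", ["zip_code"], "income", 1))
    print(validate_estimation_args("data/adult.csv", "zip_code", "income", 0.1))
```

Note that epsilon is a switch rather than the privacy budget itself, so passing 0.1 directly is flagged as an error here.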
Before launching the framework:
- It's necessary to clean your file.
- The target attribute must also be preprocessed for classification purposes.
- Please do not modify the other attributes in any way that would change their values.
- After synthetic data has been generated, the framework will apply MinMaxScaler and One-Hot encoding for ML tasks.
Running on Terminal:
Linux (or WSL)*:
sudo python3 main_single_measurement.py
Windows*:
runas /user:Administrator "python main_single_measurement.py"
macOS*:
sudo python3 main_single_measurement.py
Running on Jupyter Notebook*:
sudo jupyter notebook --allow-root
Then open the notebook and call the function as you normally would:
launch_measurement(input_filename=None, target_attribute_ML=None, num_to_categ = None, possible_known_attributes = None, secret_mode = None, save_my_report_to_csv = None)
# Note: possible_known_attributes and secret_mode are disabled.
*Administrator rights are required to perform the hardware measurements of energy consumption. Without them, the measurements will not work.
- input_filename: Name of the CSV file located in the put_your_dataset_here folder
- num_to_categ: Attributes that are categorical but, because of their numerical format, could be mistaken for continuous
- target_attribute_ML: Target attribute
Note: The Linkability and Inference risk measurements are disabled.
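Since the measurement tool only works with administrator rights, a quick privilege check before starting a long run can save time. The following is a best-effort sketch and not part of the project; the helper name `has_admin_rights` is hypothetical. It uses `os.geteuid` on Linux and macOS and the Windows shell call `IsUserAnAdmin` through `ctypes`.

```python
import ctypes
import os


def has_admin_rights():
    """Best-effort check for elevated privileges on Linux, macOS, and Windows."""
    if hasattr(os, "geteuid"):
        # Unix-like systems: the root user has effective user id 0.
        return os.geteuid() == 0
    try:
        # Windows: ask the shell whether the current user is an administrator.
        return bool(ctypes.windll.shell32.IsUserAnAdmin())
    except (AttributeError, OSError):
        # Unknown platform or the call failed: assume no elevated rights.
        return False


if __name__ == "__main__":
    if has_admin_rights():
        print("Administrator rights detected.")
    else:
        print("No administrator rights: energy measurements will likely fail.")
```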