Developer Guidance
The documentation above outlines the approach for a single data owner to run these tools. A developer testing on a synthetic data set may instead want to run all of the above steps quickly and repeatedly for a list of artificial data owners.
The linkage agent tools include a Jupyter notebook, still under development, that runs all of these steps by invoking scripts in the testing-and-tuning/ folder.
To test household linkage, you can currently run the garble.sh script (after configuring the sites for which you have extracted PII). To test blocking, run the blocking_garble.sh script. Note: these scripts assume that the PII files created by extract.py have been renamed to pii_{site}.csv for their respective sites.
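The renaming step described above can be scripted when there are many artificial data owners. The following is a minimal sketch, not part of the repository: the function name stage_pii_files, the site names, the source paths, and the destination directory are all hypothetical, and the directory in which garble.sh expects the pii_{site}.csv files should be checked against the script itself.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def stage_pii_files(extracted: dict[str, Path], dest_dir: Path) -> list[Path]:
    """Copy each site's extracted PII file to dest_dir/pii_{site}.csv.

    `extracted` maps a site name to the file that extract.py produced for it.
    The originals are copied rather than moved so a run can be repeated.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for site, source in extracted.items():
        target = dest_dir / f"pii_{site}.csv"
        shutil.copyfile(source, target)
        staged.append(target)
    return staged
```

For example, stage_pii_files({"site_a": Path("output_a.csv")}, Path("staging")) would write staging/pii_site_a.csv, where both paths are placeholders for wherever your extracted files and the scripts' input directory actually live.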
The testing-and-tuning/generate_secret.py script will create a secret salt for you if you require one, e.g.:
python testing-and-tuning/generate_secret.py
This should create a new file called deidentification_secret.txt in your root directory.
Between runs, it is advisable to run rm temp-data/* to clean up temporary data files used for individual runs.
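When the steps are driven from Python (for instance from a notebook), the same cleanup can be done without shelling out. This is a minimal sketch under stated assumptions: the helper name clear_temp_data is hypothetical, and it deliberately mirrors the shell glob by leaving hidden files and subdirectories in place.

```python
from pathlib import Path


def clear_temp_data(temp_dir: Path = Path("temp-data")) -> int:
    """Delete regular, non-hidden files directly inside temp_dir.

    Returns the number of files removed. Subdirectories and dotfiles
    (for example a .gitkeep placeholder) are left untouched, which is
    also what `rm temp-data/*` does.
    """
    if not temp_dir.is_dir():
        return 0
    removed = 0
    for entry in temp_dir.iterdir():
        if entry.is_file() and not entry.name.startswith("."):
            entry.unlink()
            removed += 1
    return removed
```

Calling clear_temp_data() from the repository root before each run gives every run a clean temp-data/ folder; the return value can be logged as a quick check that the previous run actually produced files.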
This repository uses black, flake8, and isort to maintain consistent formatting and style. These tools can be run with the following commands:
black .
isort .
flake8