Anonymize CSV file(s) by replacing sensitive values with fakes.
pip install vendetta
Suppose you have an orders.csv dataset with real customer names and order IDs:
CustomerName,CustomerLastName,OrderID
Darth,Vader,1254
Darth,Vader,1255
,Yoda,1256
Luke,Skywalker,1257
Leia,Skywalker,1258
,Yoda,1259
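To double-check the customer count, here is a minimal sketch using only Python's standard csv module. It assumes a "customer" is a distinct (CustomerName, CustomerLastName) pair, so the two Yoda rows with an empty first name count once. The function name count_unique_customers is hypothetical and not part of vendetta.

```python
import csv
from typing import Iterable


def count_unique_customers(lines: Iterable[str]) -> int:
    """Count distinct (CustomerName, CustomerLastName) pairs in an orders CSV.

    `lines` can be an open file object or any iterable of CSV lines.
    """
    reader = csv.DictReader(lines)
    # A set of name pairs collapses repeated orders by the same customer.
    customers = {(row["CustomerName"], row["CustomerLastName"]) for row in reader}
    return len(customers)
```

For example, passing `open("orders.csv", newline="")` for the sample above should give 4.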
This list contains 4 unique customers. Let's create a configuration file, say, orders.yaml:
columns:
  CustomerName: first_name
  CustomerLastName: last_name
and run:
vendetta anonymize orders.yaml < orders.csv > anon.csv
which gives something like this in anon.csv:
CustomerName,CustomerLastName,OrderID
Elizabeth,Oliver,1254
Elizabeth,Oliver,1255
Karen,Rodriguez,1256
Jonathan,Joseph,1257
Katelyn,Joseph,1258
Karen,Rodriguez,1259
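Because the fakes are random, the exact output differs between runs. A quick way to confirm that identical input values received identical fakes, and that OrderID was left alone, is to compare the two files row by row. This is a minimal standard-library sketch, assuming both files list the same rows in the same order; column_mappings is a hypothetical helper, not part of vendetta.

```python
import csv
from typing import Dict, Iterable, List


def column_mappings(
    original: Iterable[str],
    anonymized: Iterable[str],
    columns: List[str],
) -> Dict[str, Dict[str, str]]:
    """Map each original value to the fake that replaced it, per anonymized column.

    Rows are paired by position. Raises ValueError if one original value
    received two different fakes, or if a column outside `columns` changed.
    """
    mappings: Dict[str, Dict[str, str]] = {col: {} for col in columns}
    for src, dst in zip(csv.DictReader(original), csv.DictReader(anonymized)):
        for col, value in src.items():
            if col in mappings:
                # The first occurrence records the fake; later ones must agree.
                fake = mappings[col].setdefault(value, dst[col])
                if fake != dst[col]:
                    raise ValueError(
                        f"{col}: {value!r} became both {fake!r} and {dst[col]!r}"
                    )
            elif dst[col] != value:
                raise ValueError(f"{col} changed but is not listed in the config")
    return mappings
```

With orders.csv and the sample anon.csv above (both opened with `newline=""`), the CustomerLastName mapping would contain Skywalker mapped to Joseph.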
- The OrderID column was not mentioned in the config, so it was left as is.
- Using Faker, the program replaced the first and last names with random first and last names, so the data still looks realistic.
- If two cells in the same column of the source file have the same value, the output file will also have identical values in those cells (e.g. both Skywalker rows became Joseph).
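The consistency property in the last point can be achieved with a per-column cache of fakes. The following is a minimal sketch of that idea, not vendetta's actual implementation: it assumes small hard-coded name lists as stand-ins for Faker's providers (so it runs with the standard library only), and anonymize_rows, main and the seed parameter are hypothetical names. The config is hard-coded in main instead of being read from orders.yaml.

```python
import csv
import random
import sys
from typing import Callable, Dict, Iterable, Iterator, Tuple

# Hypothetical stand-ins for Faker's first_name/last_name providers,
# kept tiny so the sketch needs only the standard library.
FIRST_NAMES = ["Elizabeth", "Karen", "Jonathan", "Katelyn"]
LAST_NAMES = ["Oliver", "Rodriguez", "Joseph", "Smith"]


def anonymize_rows(
    rows: Iterable[Dict[str, str]],
    columns: Dict[str, str],
    seed: int = 0,
) -> Iterator[Dict[str, str]]:
    """Yield copies of `rows` with the configured columns replaced by fakes.

    `columns` maps a column name to a provider name, mirroring the YAML config.
    Columns not listed in it are passed through unchanged.
    """
    rng = random.Random(seed)
    providers: Dict[str, Callable[[], str]] = {
        "first_name": lambda: rng.choice(FIRST_NAMES),
        "last_name": lambda: rng.choice(LAST_NAMES),
    }
    # Cache keyed by (column, original value): the same input value in the
    # same column always gets the same fake.
    cache: Dict[Tuple[str, str], str] = {}
    for row in rows:
        out = dict(row)
        for col, provider in columns.items():
            key = (col, row[col])
            if key not in cache:
                cache[key] = providers[provider]()
            out[col] = cache[key]
        yield out


def main() -> None:
    """Filter CSV from stdin to stdout, mimicking the shell usage shown above."""
    config = {"CustomerName": "first_name", "CustomerLastName": "last_name"}
    reader = csv.DictReader(sys.stdin)
    writer = csv.DictWriter(sys.stdout, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(anonymize_rows(reader, config))
```

Keying the cache on the column as well as the value keeps columns independent, so the same string appearing in two different columns may receive different fakes. Nothing in this sketch prevents two different originals from drawing the same fake; whether vendetta guards against that is not stated here.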
Enjoy!
This project was generated with wemake-python-package. The current template version is: b80221aaae4ac702bea7e66b77b9389d527c1e3c. See what has been updated since then.