Avoiding oversuppression for small problems in kAnon #354

Open
matthias-da opened this issue Oct 3, 2024 · 0 comments

matthias-da commented Oct 3, 2024

This example, from MonteroSerrano, Javier, shows the oversuppression problem in practice.

Overprotection in 6x2 example

# Example with 6x2 data frame where kAnon (k = 3) makes 3 suppressions, 
# while 1 suppression would have been enough.
# (Note: 3 suppressions would be needed with alpha = 0, but not with alpha = 1).

# Load sdcMicro (provides createSdcObj, kAnon and extractManipData)
library(sdcMicro)

# Create data
data_3 <- data.frame(
    gender = c("male", "male", "male", "male", "male", "male"),
    education = c("no education", "primary", "primary", "primary", "secondary", "secondary"))

# Create sdc object
sdc_data_3 <- createSdcObj(data_3, keyVars = c("gender", "education"), alpha = 1)

# kAnon with k = 3 makes 3 suppressions, but 1 suppression would have been enough.
sdc_data_kAnon <- kAnon(sdc_data_3, k = 3)
extractManipData(sdc_data_kAnon)
print(sdc_data_kAnon, "kAnon")

# Manually forcing 1 suppression generates data that already comply with 3-anonymity: 
data_3_edited         <- data_3
data_3_edited[1,2]    <- NA_character_
sdc_data_kAnon_manual <- createSdcObj(data_3_edited, keyVars = c("gender", "education"), alpha = 1)
print(data_3_edited)
print(sdc_data_kAnon_manual, "kAnon")

The reason is that kAnon is a heuristic algorithm, which can lead to oversuppression.

Idea for an extension: implement a mixed-integer linear programming solution for small problems to find an optimal suppression pattern. Guidance is given in Ton de Waal's book, Handbook of Statistical Data Editing and Imputation (Wiley).
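
To illustrate what "optimal suppression pattern" means for a problem this small, here is a rough base-R sketch that uses exhaustive search rather than the mixed-integer formulation proposed above. It is not part of sdcMicro: `count_matches()` and `find_min_suppression()` are hypothetical helper names, and the sketch assumes that a suppressed value (NA) matches any category, which is meant to mimic `alpha = 1`. The search is exponential in the number of cells, so it is only usable for toy data.

```r
# Same 6x2 data as in the example above
data_3 <- data.frame(
    gender = c("male", "male", "male", "male", "male", "male"),
    education = c("no education", "primary", "primary", "primary", "secondary", "secondary"),
    stringsAsFactors = FALSE)

# For each record, count the records it matches on all key variables.
# Assumption: NA acts as a wildcard and matches any category.
count_matches <- function(df) {
    n <- nrow(df)
    vapply(seq_len(n), function(i) {
        compatible <- rep(TRUE, n)
        for (v in names(df)) {
            x <- df[[v]]
            compatible <- compatible & (is.na(x) | is.na(x[i]) | x == x[i])
        }
        sum(compatible)
    }, integer(1))
}

# Try all suppression patterns in order of increasing size and return
# the first one that satisfies k-anonymity (hypothetical helper).
find_min_suppression <- function(df, k) {
    n <- nrow(df)
    n_cells <- n * ncol(df)
    for (m in 0:n_cells) {
        # Each column of `patterns` is one set of m cell indices (column-major)
        patterns <- if (m == 0) {
            matrix(integer(0), nrow = 0, ncol = 1)
        } else {
            combn(n_cells, m)
        }
        for (j in seq_len(ncol(patterns))) {
            candidate <- df
            for (cell in patterns[, j]) {
                row <- (cell - 1) %% n + 1
                col <- (cell - 1) %/% n + 1
                candidate[row, col] <- NA
            }
            if (all(count_matches(candidate) >= k)) {
                return(list(n_suppressed = m, data = candidate))
            }
        }
    }
    NULL  # not reachable when nrow(df) >= k
}

result <- find_min_suppression(data_3, k = 3)
result$n_suppressed  # 1 for this data set
result$data          # education of record 1 is suppressed
```

Under the wildcard assumption, the search finds the same single suppression as the manual edit in the example, compared with the 3 suppressions made by kAnon.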
