Improve ROMM within addNoise by RegSDC methodology #271

Open
olangsrud opened this issue Jan 8, 2019 · 0 comments

It seems that ROMM within addNoise is implemented in a way that does not preserve sample means. Below I suggest how to fix this and substantially speed up the calculations by using the methodology of a recent paper. See https://github.com/olangsrud/RegSDC (hopefully soon on CRAN). The functions referred to below are from that package.

y <- testdata[sample(NROW(testdata), 100), c("expend", "income", "savings")]
addNoise(y, method = "ROMM")$xm

# An almost identical method (see the paper on the "sequentially phenomenon" for the minor differences) is

RegSDCromm(y, lambda = 0.001, ensureIntercept = FALSE)

# This can be viewed as a high-speed version of the current implementation in addNoise.
# Sample means are preserved by the default method, where ensureIntercept = TRUE.
# Other values of lambda may be used.

RegSDCromm(y, lambda = 0.001)

# This is equivalent to calling a more general function 

RegSDCgen(y, lambda = 0.001, makeunique = TRUE)

# The parameter makeunique is of minor importance, but it must be TRUE if exact distributional behaviour
# is important (sampling from RegSDCromm several times). Otherwise, setting makeunique to FALSE can be OK.

# Feel free to import/wrap functions from RegSDC within sdcMicro.
# However, this line 

RegSDCgen(y, lambda = 0.001, makeunique = FALSE)

# can be implemented without using RegSDC by 

lambda <- 0.001
y <- as.matrix(y)
Mean <- function(x) t(matrix(colMeans(x), ncol(x), nrow(x)))
qr1 <- qr(y - Mean(y))
qr1Q <- qr.Q(qr1)
z <- qr1Q + lambda * matrix(rnorm(length(qr1Q)), nrow(y))
qr2 <- qr(z - Mean(z))
Mean(y) + qr.Q(qr2) %*% qr.R(qr1)

# Here Mean can be implemented in several other ways. The result differs from that of RegSDCgen only at
# the level of numerical precision (call set.seed with the same seed before each to verify).
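
To make the claim about preserved sample means concrete, here is a minimal sketch that wraps the base-R steps above in a function and checks the result on simulated data. The function name `romm_sketch` is hypothetical (it is not part of RegSDC or sdcMicro), `sweep` is used as one possible replacement for `Mean`, and the simulated matrix stands in for `testdata`. The sketch assumes the centered data have full column rank, so that `qr` does not pivot columns.

```r
# Hypothetical helper, not part of RegSDC or sdcMicro:
# the base-R steps from this issue wrapped in a function.
romm_sketch <- function(y, lambda = 0.001) {
  y <- as.matrix(y)
  center <- colMeans(y)
  yc <- sweep(y, 2, center)                  # subtract column means (alternative to Mean)
  qr1 <- qr(yc)
  q1 <- qr.Q(qr1)                            # orthonormal basis of the centered data
  z <- q1 + lambda * matrix(rnorm(length(q1)), nrow(y))  # perturb the basis with noise
  zc <- sweep(z, 2, colMeans(z))             # re-center so new columns have zero mean
  q2 <- qr.Q(qr(zc))                         # re-orthonormalize
  sweep(q2 %*% qr.R(qr1), 2, center, "+")    # restore covariance structure, add means back
}

# Simulated stand-in for testdata (assumption: any full-rank numeric matrix works)
set.seed(1)
y <- matrix(rnorm(300), 100, 3) %*% matrix(c(2, 1, 0, 0, 1, 1, 0, 0, 3), 3) + 10
y_new <- romm_sketch(y)

# Both sample means and the sample covariance matrix should agree
# up to numerical precision.
means_ok <- isTRUE(all.equal(colMeans(y_new), colMeans(y), check.attributes = FALSE))
cov_ok <- isTRUE(all.equal(cov(y_new), cov(y), check.attributes = FALSE))
```

The means are preserved because the columns of the second Q factor come from a centered matrix and are therefore orthogonal to the constant vector. The covariance is preserved because those columns are orthonormal, so the cross-product of the new centered data equals that of the original.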