sdcMicro is an R-package to anonymize microdata. Most functionalities of the package are also available via an interactive shiny-based graphical user interface.
The online documentation can also be found at sdctools.github.io/sdcMicro.
sdcMicro
Home Page: http://sdctools.github.io/sdcMicro/
It seems that groupVars() deals with integers in the parameters before and after.
Another strategy for dealing with the levels should be defined, since the following runs without warning but produces wrong results:
require(sdcMicro); data(testdata, package="sdcMicro")
sdc <- createSdcObj(testdata,
keyVars=c('urbrur','water','sex','age'),
numVars=c('expend','income','savings'),
pramVars=c("walls"),
w='sampling_weight',
hhId='ori_hid')
labs <- c("1-9","10-19","20-29","30-39",
"40-49","50-59","60-69","70-79","80-130")
sdc <- globalRecode(sdc, column="age",
breaks=c(0,9,19,29,39,49,59,69,79,130),
labels=labs)
table(extractManipData(sdc)$age)
1-9 10-19 20-29 30-39 40-49 50-59 60-69 70-79 80-130
1128 1110 635 629 447 250 200 70 13
sdc <- groupVars(sdc, var="age",
before=labs,
after=c("1-9","10-19","20-29","30-39",
"40-49","50-59","60-69","70-130"))
table(extractManipData(sdc)$age)
1-9 10-19 20-29 30-39 40-49 50-59 60-69 70-130
1141 1110 635 629 447 250 200 70
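In the output above, class 1-9 grows from 1128 to 1141 while 70-130 stays at 70, so the 13 records from 80-130 appear to have been added to the wrong class. For comparison, here is a minimal base-R sketch of what merging by label would be expected to produce. It does not use groupVars(); the factor is rebuilt synthetically from the counts printed above, and the merge relies on the base-R behaviour that assigning the same label to several levels collapses them.

```r
# Labels and counts copied from the table printed after globalRecode()
labs <- c("1-9", "10-19", "20-29", "30-39",
          "40-49", "50-59", "60-69", "70-79", "80-130")
counts <- c(1128, 1110, 635, 629, 447, 250, 200, 70, 13)

# Synthetic stand-in for the recoded age variable (not the real testdata)
age <- factor(rep(labs, times = counts), levels = labs)

# Merge by label: giving both levels the same name collapses them
levels(age)[levels(age) %in% c("70-79", "80-130")] <- "70-130"

table(age)
```

With this label-based merge, 70-130 holds 83 records (70 + 13) and 1-9 keeps its 1128.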
Based on Joerg Drechsler's e-mail to us, reporting a problem with mdav.
microaggregation gives wrong results for method mdav (old slow version and IHSN version) for the following toy data:
library(sdcMicro)
toyData <- data.frame(x=c(2,3,3,20,21), y=c(1,2,2,19,20))
toyData
x y
1 2 1
2 3 2
3 3 2
4 20 19
5 21 20
The result should be:
1 2.666667 1.666667
2 2.666667 1.666667
3 2.666667 1.666667
4 20.500000 19.500000
5 20.500000 19.500000
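The expected values are the group means when records 1 to 3 form one cluster and records 4 and 5 the other. A small base-R sketch that reproduces them follows; the grouping vector grp is written out by hand as an assumption about the intended clustering and is not computed by mdav.

```r
toyData <- data.frame(x = c(2, 3, 3, 20, 21), y = c(1, 2, 2, 19, 20))

# Assumed clustering: records 1-3 together, records 4-5 together
grp <- c(1, 1, 1, 2, 2)

# Replace every value by the mean of its group, keeping the original row order
expected <- as.data.frame(lapply(toyData, function(col) ave(col, grp)))
expected
```

Because ave() returns the group means in the original row order, this also shows the ordering that the microaggregated output should preserve.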
However, the outcome is:
m3 <- microaggregation(toyData, method="mdav", aggr=2)
m3$mx ## wrong!
m3$mx
x y
1 2.5 1.5
2 3.0 2.0
3 2.5 1.5
4 20.5 19.5
5 20.5 19.5
wrong.
By the way, the old version gives the correct values but the ordering of observations is not correct:
m3 <- microaggregation(toyData, method="mdavold", aggr=2)
m3$mx ## wrong!
m3$mx
[,1] [,2]
3 2.666667 1.666667
4 20.500000 19.500000
5 20.500000 19.500000
31 2.666667 1.666667
41 2.666667 1.666667
GUI-Guidelines (from menu, Help): include latest version of guidelines
dat <- data.frame(x=rnorm(100))
microaggregation(obj=dat, variables="x", method="mdav", aggr=3, measure="mean")
-> error
dat <- data.frame(x=rnorm(100), y=rnorm(100))
microaggregation(obj=dat, variables="x", method="mdav", aggr=3, measure="mean")
-> works
microaggregation(obj=dat, variables=c("x","y"), method="mdav", aggr=3, measure="mean")
-> works but inconsistent output (invisible())
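The cause of the single-column error is not stated here. One frequent source of failures that appear only with a one-column data frame is R's default of dropping dimensions when subsetting; whether that is what happens inside microaggregation() is purely an assumption. The base-R sketch below only illustrates the general behaviour.

```r
set.seed(1)
dat <- data.frame(x = rnorm(100))

# Default subsetting of a single column drops the data.frame structure
v <- dat[, "x"]
is.data.frame(v)   # FALSE: v is a plain numeric vector

# drop = FALSE keeps a one-column data.frame, so later column indexing still works
d <- dat[, "x", drop = FALSE]
is.data.frame(d)   # TRUE
```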
The plots in the Recode window are missing, even for very small examples.
Frequencies: after recoding age, the frequencies of the old age classes are
also displayed as xx.1, xx.2, .... --> correct this.
require(sdcMicro); data(testdata)
sdc <- createSdcObj(testdata,
keyVars=c('urbrur','water','sex','age'),
numVars=c('expend','income','savings'),
w='sampling_weight', hhId='ori_hid')
length(table(sdc@manipKeyVars[,"age"])) #OK
p1 <- print(sdc)
sdc <- globalRecode(sdc, column="age",
breaks=c(1,9,19,29,39,49,59,69,100), labels=1:8)
length(table(sdc@manipKeyVars[,"age"])) #OK
p2 <- print(sdc)
identical(p1,p2)
activedataset <- testdata
sdcObject <- createSdcObj(activedataset,keyVars=c('urbrur','roof','sex','age'),
numVars=c('expend','income','savings'),weightVar=c('sampling_weight'),hhId=c('ori_hid'))
sdcObject <- microaggregation(sdcObject,aggr=c( 3 ), method=c('mdav'), variables=c('expend'))
8: stop("undefined columns selected")
7: `[.data.frame`(xm, , i)
6: xm[, i]
5: dUtilityWORK(x = x, xm = xm, method = "IL1", ...)
4: dUtility(obj)
3: dUtility(obj) at microaggregation.R#29
2: microaggregation(sdcObject, aggr = c(3), method = c("mdav"),
variables = c("expend")) at microaggregation.R#3
1: microaggregation(sdcObject, aggr = c(3), method = c("mdav"),
variables = c("expend"))
"Risk", "Protection", "Information Loss" (in the tabs Categorical, Continuous):
make these larger or bold so that they are more visible.
Since there is no longer a function pram_strata(), we have two options to deal with this situation in the docs.
The same has to be done for references to the "old" pram function, since its parameters have changed.
Thoughts?
require(sdcMicro); data(testdata)
sdc <- createSdcObj(testdata,
keyVars=c('urbrur','water','sex','age'),
numVars=c('expend','income','savings'),
#pramVars=c("walls"),
w='sampling_weight',
hhId='ori_hid')
sdc <- pram(sdc)
sdc <- createSdcObj(testdata,
keyVars=c('urbrur','water','sex','age'),
numVars=c('expend','income','savings'),
pramVars=c("walls"),
w='sampling_weight',
hhId='ori_hid')
sdc <- pram(sdc)
Help menu:
It should be possible to undo a recode to factor if the user specified breaks.
Seems output=input ?!
explain: expected number of re-identifications !!! (in the GUI Tutorial)
UNDO: document somewhere how many undo steps are possible for which data size: for a data size > 100000 there is no UNDO; otherwise maxUndo is set to 1 but can be changed freely (maybe only in the GUI tutorial)
Problem: variables that are not keyVars cannot be used as variables for pram. At the same time, it is not possible to run localSuppression() on only a subset of the keyVars. I think the following should be possible:
This seems fairly easy to program in sdcMicro, but I am not sure whether it would affect the GUI and other things (extractManipData(), get.sdcMicroObj(..., type="manipData")). Alex?
would be nice to have. print function should show basic things about the object
(data size, key variables)
Change it?!?
In the window "Recode" there is now "rename selected level" and "group selected levels" instead of "rename" and "group". Is that what was meant?
In general: "Abbrechen" --> "Cancel". Can that be changed even if one uses German
as the language? A mix of languages is not good...
OK and Cancel should be in the same order. In some windows Cancel is to the left of OK, in
some it is to the right of OK. -- Find out where!?!?
change the sentence "Reported is the number | ..." --> should be better
readable and understandable.
I have installed all required packages for sdcMicroGUI and it seems to work, but when I try to select variables as categorical, numerical, weight, etc. and click OK, a window appears showing that a script is running, but it never finishes. The R console then shows:
Error in measure_riskWORK(manipData, keyVars, w = w, hid = hhId, ...) :
Please define valid key variables
What is the problem?
Thanks
we use "data(ses)" in a vignette --> add "laeken" to deps?
Critical Revision: Make an UNDO also possible in the Recode window
(example: one recodes age into 6 classes, but then wants to discard this and
have 7 classes instead)
if the variable is subsequently selected as stratavar in the sdcObject.
==> Create_strata_var must go into the script
Sometimes the individual risks evaluate to NaN. In this case localSupp() threw an error. We can easily work around this issue, but we still have to look into the problems with the individual risk calculation.
only when using a sdcMicroObj, otherwise it works.
data(testdata)
sdc <- createSdcObj(testdata,
keyVars=c('urbrur','roof','walls','water','electcon','relat','sex'),
numVars=c('expend','income','savings'), w='sampling_weight')
sdc <- mafast(sdc,by="urbrur")
4000 might be too low now with the quite fast system, maybe set it to 20000?
The new warning text is: Large data sets require extensive computation time, so please be patient.
Short explanation of what fk and Fk mean (with overlay?) -- tooltip added: fk=sample frequency\nFk=grossed up population frequency
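To make the tooltip wording concrete, here is a minimal base-R sketch of the two quantities on a made-up data set. The column names sex, urbrur and w and all values are hypothetical, and the sketch does not call sdcMicro. fk counts the records sharing a key combination in the sample, and Fk sums their sampling weights.

```r
# Hypothetical toy data: two key variables and a sampling weight
dat <- data.frame(sex    = c(1, 1, 2, 2, 2),
                  urbrur = c(1, 1, 1, 2, 2),
                  w      = c(10, 20, 5, 15, 15))

# One key per combination of the key variables
key <- interaction(dat$sex, dat$urbrur, drop = TRUE)

# fk: sample frequency of each record's key combination
dat$fk <- ave(rep(1, nrow(dat)), key, FUN = sum)

# Fk: grossed-up population frequency (sum of weights per key combination)
dat$Fk <- ave(dat$w, key, FUN = sum)

dat
```

In this toy example the third record is sample-unique (fk = 1) with Fk = 5.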
library(sdcMicro)
[snip]
library(far)
Loading required package: nlme
far library : Modelization for Functional AutoRegressive processes
version 0.6-0 (2005-01-10)
data(Tarragona)
a1 <- addNoise(Tarragona)
a2 <- addNoise(Tarragona,method="ROMM")
[1] "please load package far"
[[it freezes here and says i have not loaded far
even though i have]]
[[i just kill the process otherwise it overloads my computer]]
sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: i386-w64-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] far_0.6-3 nlme_3.1-117 sdcMicro_4.3.0 xtable_1.7-3 data.table_1.9.2
[6] knitr_1.6 brew_1.0-6
loaded via a namespace (and not attached):
[1] car_2.0-20 class_7.3-10 cluster_1.15.2 DEoptimR_1.0-1 e1071_1.6-3
[6] evaluate_0.5.3 formatR_0.10 grid_3.1.0 lattice_0.20-29 MASS_7.3-31
[11] nnet_7.3-8 plyr_1.8.1 Rcpp_0.11.1 reshape2_1.2.2 robustbase_0.91-1
[16] sets_1.0-13 stringr_0.6.2 tools_3.1.0
I just installed XQuartz and GTK+ on OS X 10.9.1 but get the following error message:
( I restarted R of course.)
The GTK+ installation screen keeps prompting.
If the package still does not load, please ensure that GTK+ is installed and that it is on your PATH environment variable
IN ANY CASE, RESTART R BEFORE TRYING TO LOAD THE PACKAGE AGAIN
Loading required package: gWidgets
Error : .onAttach failed in attachNamespace() for 'gWidgetsRGtk2', details:
call: .Call(name, ..., PACKAGE = PACKAGE)
error: "S_gtk_icon_factory_new" not available for .Call() for package "RGtk2"
In addition: Warning message:
Failed to load RGtk2 dynamic library, attempting to install it.
Error: package ‘gWidgetsRGtk2’ could not be loaded
Without being able to check this right now: I believe the print output differs between the two functions, with pram() giving more detailed information, at least if only one variable is prammed.
set.seed( 657613 )
activedataset <- testdata
sdcObject <- createSdcObj(activedataset,keyVars=c('urbrur','roof','walls'),numVars=c('income'),weightVar=c('sampling_weight'),hhId=c('ori_hid'))
sdcObject <- microaggregation(sdcObject,aggr=c( 3 ), method=c('mdav'), variables=c('income'))
sdcObject <- microaggregation(sdcObject,aggr=c( 3 ), method=c('mdav'), variables=c('income'),strata_variables=c('urbrur'))
sdcObject@risk$numeric
s=dRisk(sdcObject)
s@risk$numeric
after selection of variables the one window which pops up - change text
there: something like "...but not recognized as correct format. This can be
confirmed or changed in the next window." --fixed
due to update of data.table, localSuppressionWORK does not work and thus a test ("tests/reporting_test.R") fails.
No idea what that means, since there is no "type" in the Recode window in the frame "Recode to factor"