sdcmicro's Introduction

sdcMicro


sdcMicro is an R package for anonymizing microdata. Most of the package's functionality is also available through an interactive Shiny-based graphical user interface.

The online documentation can also be found at sdctools.github.io/sdcMicro.

sdcmicro's People

Contributors

ajdamico, alexkowa, bernhard-da, bigfoot31, cmachingauta, coatless, evankarageorgos, gregordecillia, kyoshido, leebrian, matthias-da, schloerke, skounis, thijsbenschop

sdcmicro's Issues

groupVars: can this be repaired?

It seems that groupVars works with integer codes in the parameters before and after.
Another strategy for dealing with the levels should be defined, since this runs without a warning but produces wrong results:

require(sdcMicro); data(testdata, package="sdcMicro")
sdc <- createSdcObj(testdata,
          keyVars=c('urbrur','water','sex','age'), 
          numVars=c('expend','income','savings'),
          pramVars=c("walls"), 
          w='sampling_weight', 
          hhId='ori_hid')

labs <- c("1-9","10-19","20-29","30-39",
          "40-49","50-59","60-69","70-79","80-130")
sdc <- globalRecode(sdc, column="age",
                    breaks=c(0,9,19,29,39,49,59,69,79,130), 
                    labels=labs)

table(extractManipData(sdc)$age)

 1-9 10-19 20-29 30-39 40-49 50-59 60-69 70-79 80-130
1128  1110   635   629   447   250   200    70     13

sdc <- groupVars(sdc, var="age",
                    before=labs, 
                    after=c("1-9","10-19","20-29","30-39",
                             "40-49","50-59","60-69","70-130"))
table(extractManipData(sdc)$age)

 1-9 10-19 20-29 30-39 40-49 50-59 60-69 70-130
1141  1110   635   629   447   250   200     70
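The counts above show what goes wrong: the new 1-9 count is 1128 + 13 = 1141, so the 13 records from 80-130 ended up in 1-9, while 70-130 contains only the 70 records from 70-79. One plausible reading (an assumption, not checked against the groupVars source) is that `after` (8 labels) is matched to `before` (9 labels) by position and recycled, so the 9th level falls back to the 1st label. The base-R sketch below instead regroups a factor by matching level labels and insists that `before` and `after` have the same length; `group_levels_by_label()` is a hypothetical helper, not part of sdcMicro.

```r
# Hypothetical helper (not part of sdcMicro): merge factor levels by label.
# before[i] is renamed to after[i]; both vectors must have the same length.
group_levels_by_label <- function(f, before, after) {
  stopifnot(is.factor(f),
            length(before) == length(after),
            all(before %in% levels(f)))
  new_levels <- levels(f)
  idx <- match(new_levels, before)              # position of each level in `before`
  new_levels[!is.na(idx)] <- after[idx[!is.na(idx)]]
  levels(f) <- new_levels                       # duplicated labels are merged by R
  f
}

labs <- c("1-9", "10-19", "20-29", "30-39", "40-49",
          "50-59", "60-69", "70-79", "80-130")
# Rebuild the age distribution from the first table above
age <- factor(rep(labs, c(1128, 1110, 635, 629, 447, 250, 200, 70, 13)),
              levels = labs)
age2 <- group_levels_by_label(age,
                              before = c("70-79", "80-130"),
                              after  = c("70-130", "70-130"))
table(age2)
```

Matching by label means only the levels that change need to be listed, and 1-9 keeps its 1128 records while 70-130 receives 70 + 13 = 83.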

mdav

Based on Joerg Drechsler's e-mail to us reporting a problem with mdav.

microaggregation() gives wrong results for method mdav (both the old slow version and the IHSN version) on the following toy data:

library(sdcMicro)
toyData <- data.frame(x=c(2,3,3,20,21), y=c(1,2,2,19,20))
toyData
   x  y
1  2  1
2  3  2
3  3  2
4 20 19
5 21 20

The result should be:

1  2.666667  1.666667
2  2.666667  1.666667
3  2.666667  1.666667
4  20.500000 19.500000
5  20.500000 19.500000

However, the outcome is:

m3 <- microaggregation(toyData, method="mdav", aggr=2)
m3$mx  ## wrong!
     x    y
1  2.5  1.5
2  3.0  2.0
3  2.5  1.5
4 20.5 19.5
5 20.5 19.5

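This is wrong: records 1 and 3 are averaged together, while record 2 keeps its original values (3, 2) and thus forms a group of size 1, although aggr=2.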

By the way, the old version gives the correct values, but the ordering of the observations is not correct:

m3 <- microaggregation(toyData, method="mdavold", aggr=2)
m3$mx  ## wrong!
        [,1]      [,2]
3   2.666667  1.666667
4  20.500000 19.500000
5  20.500000 19.500000
31  2.666667  1.666667
41  2.666667  1.666667
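For comparison, here is a minimal base-R sketch of the MDAV heuristic as it is usually described: while at least 3k records remain, take the record r farthest from the centroid and the record s farthest from r, and group each with its k-1 nearest remaining neighbours; if 2k to 3k-1 records are left, form one more group around the record farthest from the centroid; the rest becomes the final group. It is not sdcMicro's implementation: it assumes plain Euclidean distance on unstandardized columns and breaks ties by row order, and `mdav_groups()` and `mdav()` are hypothetical names. Because it stores a group id per row, it keeps the original row order, which is what goes wrong with mdavold.

```r
# Minimal MDAV sketch in base R (not sdcMicro's implementation).
# Assumptions: Euclidean distance on raw (unstandardized) columns,
# ties broken by row order. Returns a group id for every row.
mdav_groups <- function(x, k) {
  x <- as.matrix(x)
  n <- nrow(x)
  stopifnot(k >= 1, n >= k)
  group <- integer(n)
  remaining <- seq_len(n)
  g <- 0L

  # squared distances from rows `idx` of x to the point p
  sqdist <- function(idx, p) {
    rowSums((x[idx, , drop = FALSE] -
               matrix(p, length(idx), ncol(x), byrow = TRUE))^2)
  }
  # the k remaining records closest to point p
  nearest_k <- function(p) remaining[order(sqdist(remaining, p))[seq_len(k)]]
  # close a group and drop its members from the pool
  close_group <- function(members) {
    g <<- g + 1L
    group[members] <<- g
    remaining <<- setdiff(remaining, members)
  }

  while (length(remaining) >= 3 * k) {
    centroid <- colMeans(x[remaining, , drop = FALSE])
    r <- remaining[which.max(sqdist(remaining, centroid))]  # farthest from centroid
    s <- remaining[which.max(sqdist(remaining, x[r, ]))]    # farthest from r
    close_group(nearest_k(x[r, ]))
    close_group(nearest_k(x[s, ]))
  }
  if (length(remaining) >= 2 * k) {
    centroid <- colMeans(x[remaining, , drop = FALSE])
    r <- remaining[which.max(sqdist(remaining, centroid))]
    close_group(nearest_k(x[r, ]))
  }
  if (length(remaining) > 0) close_group(remaining)  # last group: k to 2k-1 records
  group
}

# Replace each value by its group mean, keeping the original row order
mdav <- function(x, k) {
  grp <- mdav_groups(x, k)
  apply(as.matrix(x), 2, function(col) ave(col, grp))
}

toyData <- data.frame(x = c(2, 3, 3, 20, 21), y = c(1, 2, 2, 19, 20))
mdav(toyData, k = 2)
```

On toyData, record 5 is farthest from the centroid and record 4 is its nearest neighbour, so {4, 5} becomes one group and records 1 to 3 form the final group: rows 1 to 3 get (2.666667, 1.666667) and rows 4 and 5 get (20.5, 19.5), in the original order.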

microaggregation with only one variable in the input data set

dat <- data.frame(x=rnorm(100))
microaggregation(obj=dat, variables="x", method="mdav", aggr=3, measure="mean")

-> error

dat <- data.frame(x=rnorm(100), y=rnorm(100))
microaggregation(obj=dat, variables="x", method="mdav", aggr=3, measure="mean")
-> works

microaggregation(obj=dat, variables=c("x","y"), method="mdav", aggr=3, measure="mean")
-> works, but the output is inconsistent (returned via invisible())
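A frequent cause of single-variable failures in R code (an assumption about this bug, not verified in the microaggregation source) is that `[` simplifies a one-column selection to a plain vector, so later code that expects a matrix or data frame breaks. Selecting with `drop = FALSE` keeps the dimensions:

```r
dat <- data.frame(x = rnorm(100))

v <- dat[, "x"]                  # one column: simplified to a numeric vector
d <- dat[, "x", drop = FALSE]    # stays a 100 x 1 data.frame

ncol(v)   # NULL: the vector has no dimensions left
ncol(d)   # 1
```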

No update of the frequency counts after globalRecode

require(sdcMicro); data(testdata)
sdc <- createSdcObj(testdata,
keyVars=c('urbrur','water','sex','age'),
numVars=c('expend','income','savings'),
w='sampling_weight', hhId='ori_hid')
length(table(sdc@manipKeyVars[,"age"])) #OK
p1 <- print(sdc)
sdc <- globalRecode(sdc, column="age",
breaks=c(1,9,19,29,39,49,59,69,100), labels=1:8)
length(table(sdc@manipKeyVars[,"age"])) #OK
p2 <- print(sdc)

identical(p1,p2)

TRUE
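For reference, the frequency counts f_k (how many records share each combination of key values) must change when a key variable is recoded, and so must summaries derived from them, such as the number of records violating 2-anonymity. The sketch below uses made-up toy data rather than testdata, and compares the counts themselves rather than the return values of print(); `freq_counts()` is a hypothetical helper.

```r
# f_k: for each record, the number of records sharing its key combination
freq_counts <- function(keys) {
  combo <- interaction(keys, drop = TRUE)
  ave(seq_along(combo), combo, FUN = length)
}

# Toy key variables (made up for illustration, not testdata)
keys <- data.frame(sex = c("m", "m", "f", "f"),
                   age = c(21, 24, 30, 33))
fk_before <- freq_counts(keys)

keys$age <- cut(keys$age, breaks = c(0, 25, 35))  # recode age into two bands
fk_after <- freq_counts(keys)

sum(fk_before < 2)  # 4 records violate 2-anonymity before recoding
sum(fk_after < 2)   # 0 records violate 2-anonymity after recoding
```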

Problem with dUtility in microaggregation

activedataset <- testdata
sdcObject <- createSdcObj(activedataset,keyVars=c('urbrur','roof','sex','age'),
  numVars=c('expend','income','savings'),weightVar=c('sampling_weight'),hhId=c('ori_hid'))
sdcObject <- microaggregation(sdcObject,aggr=c( 3 ), method=c('mdav'), variables=c('expend'))

Traceback:
8: stop("undefined columns selected")
7: `[.data.frame`(xm, , i)
6: xm[, i]
5: dUtilityWORK(x = x, xm = xm, method = "IL1", ...)
4: dUtility(obj)
3: dUtility(obj) at microaggregation.R#29
2: microaggregation(sdcObject, aggr = c(3), method = c("mdav"), 
       variables = c("expend")) at microaggregation.R#3
1: microaggregation(sdcObject, aggr = c(3), method = c("mdav"), 
       variables = c("expend"))
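The traceback suggests that dUtilityWORK() indexes `xm` with every numeric column of `x`, so `xm[, i]` fails once only `expend` has been microaggregated (presumably `xm` then lacks `income` and `savings`; this is an inference from the traceback). The sketch below computes an IL1-type measure only on the columns both data sets share, using one common definition, IL1s = 1/(n p) * sum_j sum_i |x_ij - x'_ij| / (sqrt(2) S_j), where S_j is the standard deviation of original variable j. `il1s()` is a hypothetical helper, and its scaling may differ from sdcMicro's IL1.

```r
# Hypothetical helper: IL1s information loss on the columns shared by x and xm
il1s <- function(x, xm, vars = intersect(names(x), names(xm))) {
  if (length(vars) == 0) stop("x and xm have no columns in common")
  x  <- as.matrix(as.data.frame(x)[, vars, drop = FALSE])
  xm <- as.matrix(as.data.frame(xm)[, vars, drop = FALSE])
  s  <- apply(x, 2, sd)                              # S_j of the original data
  scaled <- sweep(abs(x - xm), 2, sqrt(2) * s, "/")  # |x - x'| / (sqrt(2) S_j)
  sum(scaled) / (nrow(x) * length(vars))
}

orig   <- data.frame(expend = c(1, 2, 3), income = c(4, 5, 6))
masked <- data.frame(expend = c(2, 2, 2))  # only expend was microaggregated
il1s(orig, masked)                         # sqrt(2) / 3, about 0.471
```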

remove references to pram_strata in vignette/guidelines

Since there is no longer a function pram_strata(), we have two options for dealing with this in the docs:

  1. replace all references to pram_strata() with correct code
  2. add a wrapper function pram_strata() that just calls pram()

The same has to be done for references to the "old" pram function, since its parameters have changed.

Thoughts?
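Option 2 could look roughly like the following sketch. It assumes that the current pram() accepts the arguments the old docs pass to pram_strata(); since the parameters have changed, old argument names may still need translating before they are forwarded.

```r
# Deprecated alias kept only so that old vignette/guideline code still runs.
pram_strata <- function(...) {
  .Deprecated("pram")  # signals a deprecation warning pointing users to pram()
  pram(...)            # forwards all arguments unchanged
}
```

With sdcMicro loaded, `sdc <- pram_strata(sdc)` would then behave like `pram(sdc)` plus a deprecation warning.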

bug in pram

require(sdcMicro); data(testdata)
sdc <- createSdcObj(testdata,
keyVars=c('urbrur','water','sex','age'),
numVars=c('expend','income','savings'),
#pramVars=c("walls"),
w='sampling_weight',
hhId='ori_hid')

sdc <- pram(sdc)

error

sdc <- createSdcObj(testdata,
keyVars=c('urbrur','water','sex','age'),
numVars=c('expend','income','savings'),
pramVars=c("walls"),
w='sampling_weight',
hhId='ori_hid')

sdc <- pram(sdc)

error

Documentation on UNDO -> GUI tutorial

UNDO: document somewhere how many undo steps are possible for a given data size: for more than 100000 observations there is no UNDO; otherwise maxUndo is set to 1, but it can be changed to any value (maybe only in the GUI tutorial).

pram | pram_strata | localSuppression

Problem: variables that are not keyVars cannot be used for pram. At the same time, it is not possible to run localSuppression() on only a subset of the keyVars. I think the following should be possible:

  • data(free1)
  • sdc <- createSdcObj(free1, keyVars=c("REGION","SEX","AGE","MARSTAT","KINDPERS"),weightVar="WEIGHT")
  • sdc <- localSuppression(sdc, vars=c("REGION","SEX","AGE"), k=3)
  • sdc <- pram(sdc, vars=c("MARSTAT","KINDPERS"))
  • sdc <- pram(sdc, vars=c("EDUC1"))

Implementing this in sdcMicro seems to be fairly easy, but I am not sure whether it would affect the GUI and other things (extractManipData(), get.sdcMicroObj(..., type="manipData")). Alex?

Rename ???

In the "Recode" window, the labels now read "rename selected level" and "group selected levels" instead of "rename" and "group". Is that what was meant?

OK and Cancel button ordering

OK and Cancel should be in the same order everywhere. In some windows Cancel is to the left of OK, in
others to the right of it. Still to do: find out where!

Confirmation window for removing direct identifiers

  • Confirm that the direct identifiers are removed in the identifiers tab. Move
    "remove direct identifiers" to the variable selection frame, and replace the
    "remove direct identifiers" button with this confirmation button. Strictly
    speaking, this should also be taken into account for the report.

Problem when loading script

I have installed all required packages for sdcMicroGUI, and it seems to work, but when I select the variables to be categorical, numerical, weight, etc. and click OK, a window appears showing that the script is running, but it never finishes. The R console shows:
Error in measure_riskWORK(manipData, keyVars, w = w, hid = hhId, ...) :
Please define valid key variables
What is the problem?
Thanks

bug in localSupp()

Sometimes the individual risks evaluate to NaN, and in this case localSupp() throws an error. We can easily work around this issue, but we still have to look into the problems with the individual risk calculation.
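A possible interim workaround, sketched in base R under an assumption of ours: treat a NaN individual risk as maximal (1), so that the record is suppressed rather than silently skipped, and warn so the underlying risk problem stays visible. Note that `NaN > threshold` is NA, which breaks `if()` and logical indexing; that may be where the error comes from, though this is not verified. `flag_risky()` is a hypothetical helper, not sdcMicro code.

```r
# Hypothetical helper: which records exceed the risk threshold?
flag_risky <- function(risk, threshold) {
  bad <- is.na(risk)  # is.na() is TRUE for both NA and NaN
  if (any(bad)) {
    warning(sum(bad), " individual risk value(s) are NaN/NA; treated as maximal risk")
    risk[bad] <- 1
  }
  risk > threshold
}

r <- c(0.05, NaN, 0.5)
r > 0.15                               # FALSE NA TRUE
suppressWarnings(flag_risky(r, 0.15))  # FALSE TRUE TRUE
```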

mafast by argument

The by argument fails only when mafast() is applied to an sdcMicroObj; otherwise it works.

data(testdata)
sdc <- createSdcObj(testdata,
keyVars=c('urbrur','roof','walls','water','electcon','relat','sex'),
numVars=c('expend','income','savings'), w='sampling_weight')
sdc <- mafast(sdc,by="urbrur")

mafast() finds no keyVars when it is applied to strata.

warning for data sets >4000

4000 might be too low now that the system is quite fast; maybe set it to 20000?

The new warning text is: Large data sets require extensive computation time, so please be patient.
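If the threshold is raised, making it a parameter keeps it easy to tune later. A minimal sketch; the function name `warn_if_large()` and the default of 20000 are assumptions taken from the suggestion above, not existing sdcMicro settings:

```r
# Hypothetical helper: show the large-data note above a configurable size
warn_if_large <- function(n, limit = 20000) {
  large <- n > limit
  if (large) {
    message("Large data sets require extensive computation time, so please be patient.")
  }
  invisible(large)
}

warn_if_large(4500)   # silent with the new limit
warn_if_large(50000)  # shows the note
```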

addNoise ROMM method does not work for me

library(sdcMicro)
[snip]
library(far)
Loading required package: nlme
far library : Modelization for Functional AutoRegressive processes
version 0.6-0 (2005-01-10)
data(Tarragona)
a1 <- addNoise(Tarragona)
a2 <- addNoise(Tarragona,method="ROMM")
[1] "please load package far"

[[it freezes here and says I have not loaded far even though I have]]
[[I just kill the process, otherwise it overloads my computer]]

sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] far_0.6-3 nlme_3.1-117 sdcMicro_4.3.0 xtable_1.7-3 data.table_1.9.2
[6] knitr_1.6 brew_1.0-6

loaded via a namespace (and not attached):
[1] car_2.0-20 class_7.3-10 cluster_1.15.2 DEoptimR_1.0-1 e1071_1.6-3
[6] evaluate_0.5.3 formatR_0.10 grid_3.1.0 lattice_0.20-29 MASS_7.3-31
[11] nnet_7.3-8 plyr_1.8.1 Rcpp_0.11.1 reshape2_1.2.2 robustbase_0.91-1
[16] sets_1.0-13 stringr_0.6.2 tools_3.1.0
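The output above suggests that addNoise() only prints a message when it decides far is missing and then carries on, which could explain the freeze. A defensive pattern (a sketch, not the sdcMicro code) checks the namespace with requireNamespace(), which does not depend on whether the package was attached with library(), and stops immediately instead of printing; `need_pkg()` is a hypothetical helper.

```r
# Hypothetical helper: fail fast if a suggested package is not available
need_pkg <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("this method requires package '", pkg, "'; please install it",
         call. = FALSE)
  }
  invisible(TRUE)
}

need_pkg("stats")  # available: returns TRUE invisibly
tryCatch(need_pkg("notARealPackage123"),
         error = function(e) conditionMessage(e))
```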

Error message for sdcMicroGUI when using RStudio

I just installed XQuartz and GTK+ on OS X 10.9.1, but I get the following error message (I restarted R, of course):

The GTK+ installation screen keeps prompting.

If the package still does not load, please ensure that GTK+ is installed and that it is on your PATH environment variable
IN ANY CASE, RESTART R BEFORE TRYING TO LOAD THE PACKAGE AGAIN
Loading required package: gWidgets
Error : .onAttach failed in attachNamespace() for 'gWidgetsRGtk2', details:
  call: .Call(name, ..., PACKAGE = PACKAGE)
  error: "S_gtk_icon_factory_new" not available for .Call() for package "RGtk2"
In addition: Warning message:
Failed to load RGtk2 dynamic library, attempting to install it. 
Error: package ‘gWidgetsRGtk2’ could not be loaded

Difference between pram and pram_strata?!

Without being able to check this right now: I believe the print output differs between the two functions, with pram() giving more detailed information, at least if only one variable is PRAMed.

Numeric risk computation after microaggregation?

set.seed( 657613 )
activedataset <- testdata
sdcObject <- createSdcObj(activedataset, keyVars=c('urbrur','roof','walls'),
                          numVars=c('income'), weightVar=c('sampling_weight'),
                          hhId=c('ori_hid'))
sdcObject <- microaggregation(sdcObject, aggr=c( 3 ), method=c('mdav'),
                              variables=c('income'))
sdcObject <- microaggregation(sdcObject, aggr=c( 3 ), method=c('mdav'),
                              variables=c('income'), strata_variables=c('urbrur'))
sdcObject@risk$numeric
s <- dRisk(sdcObject)
s@risk$numeric

Change of Text: Selection of variables?

After the selection of variables, change the text in the window that pops up
to something like "...but not recognized as correct format. This can be
confirmed or changed in the next window." (fixed)

problem with localSuppression

Due to an update of data.table, localSuppressionWORK no longer works, and thus a test (tests/reporting_test.R) fails.
