raker's Issues

Add repopulate() function?

Sometimes the zone populations for different constraint tables don't match. For example, the population of a zone might be 4 according to one variable but 5 according to a different variable.

This is common with census tables where anonymisation means that some people might be 'swapped'.

More often than not multiple variables match and one or two do not, so it's obvious which population is correct. The incorrect populations are recalculated (imputed) from the actual population.

Should I be creating a repopulate() function to handle this?
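As a rough illustration of what such a function might do, the sketch below rescales one constraint table so that each zone's total matches a reference population. The name `repopulate_sketch()`, its arguments, and the example tables are all hypothetical and not part of rakeR; it also assumes no zone has a total of zero.

```r
# Hypothetical sketch, not rakeR's API: rescale a constraint table so each
# zone (row) sums to a trusted reference population for that zone.
repopulate_sketch <- function(cons_tbl, target_pop) {
  stopifnot(is.matrix(cons_tbl), length(target_pop) == nrow(cons_tbl))
  current_pop <- rowSums(cons_tbl)
  # Every cell in a zone is scaled by the same factor, so the proportions
  # between categories within the zone are preserved
  cons_tbl * (target_pop / current_pop)
}

# Made-up example: the age table says zone a has 4 people, sex says 5
age <- matrix(c(2, 2,
                3, 1), nrow = 2, byrow = TRUE,
              dimnames = list(c("a", "b"), c("young", "old")))
sex <- matrix(c(3, 2,
                2, 2), nrow = 2, byrow = TRUE,
              dimnames = list(c("a", "b"), c("female", "male")))

sex_fixed <- repopulate_sketch(sex, target_pop = rowSums(age))
rowSums(sex_fixed)  # both zones now sum to 4
```

The rescaled counts are generally not whole numbers, so a real implementation would need to decide whether and how to round them.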

Error with `rake()`

rake() isn't behaving as expected:

Error: df[["code"]] and row.names(result) are not equal:
  Lengths (3, 0) differ (string compare on first 0)

Peer review of rakeR

Re: ropensci/software-review#251 (comment)

Hi @Robinlovelace
Would love to have your feedback on the package. I want the package to be a high-level implementation of microsimulation (microsimulation for the masses!) that abstracts the underlying process as much as possible from the user. At the moment you simply pass in individual-level data and constraint data and it does the rest. If @virgesmith's neworder can fit in that paradigm, and if @virgesmith is happy for it to be included, I'd love to expand the algorithms available to rakeR.

Where do you think we should go from here? I have a bit of tidying up to do in the package (tests need revising, for example). Perhaps I should get those complete over the next few weeks and then we can look at getting it reviewed?

Good to hear from you, anyway!

RNG seed hard-coded

The integerise function sets the RNG seed to a hard-coded value every time it's called, so it always gives the same result.
I suggest the seed is passed as an optional argument, with the default value the same as the current one.
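The suggestion could look something like the sketch below. `sample_with_seed()` is a hypothetical stand-in for the random step inside integerise, and the default of 42 is only a placeholder for whatever value is currently hard-coded.

```r
# Hypothetical stand-in for the random step inside integerise().
# The default seed (42) is a placeholder, not necessarily rakeR's value.
sample_with_seed <- function(n, size, seed = 42) {
  set.seed(seed)  # callers can now override the seed
  sample(n, size)
}

# The default seed keeps results reproducible, as before
identical(sample_with_seed(10, 3), sample_with_seed(10, 3))

# A caller who wants a different draw passes their own seed
sample_with_seed(10, 3, seed = 2024)
```

Note that calling set.seed() inside a function still changes the global RNG state, which a fuller implementation might want to restore on exit.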

Handle colnames (cons)/levels (inds) and order

Currently the order of the columns in cons and columns in inds is crucial and must match AND be alphabetical because of the way inds is expanded (spread).

Need to find a way around this that doesn't require providing levels and colnames in perfect alphabetical order because this is not always desirable.
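One possible approach is to match columns by name instead of by position. The sketch below reorders the columns of cons to follow the factor levels in inds; `align_cons_sketch()` is a hypothetical helper, and it assumes the level names are unique across variables and exactly match the column names in cons.

```r
# Hypothetical helper: reorder cons columns to follow the factor levels
# in inds, so neither input has to be supplied in alphabetical order.
align_cons_sketch <- function(cons, inds, vars) {
  # Level names, in the order defined by each factor in inds
  expected <- unlist(lapply(inds[vars], levels), use.names = FALSE)
  stopifnot(all(expected %in% names(cons)))
  # Anything in cons that is not a level name (e.g. the zone code) goes first
  other_cols <- setdiff(names(cons), expected)
  cons[, c(other_cols, expected), drop = FALSE]
}

# Made-up data with deliberately non-alphabetical levels
inds <- data.frame(
  id  = 1:3,
  age = factor(c("young", "old", "young"), levels = c("young", "old")),
  sex = factor(c("m", "f", "f"), levels = c("m", "f"))
)
cons <- data.frame(zone = c("a", "b"),
                   f = c(2, 1), m = c(1, 2),
                   old = c(1, 1), young = c(2, 2))

aligned <- align_cons_sketch(cons, inds, vars = c("age", "sex"))
names(aligned)  # "zone" "young" "old" "m" "f"
```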

Check all `cons` populations match

In rk_weight():

  # The sum of weights will form the simulated population so this must match
  # the population from cons
  if (!isTRUE(all.equal(sum(weights), (sum(cons) / length(vars))))) {
    stop("Weight populations don't match constraint populations.
          Usually this means the populations for each of your constraints
          are slightly different\n",
         "Sum of simulated population:  ", sum(weights), "\n",
         "Sum of constraint population: ", (sum(cons) / length(vars)))
  }
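The check above compares totals only after dividing by the number of variables. A complementary check could compare each constraint's zone totals directly, before any weighting is done. The sketch below assumes the constraints are available as separate tables in a named list, which is an assumption about the input shape and not how rk_weight() receives cons.

```r
# Hypothetical pre-check: do all constraint tables agree on zone populations?
# cons_list is assumed to be a named list of numeric matrices (zones in rows).
check_cons_pops_sketch <- function(cons_list) {
  pops <- lapply(cons_list, rowSums)  # zone totals per constraint
  reference <- unname(pops[[1]])      # compare everything to the first table
  vapply(pops,
         function(p) isTRUE(all.equal(unname(p), reference)),
         logical(1))
}

# Made-up example: sex disagrees with age in the first zone (5 vs 4)
cons_list <- list(
  age = matrix(c(2, 2, 3, 1), nrow = 2, byrow = TRUE),
  sex = matrix(c(3, 2, 2, 2), nrow = 2, byrow = TRUE)
)
check_cons_pops_sketch(cons_list)  # age TRUE, sex FALSE
```

Returning a named logical vector makes it easier to report which constraint is out of line than a single pass/fail result would.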

Refactor/remove checks in functions

Checks in the middle or at the end of the functions should be refactored or removed, or moved to tests/. Currently they're hard to test, so code coverage isn't complete.
For example, in rk_weight():

  # The sum of weights will form the simulated population so this must match
  # the population from cons
  if (!isTRUE(all.equal(sum(weights), (sum(cons) / length(vars))))) {
    stop("Weight populations don't match constraint populations.
          Usually this means the populations for each of your constraints
          are slightly different\n",
         "Sum of simulated population:  ", sum(weights), "\n",
         "Sum of constraint population: ", (sum(cons) / length(vars)))
  }

Add validation functions?

Internal and external validation steps that could be added:

  • Correlation (vector and total)
  • t-test (2-sided, equal variance) of sim/actual vectors
  • TAE and SAE (vector and total)
  • SEI
  • Percentage error
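A few of the measures listed above can be sketched in base R as follows. `validate_sketch()` is a hypothetical function, and the definitions used (TAE as the sum of absolute differences, SAE as TAE divided by the observed total) are assumptions about what is intended here; SEI and percentage error are left out because their exact definitions are not given.

```r
# Hypothetical sketch of some validation measures for simulated (sim)
# versus observed (obs) counts. Definitions are assumptions, see lead-in.
validate_sketch <- function(sim, obs) {
  stopifnot(length(sim) == length(obs))
  tae <- sum(abs(sim - obs))  # total absolute error
  list(
    correlation = cor(sim, obs),
    # two-sided t-test assuming equal variances
    t_test_p    = t.test(sim, obs, var.equal = TRUE)$p.value,
    tae         = tae,
    sae         = tae / sum(obs)  # standardised absolute error
  )
}

# Made-up counts for four zones
obs <- c(10, 20, 30, 40)
sim <- c(12, 18, 30, 40)
res <- validate_sketch(sim, obs)
res$tae  # 4
res$sae  # 0.04
```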

Test returns warning that rake() is deprecated

Testing rakeR
✔ | OK F W S | Context
✔ |  2       | Test check_*() return deprecated
✔ |  5   1   | Check deprecated functions [0.1 s]
────────────────────────────────────────────────────────────────────────────────
test_deprecated.R:49: warning: rake() errors
'rake' is deprecated.
Use 'rk_rake' instead.
See help("Deprecated")
────────────────────────────────────────────────────────────────────────────────

Prevent extract() working with numeric variables

extract() creates a variable for each unique level of each variable in the input individual-level data set (inds). If one of these variables is numeric it will almost certainly have a huge number of unique levels, especially for double().

Add a test to extract() so that it won't continue and output if it finds any numeric variables in the inds data set.
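Such a check might look like the sketch below. `stop_if_numeric_sketch()` is a hypothetical helper, and excluding an `id` column from the check is an assumption about how inds is laid out.

```r
# Hypothetical guard: stop early if inds contains numeric variables.
# The id column is assumed to be numeric by design, so it is skipped.
stop_if_numeric_sketch <- function(inds, id = "id") {
  candidates <- inds[setdiff(names(inds), id)]
  # is.numeric() is TRUE for integer and double columns, FALSE for factors
  is_num <- vapply(candidates, is.numeric, logical(1))
  if (any(is_num)) {
    stop("Numeric variables found in inds: ",
         paste(names(candidates)[is_num], collapse = ", "),
         ". Convert them to factors (for example with cut()) first.")
  }
  invisible(TRUE)
}

# Made-up data: income is a double, so the guard should stop
inds <- data.frame(id = 1:3,
                   income = c(1.5, 2.5, 3.5),
                   sex = factor(c("m", "f", "f")))
msg <- tryCatch(stop_if_numeric_sketch(inds),
                error = function(e) conditionMessage(e))
msg
```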

Error when running rk_rake

When running the rk_rake command, I'm getting this error:

Error in if (!sum(weights_dec%%1) > 0) { : missing value where TRUE/FALSE needed

The arguments con, inds, vars look correct.

Any idea what could be causing this? First time I've used the package.

Thanks
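For context, R raises this error whenever an if condition evaluates to NA. One way that could happen with the expression in the message is a missing value among the weights; that is only a possible cause, not a diagnosis of this report. The values below are made up.

```r
# Made-up weights containing a missing value (an assumption, not the
# reporter's data)
weights_dec <- c(1.25, NA, 2.5)

# Same expression as in the error message: NA propagates through sum()
cond <- !sum(weights_dec %% 1) > 0
is.na(cond)  # TRUE

# Using an NA condition in if() reproduces the error
result <- tryCatch(
  if (cond) "all integers" else "has decimals",
  error = function(e) conditionMessage(e)
)
result

# A quick way to look for the problem in the inputs
anyNA(weights_dec)  # TRUE
```

Checking cons and inds with anyNA() before calling rk_rake would show whether missing values are present in the inputs.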

Fails when input weights already integers

The sample function errors because all the probabilities (weights - weights_int) are zero.
integerise should simply return the input weights in this case.
Enclosing this call in an if statement fixes the problem.
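The guard described above could be sketched as follows. `integerise_sketch()` is a simplified, hypothetical version that works on a plain numeric vector and is not rakeR's actual implementation; it tops up the truncated weights by sampling in proportion to the decimal remainders.

```r
# Simplified, hypothetical integerisation on a numeric vector.
integerise_sketch <- function(weights) {
  weights_int <- floor(weights)
  remainder   <- weights - weights_int
  # Number of whole units lost by truncation
  deficit     <- round(sum(weights) - sum(weights_int))

  # Guard: with nothing to top up (which includes the case where the
  # weights are already integers) skip sample(), which would fail
  # when every probability is zero
  if (deficit == 0) {
    return(weights_int)
  }

  topup <- sample(length(weights), size = deficit, prob = remainder)
  weights_int[topup] <- weights_int[topup] + 1
  weights_int
}

integerise_sketch(c(1, 2, 3))           # returned unchanged: 1 2 3
sum(integerise_sketch(c(1.5, 2.5, 3)))  # total is preserved: 7
```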
