Comments (13)
Hey @kgoldfeld I have some questions for this issue:
genCorOrdCat already supports this, in the doc it says:
Vector of adjustment variables name in dtName - determines logistic shift. This is specified assuming a cumulative logit link. The vector can be NULL, of length 1, or a length equal to the number of new categorical variables.
Do I see correctly that genOrdCat can only add one var at a time so adjVar should be NULL or one var name? Is there a reason for the different behavior? Or should we adjust it to behave similar to genCorOrdCat?
from simstudy.
Also what is the reason for the use of this check:
Line 515 in bfcecd6
This also flags this call even though it should work imho:
genOrdCat(genData(1), baseprobs = probs, catVar = "grp")
If you want to make sure the passed argument is a data.table I would suggest:
library(data.table)
arg <- data.table()
is(arg, "data.table")
#> [1] TRUE
Created on 2020-09-16 by the reprex package (v0.3.0)
In some other functions you also check arguments that do not have a default with if (missing(arg)) stop("")
AFAIK this is unnecessary, as R will raise an error by itself as soon as a missing argument without a default is evaluated:
log()
#> Error in eval(expr, envir, enclos): argument "x" is missing, with no default
Created on 2020-09-16 by the reprex package (v0.3.0)
I definitely agree with your first point - I had never considered that option. So, checking that the passed argument is an existing data.table or will generate a data.table seems good.
As far as the second point, I guess I wanted to have my own customized message - but in this case I failed to actually include a message. But, I agree.
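A custom message for both checks could look roughly like the sketch below. The names checkIsDataTable and genExample are hypothetical and are not part of simstudy; inherits() is used here so that the example runs in base R without loading data.table.

```r
# Hypothetical helper, not part of simstudy: stop with a custom message
# unless `x` inherits from "data.table".
checkIsDataTable <- function(x, argName = "dtName") {
  if (!inherits(x, "data.table")) {
    stop(
      sprintf(
        "'%s' must be a data.table, not an object of class '%s'.",
        argName, class(x)[1]
      ),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Hypothetical generator-style function that uses both checks.
genExample <- function(dtName) {
  # missing() has to be called in the function that owns the argument.
  if (missing(dtName)) {
    stop("argument 'dtName' is missing, please supply a data.table.",
      call. = FALSE
    )
  }
  checkIsDataTable(dtName)
  dtName
}

# A plain data.frame is rejected with the custom message.
msg <- tryCatch(genExample(data.frame(id = 1)), error = conditionMessage)
print(msg)
```

Because the class check looks at the evaluated value, a call such as genData(1) passed directly as the argument would be accepted as long as it returns a data.table.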
Regarding genOrdCat - the adjVar should only be NULL or a single variable name. The reason genCorOrdCat can have multiple variable names is that the shift can be different for each of the outcomes generated (though the outcomes will be correlated). But genOrdCat is generating only a single variable, so there is only one possible adjustment.
OK, I see the point of custom error messages, we can use https://github.com/tidyverse/glue and maybe https://github.com/r-lib/cli to make nice looking messages :).
I understand why adjVar should be length 1, my question was rather whether we should allow for the creation of multiple cat vars in one go to have the same behavior as genCorOrdCat (sans the correlation, obviously). Or do you think this would lower the readability?
What you are suggesting is there could be just a single function and the default correlation would be 0, and the user could specify any number of categorical variables, including just a single one. We could phase out genCorOrdCat and fuse it together with genOrdCat. This would give us a test case for deprecating.
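One way such a phase-out could be wired up is with base R's .Deprecated(), which signals a warning that points users to the replacement. The sketch below uses hypothetical stand-in functions (genOrdCatSketch and genCorOrdCatSketch) with placeholder bodies; it only illustrates the forwarding pattern, not the actual simstudy implementation or its argument list.

```r
# Hypothetical stand-in for the merged function: a single entry point where
# the correlation defaults to 0 and any number of variables can be requested.
genOrdCatSketch <- function(n, nVars = 1, rho = 0) {
  # Placeholder body: a real implementation would generate the data here.
  list(n = n, nVars = nVars, rho = rho)
}

# Hypothetical stand-in for the old function: warn, then forward the call.
genCorOrdCatSketch <- function(n, nVars, rho) {
  .Deprecated("genOrdCatSketch")
  genOrdCatSketch(n, nVars = nVars, rho = rho)
}

# The old entry point still works but signals a deprecation warning.
res <- withCallingHandlers(
  genCorOrdCatSketch(10, nVars = 5, rho = 0.15),
  warning = function(w) {
    message("Caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
str(res)
```

Keeping the old function as a thin wrapper means existing user code keeps running during the deprecation period, while the warning documents the migration path.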
I was not thinking about deprecating one but that sounds good, let's do it!
I can't imagine it is a heavily used function in any case - though who knows?
I think I have successfully merged the two functions. I need to write some more tests, but manual testing looks promising. It will stay in genOrdCat2 until I test it properly and make the changes to deprecate genCorOrdCat.
@kgoldfeld please check out the latest change on the branch: https://github.com/kgoldfeld/simstudy/tree/assignUser/issue34
And let me know if it works as expected!
I've started checking out the function. My quick check with a single outcome looks OK. However, generating multiple columns doesn't seem to be quite right. It looks like columns 2 through 5 are merely the same as column 1. Here's an example:
library(simstudy)
baseprobs <- matrix(c(0.2, 0.1, 0.7,
                      0.7, 0.2, 0.1,
                      0.5, 0.2, 0.3,
                      0.4, 0.2, 0.4,
                      0.6, 0.2, 0.2),
                    nrow = 5, byrow = TRUE)
# generate the data
seedno <- 1234
set.seed(seedno)
dT <- genData(10000)
dX <- genCorOrdCat(dT, adjVar = NULL, baseprobs = baseprobs,
                   prefix = "q", rho = 0.15, corstr = "cs")
set.seed(seedno)
dT <- genData(10000)
dX1 <- simstudy:::genOrdCat2(dT, adjVar = NULL, baseprobs = baseprobs,
                             prefix = "q", rho = 0.15, corstr = "cs")
dX[, table(q1)]
#> q1
#> 1 2 3
#> 2017 972 7011
dX1[, table(q1)]
#> q1
#> 1 2 3
#> 2017 972 7011
dX[, table(q2)]
#> q2
#> 1 2 3
#> 7006 2046 948
dX1[, table(q2)]
#> q2
#> 1 2 3
#> 2017 972 7011
dX[, table(q3)]
#> q3
#> 1 2 3
#> 5046 2001 2953
dX1[, table(q3)]
#> q3
#> 1 2 3
#> 2017 972 7011
The issue seems to be with turning the columns to factors. When we set asFactor = FALSE it works as intended:
library(simstudy)
baseprobs <- matrix(c(
  0.2, 0.1, 0.7,
  0.7, 0.2, 0.1,
  0.5, 0.2, 0.3,
  0.4, 0.2, 0.4,
  0.6, 0.2, 0.2
),
nrow = 5, byrow = TRUE
)
# generate the data
seedno <- 1234
set.seed(seedno)
dT <- genData(10000)
dX <- genCorOrdCat(dT,
  adjVar = NULL, baseprobs = baseprobs,
  prefix = "q", rho = 0.15, corstr = "cs"
)
set.seed(seedno)
dT <- genData(10000)
dX1 <- simstudy:::genOrdCat2(dT,
  adjVar = NULL, baseprobs = baseprobs,
  prefix = "q", rho = 0.15, corstr = "cs", asFactor = FALSE
)
dX[, table(q1)]
#> q1
#> 1 2 3
#> 2017 972 7011
dX1[, table(q1)]
#> q1
#> 1 2 3
#> 2017 972 7011
dX[, table(q2)]
#> q2
#> 1 2 3
#> 7006 2046 948
dX1[, table(q2)]
#> q2
#> 1 2 3
#> 7006 2046 948
dX[, table(q3)]
#> q3
#> 1 2 3
#> 5046 2001 2953
dX1[, table(q3)]
#> q3
#> 1 2 3
#> 5046 2001 2953
Created on 2020-09-21 by the reprex package (v0.3.0.9001)
I will check it out.
Yeah it is caused by genFactor, specifically my assumption that it was vectorized already but it is just duplicating the same data under multiple rows. I will need to rework that to be able to use it as intended.
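As an illustration of what a per-column conversion could look like, here is a minimal base R sketch. The helper toFactorCols is hypothetical and works on a plain data.frame; it is not the genFactor implementation and makes no claim about how simstudy fixed the problem.

```r
# Hypothetical helper, not part of simstudy: convert each listed column to a
# factor on its own, so every column keeps its own values.
toFactorCols <- function(df, cols) {
  df[cols] <- lapply(df[cols], factor)
  df
}

# Two small ordinal columns with different values (illustrative data only).
df <- data.frame(q1 = c(1, 3, 3, 2), q2 = c(1, 1, 2, 3))
out <- toFactorCols(df, c("q1", "q2"))

# Both columns are now factors and still differ from each other.
str(out)
```

Applying factor() through lapply() handles one column at a time, which avoids a single converted vector being reused for every requested column.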
This should work properly now...