Comments (18)
from simstudy.
Yeah, I understand why you did it this way, but we can still check formula formats etc. :)
I will hotfix defData for now to get the new evalDef working and tested. Afterwards I'll look at it and defDataAdd. Maybe we can unify them a bit.
from simstudy.
One idea I have to unify defData/defDataAdd would be to add a dt argument to defData, which is used to check for previously defined vars when it is given. This would of course change the API in a backwards-incompatible way...
Alternatively, we could add a flag add = TRUE/FALSE that disables the checks for previously defined vars.
Let me know if you have any other ideas :)
from simstudy.
Or the add flag could be either a boolean or a dt: if it is a dt we can check validity, but if it is a boolean we just don't check (as is the behavior of defDataAdd now).
This would allow for backward compatibility while also allowing for more "security" with adding new vars.
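A minimal sketch of how such a dual-purpose flag could behave, in base R. The helper name checkFormulaVars and its arguments are hypothetical and are not part of simstudy; it only illustrates the idea of "TRUE means skip the check, a data set means check against its columns".

```r
# Hypothetical helper, NOT part of simstudy: illustrates a dual-purpose `add` flag.
# formulaVars: names of the variables a new formula refers to.
# defined:     names already defined in the current definition table.
# add:         FALSE (check against `defined` only), TRUE (skip the check),
#              or a data set whose columns count as previously defined.
checkFormulaVars <- function(formulaVars, defined = character(0), add = FALSE) {
  if (isTRUE(add)) {
    # add = TRUE: no check at all, mirroring what defDataAdd does today
    return(invisible(TRUE))
  }
  if (is.data.frame(add)) {
    # add is a data set (a data.table is also a data.frame): use its columns
    defined <- union(defined, names(add))
  }
  unknown <- setdiff(formulaVars, defined)
  if (length(unknown) > 0) {
    stop("Variable(s) not previously defined: ", paste(unknown, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy data set standing in for already generated data
dd <- data.frame(id = 1:3, x = c(0.1, -0.4, 1.2), rx = c(0, 1, 0))

# Variables are extracted from the formula and checked against the columns of dd
checkFormulaVars(all.vars(quote(5 + 0.5 * x + 2 * rx)), add = dd)

# With add = TRUE nothing is checked, so an unknown name passes silently
checkFormulaVars("z", add = TRUE)
```

Passing a plain boolean keeps existing calls working, while passing the data set opts in to the stricter check.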
from simstudy.
This is in line with what I had been thinking. In concept, I like the 2nd option you just mentioned. But, as I think about it, one issue I see is that users (me, for example) might like to define all the data generating processes before actually generating the data, in which case there would be no data set yet to pass to the definition:
# Define data
d1 <- defData(...)
...
d1 <- defData(d1, ...)
d2 <- defDataAdd(...)
...
d2 <- defDataAdd(d2, ...)
# Generate data
dd <- genData(1000, d1)
dd <- addColumns(d2, dd)
from simstudy.
Maybe I'm just a bit dumb right now, but why would you need two definitions in that case?
Additionally, the "dual" flag would allow for just setting it to TRUE in that case.
It would also deprecate defDataAdd, I guess.
from simstudy.
Not dumb - it was just a highly simplified example - so my fault. Typically, it would look more like this, where there is some outcome variable that is a function of the treatment assignment. (Another typical example would be where we've created longitudinal data from a data set and want to add new data that are dependent on some time variable):
library(simstudy)
library(ggplot2)
# Define data
d1 <- defData(varname = "x", formula = 0, variance = 1)
d2 <- defDataAdd(varname = "y", formula = "5 + 0.5 * x + 2 * rx", variance = 2)
# Generate data
dd <- genData(1000, d1)
dd <- trtAssign(dd, grpName = "rx")
dd <- addColumns(d2, dd)
# Look at data and regression line
ggplot(data = dd, aes(x = x, y = y, group = rx)) +
  geom_point() +
  geom_smooth(aes(color = factor(rx)), method = "lm")
from simstudy.
If we can't unify them, that is fine. My primary goal is to make the data generation process super clear, and if that comes at the expense of efficiency, I am definitely willing to sacrifice efficiency.
from simstudy.
Ah, that makes sense. I mean we could make defDataAdd use the same code as defData without changing the external behaviour, if you were thinking about that.
from simstudy.
Would there be an advantage to that?
from simstudy.
For the user? No, but it sounds like you are happy with the way it works at the moment? For us it would reduce the amount of code to maintain.
from simstudy.
So, if it would be easier to maintain - it makes sense to do it.
from simstudy.
There was also the idea from you at some point to add a "previous Dataset" parameter to defData. Is this still something you are interested in?
from simstudy.
I am thinking to put this on hold - I think I talked myself out of making any changes to this.
from simstudy.
Ok, maybe we should then make clear via a warning that the id parameter will be ignored if a data definition is supplied (the current behaviour).
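A rough sketch of what such a warning could look like, in base R. The function warnIfIdIgnored and the warning text are hypothetical and do not reflect simstudy's actual implementation; the sketch only assumes that a definition function has a dtDefs argument for an existing definition and an id argument with a default.

```r
# Hypothetical check, NOT simstudy's implementation.
# Warns when the caller explicitly sets `id` while also supplying an existing
# data definition, since in that case `id` would have no effect.
warnIfIdIgnored <- function(dtDefs = NULL, id = "id") {
  idSupplied <- !missing(id)  # TRUE only if the caller passed `id` explicitly
  if (!is.null(dtDefs) && idSupplied) {
    warning("Argument 'id' is ignored because a data definition was supplied.")
  }
  # TRUE when the supplied id would actually be used
  invisible(idSupplied && is.null(dtDefs))
}

# Toy stand-in for an existing definition table
defs <- data.frame(varname = "x")

warnIfIdIgnored(id = "pid")          # no definition yet: id is used, no warning
warnIfIdIgnored(defs)                # id left at its default: no warning
# warnIfIdIgnored(defs, id = "pid")  # would emit the warning
```

Using missing() means the warning only fires when the user actually tried to set id, so existing code that relies on the default stays quiet.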
from simstudy.
That sounds totally reasonable.
from simstudy.
> I am thinking to put this on hold - I think I talked myself out of making any changes to this.
Any new insights on this? Or do you still want to keep the api as is and (if any changes) unify internally?
from simstudy.
No - I have no new insights on this. Though your suggestion in #50 for a flag for the copy option raised a possibility here: add a flag to defData to indicate that it is adding new columns to an existing table (which would mean we could get away from defDataAdd). It is basically the same thing as having a separate function, but makes explicit that they are the same thing, except that the formula checking would be different if the add flag is TRUE.
I am not sure this is any better, just throwing it out there.
from simstudy.