squidgroup / squidsim
squidSim is a tool for simulating data from multi-level/hierarchical models, including genetic, phylogenetic, temporal and spatial effects
License: Other
Need to work out how best to allow a user to simulate phylogenetic effects or breeding values, and perhaps more broadly give a correlation matrix, to allow for spatial effects for example.
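One generic way to simulate effects from a user-supplied correlation matrix is to multiply independent normal draws by a Cholesky factor of that matrix. The sketch below uses base R only; the function name sim_correlated_effects and the example matrix are hypothetical and are not part of squidSim.

```r
# Hypothetical helper (not squidSim API): draw one effect per level,
# with correlations among levels given by cor_mat and variance sigma2.
sim_correlated_effects <- function(cor_mat, sigma2 = 1) {
  n <- nrow(cor_mat)
  # chol() returns the upper-triangular factor R with t(R) %*% R = cor_mat,
  # so its transpose is the lower-triangular factor L with L %*% t(L) = cor_mat
  L <- t(chol(cor_mat))
  sqrt(sigma2) * as.vector(L %*% rnorm(n))
}

set.seed(1)
# Assumed example: 4 levels (e.g. individuals or species), all pairs correlated 0.5
A <- matrix(0.5, nrow = 4, ncol = 4)
diag(A) <- 1
effects <- sim_correlated_effects(A, sigma2 = 2)
```

The same function would apply to a relatedness matrix, a phylogenetic correlation matrix, or a spatial correlation matrix, provided the matrix is positive definite.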
Work out how to include temporal effects.
This would be based on the simulated phenotype, via three different missingness processes: MCAR (missing completely at random), MAR (missing at random) and MNAR (missing not at random).
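As an illustration of the three processes, the sketch below applies each of them to a simulated response in base R. The variable names, coefficients and missingness probabilities are assumptions chosen for the example, not a proposed squidSim interface.

```r
set.seed(42)
n <- 1000
dat <- data.frame(x = rnorm(n))
dat$y <- 0.5 * dat$x + rnorm(n)  # simulated phenotype

# MCAR: every observation has the same probability of being missing
miss_mcar <- rbinom(n, 1, 0.3) == 1
# MAR: missingness depends on an observed covariate (x)
miss_mar <- rbinom(n, 1, plogis(-1 + 1.5 * dat$x)) == 1
# MNAR: missingness depends on the value of the phenotype itself (y)
miss_mnar <- rbinom(n, 1, plogis(-1 + 1.5 * dat$y)) == 1

dat$y_mcar <- ifelse(miss_mcar, NA, dat$y)
dat$y_mar  <- ifelse(miss_mar,  NA, dat$y)
dat$y_mnar <- ifelse(miss_mnar, NA, dat$y)
```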
The example on this page: http://squidgroup.org/squidSim_vignette/2.3-simulating-predictors-at-different-hierarchical-levels.html
produces the warning:
Warning message:
In length(x) == 1 && x == "" :
'length(x) = 1000 > 1' in coercion to 'logical(1)'.
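A check of the form length(x) == 1 && x == "" can produce this message when length(x) is 1 but x == "" has many elements, for example when x is a one-column data frame with 1000 rows. Whether that is what happens inside squidSim here is an assumption and is not confirmed by the report. A minimal sketch of a stricter check, using the hypothetical helper name is_empty_string:

```r
# Hypothetical helper: TRUE only for a single, non-missing empty string.
# is.character() comes first so that data frames, matrices and other
# non-character inputs short-circuit before x == "" is evaluated.
is_empty_string <- function(x) {
  is.character(x) && length(x) == 1 && !is.na(x) && x == ""
}

is_empty_string("")                                # single empty string
is_empty_string(rep("", 1000))                     # long character vector
is_empty_string(data.frame(a = rep("", 1000)))     # one-column data frame
```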
Potential bug: if something is input as a fixed factor (fixed=TRUE in the parameter list), the user can give names. Even if these are the same names as in the data_structure, the code never refers back to them, so a user could easily enter them in the wrong order without the function detecting it, and the effects would then be assigned to the wrong levels in the generated data.
The default naming should also be better. At the moment, the names come out as e.g. sex_effect1, sex_effect2. It would be much better if they linked back to the names in the data_structure, e.g. sex_male and sex_female.
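A minimal sketch of the idea, in base R: match the effects to factor levels by name rather than by position, and build the output names from the level names. The objects sex and beta are made-up examples, and this is not how squidSim currently handles fixed factors.

```r
sex <- factor(c("female", "male", "male", "female"))

# Effects supplied by the user in a different order from the factor levels
beta <- c(male = 0.5, female = 0)

# Fail early if the supplied names do not match the levels
stopifnot(setequal(names(beta), levels(sex)))

# Reorder by name so that position no longer matters
beta_ordered <- beta[levels(sex)]

# Names that link back to the levels, e.g. sex_female, sex_male
effect_names <- paste("sex", levels(sex), sep = "_")

# Effect for each observation, looked up by level name
sex_effect <- unname(beta_ordered[as.character(sex)])
```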
I think this can be done through indexing in the model formula, which is already implemented in sim_population
The ":" is good for specifying, but it shouldn't appear in the names in the resulting data.frame; year:month should end up as something like year_month.
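The renaming itself could be a simple substitution on the column names. In this sketch the helper name clean_names is hypothetical:

```r
# Hypothetical helper: replace ":" with "_" in column names
clean_names <- function(nm) gsub(":", "_", nm, fixed = TRUE)

d <- data.frame(a = 1:2, b = 3:4, c = 5:6)
names(d) <- c("year", "month", "year:month")
names(d) <- clean_names(names(d))
```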
At the moment the names of the levels of each factor are just 1:n. It would be nice if we could add names.
Here are some thoughts on naming parameters:
The population id in the output dataset is named squid_pop. None of the other parameters is preceded by squid_. Maybe have it just pop or population.
All parameter names are in lower case except N and N_pop. What about n and n_pop?
Is there an easy way to generate code for common model types? E.g. random regression is a little complex, but maybe there could be a function that would generate the random regression code structure.
I'm trying to sample 25 individuals from simulated data of 100 individuals. However, the number of individuals sampled is different in each simulation.
library(squidSim)
sim_res <- simulate_population(
  data_structure = make_structure("individual(100)", repeat_obs=1),
  response_name = "Female",
  parameters = list(
    intercept = 0.19360,
    observation = list(
      names = c("Elo_score"),
      beta = c(-0.11368)
    ),
    residual = list(
      vcov = 6.95e-10
    )
  ),
  family = "binomial",
  link = "logit",
  n_pop = 1,
  sample_type = "nested",
  sample_param = cbind(individual=25, observation=1)
)
sample_data <- get_sample_data(sim_res, sample_set=1)
I think the best way to do this would be to specify different sampling regimes, and then have a way of combining them by specifying which variables have which sampling regime.
This could be done in get_sample_data(), something along the lines of giving a list
list("1"=c("rain"), "2"=c("y"))
which would indicate that the first sampling regime was used for "rain" and the second for "y". This would then enable different response variables to be sampled differently, for example
list("1"=c("y1"), "2"=c("y2"))
It would also be a good idea to allow the sampled data to be returned with NAs in it, for example by having an argument include_NA=TRUE (with FALSE as default).
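The proposal above could be sketched roughly as follows in base R. The function combine_samples, its arguments row_sets and var_map, and the example data are all hypothetical; the sketch only illustrates the masking logic and does not reflect the existing get_sample_data() implementation.

```r
# Hypothetical helper: row_sets gives the sampled row indices for each regime,
# var_map says which variables follow which regime.
combine_samples <- function(data, row_sets, var_map, include_NA = FALSE) {
  out <- data
  for (regime in names(var_map)) {
    keep <- seq_len(nrow(data)) %in% row_sets[[regime]]
    for (v in var_map[[regime]]) {
      out[[v]][!keep] <- NA  # mask rows not sampled under this regime
    }
  }
  if (!include_NA) {
    # Drop rows where any of the sampled variables is missing
    sampled_vars <- unname(unlist(var_map))
    out <- out[complete.cases(out[sampled_vars]), , drop = FALSE]
  }
  out
}

full <- data.frame(
  rain = c(1.2, 0.4, 3.1, 2.2, 0.9, 1.7),
  y    = c(10, 12, 9, 14, 11, 13)
)
row_sets <- list("1" = 1:4, "2" = c(2, 4, 6))
var_map  <- list("1" = "rain", "2" = "y")

with_na       <- combine_samples(full, row_sets, var_map, include_NA = TRUE)
complete_only <- combine_samples(full, row_sets, var_map)
```

With include_NA = TRUE every row is returned and unsampled values are NA; with the default, only rows sampled under both regimes remain.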
Known predictors need to be a matrix or data frame. Names of variables need to be input manually, because they are lost if the predictors are input as vectors.
When simulating predictors, squidSim allows the user to specify the covariance matrix between all of these predictors. squidSim also allows the user to input their own data. However, specifying a covariance between simulated and known predictors is not possible. This could be a very useful feature.
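Outside squidSim, one simple way to obtain a simulated predictor with a target correlation to a known predictor is to mix the standardised known predictor with independent normal noise. This is only a sketch under assumed values (rho, n and the uniform "known" variable are made up), and the resulting pair is not bivariate normal unless the known predictor is itself normal.

```r
set.seed(123)
n <- 5000
known <- runif(n, 0, 10)  # stands in for a user-supplied (known) predictor
rho <- 0.6                # target correlation between known and simulated

# Standardise the known predictor, then add noise scaled so that the
# simulated predictor has unit variance and correlation rho with 'known'
known_std <- as.vector(scale(known))
new_pred <- rho * known_std + sqrt(1 - rho^2) * rnorm(n)

observed_cor <- cor(known, new_pred)
```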
I think it would be nice to have a function that summarizes the expected patterns of variance and covariance at the different levels due to known and unknown sources, as well as the proportion of (co)variance explained by each process. This is something that is implemented in the squidApp.
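For a simple linear model with a Gaussian response, the expected variance due to a set of predictors is beta' Sigma beta, where Sigma is the covariance matrix of the predictors. A minimal sketch of such a summary, with made-up parameter values and hypothetical component names:

```r
# Assumed example parameters (not taken from any squidSim output)
beta   <- c(0.5, 0.3)                     # slopes of two predictors
vcov_x <- matrix(c(1, 0.2, 0.2, 1), 2)    # covariance matrix of the predictors
var_ind <- 0.3                            # among-individual variance
var_res <- 0.5                            # residual variance

# Expected variance due to the predictors: t(beta) %*% Sigma %*% beta
var_pred <- drop(t(beta) %*% vcov_x %*% beta)

components <- c(predictors = var_pred, individual = var_ind, residual = var_res)
summary_table <- data.frame(
  variance   = components,
  proportion = components / sum(components)
)
```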
If extra variables are created and saved, these will also be transformed if family/link are specified
It would be useful to have a function that allows minor updates to the parameter list, rather than complete respecification, e.g.
update(group, parameter, values)
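Because the parameter list is an ordinary nested R list, such a helper could be very small. The sketch below is hypothetical (the name update_parameters and its argument order are assumptions) and only covers groups that are themselves lists:

```r
# Hypothetical helper: change one element of one group in a parameter list
update_parameters <- function(parameters, group, parameter, values) {
  stopifnot(group %in% names(parameters), is.list(parameters[[group]]))
  parameters[[group]][[parameter]] <- values
  parameters
}

params <- list(
  intercept   = 0,
  observation = list(names = "x", beta = 0.5),
  residual    = list(vcov = 1)
)

# Only the residual variance changes; everything else is carried over
params2 <- update_parameters(params, "residual", "vcov", 2)
```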
Naming something in the parameter list with covariate=TRUE doesn't seem to affect what it is called in the predictor matrix.
Below the first equation, but also in the second chunk, Z and E are missing their indices so that the current documentation says that two matrices both follow multivariate normal distributions.