seeg-oxford / movement Goto Github PK
View Code? Open in Web Editor NEWR package containing useful functions for the analysis of movement data in disease modelling and mapping
R package containing useful functions for the analysis of movement data in disease modelling and mapping
Running the example for movement
, the step predicting form a flux model issues the following warning:
predictedMovement <- predict(originalRadiation(theta = 0.1), locationData, symmetric = TRUE)
Warning message:
In as.movement_matrix.matrix(prediction$prediction) :
The given movement_matix contains non-integer values. Rounding was used to return a valid movement_matrix object.
Whilst we want integer predictions in this cases, in general, predictions from a flux
or movement_model
object should not be coerced to integer. The following should therefore work and not issue any warnings (note changes to object names):
data(kenya)
kenya10 <- raster::aggregate(kenya, 10, sum)
net <- getNetwork(kenya10, min = 50000)
location_data <- data.frame(location = net$locations, population = net$population, x = net$coordinate[,1], y = net$coordinate[,2])
class(location_data) <- c("location_dataframe", "data.frame")
# simulate movements (note the values of movementmatrix must be integer)
predicted_movement <- predict(originalRadiation(theta = 0.1), location_data, symmetric = TRUE)
movement_matrix <- as.movement_matrix(round(predicted_movement$movement_matrix))
# fit a new model to these data
movement_model <- movement(movement_matrix ~ location_data, radiationWithSelection(theta = 0.5))
# print movement_model
print(movement_model)
# predict the population movements
predicted_movements <- predict(movement_model, kenya10)
# display the predicted movements
showprediction(predicted_movements)
In print.summary.movement_model
we (should, #108) report the estimated coefficients, and the estimated standard errors. Whilst parameters are reported on the 'true' scale, as in the equations, the standard errors are correspond to parameters estimates on the continuous scale we optimise on, and it only makes sense to report these.
We need to add something to the docs for summary.movement_model
explaining this, and edit the printed summary to report the parameters on the trues scale, on the transformed scale, and the standard errors on the transformed scale.
It looks like we have the settings of the individual git's different configured and working on different OS (Windows vs Mac) causing various merging conflicts due to line endings.
Reading a bit how to best address this issue, I came across the suggestions to add a '.gitattributes' file to the repos to ensure consistent handling of line endings (see https://help.github.com/articles/dealing-with-line-endings/). And here, https://github.com/Danimoth/gitattributes, is a selection of
'gitattributes' templates for various languages incl. R which we might can use as a starting point.
@goldingn Lets have a call today to discuss.
The object returned by the movement
function (which is undergoing renaming so I won't be more specific) should have a plot method.
PR #60 provides such functionality, given vectors of distances, observed movements and predicted movements
When creating the movement_matrix ensure that the column & row names are meaningful in respect of their location.
We could then also check that these line up with the 'location_dataframe' locations during model fitting which would be safer.
We can issue a warning or an error if something is coerced into a movement_matrix without row and column names.
Using the version of the movement example code here, the following does not plot predictions in a sensible way:
predicted_movements2 <- predict(movement_model, location_data)
class(predicted_movements2)
showprediction(predicted_movements2)
The following code in show.prediction
should be sufficient in this case (unneccessary raster version commented out), so some handling of the input datatype to select the plot method should be all that is needed:
# sp::plot(raster_layer, ...)
size <- 0.1 + 2 * network$population/max(network$population)
# points(network$coordinates, pch = 16, cex = size, col = rgb(0, 0, 1, 0.6))
plot(network$coordinates, pch = 16, cex = size, col = rgb(0, 0, 1, 0.6), asp = 1)
n <- nrow(network$coordinates)
for (i in 2:n) {
for (j in (i - 1):n) {
arrows(network$coordinates[i, 1],
network$coordinates[i, 2],
network$coordinates[j, 1],
network$coordinates[j, 2],
lwd = 4,
length = 0,
col = rgb(0, 0, 1, predictedMovements[i, j] / (max(predictedMovements) + 0)))
}
}
note also that the current first line:
sp::plot(raster::raster(network$distance_matrix))
should not be in there!
I'd like to submit the package to CRAN before the end of next week. They'll only accept it passes R CMD check --as-cran
on their machines on the previous, stable and development versions of R with no warnings, errors or significant* notes. Currently, Travis reports two notes and two warnings on R-devel. We'll need to fix these before submission. I've pasted the current R-devel problems below and suggested fixes. We'll also need to eyeball the outputs of the other Travis runs for issues, do a test run locally which builds the manual, and run checks on Windows.
*significance is often arbitrarily determined
* checking dependencies in R code ... WARNING
'::' or ':::' import not declared from: ‘viridis’
Adding #' importFrom viridis viridis
to the package's roxygen blurb
* checking R code for possible problems ... NOTE
densityCompare: no visible global function definition for ‘density’
densityCompare: no visible global function definition for ‘grey’
densityCompare: no visible global function definition for ‘lines’
distanceDistPlot: no visible global function definition for ‘grey’
plotComparePredictions: no visible global function definition for ‘cor’
plotComparePredictions: no visible global function definition for
‘title’
scatter: no visible global function definition for ‘smoothScatter’
scatter: no visible global function definition for ‘grey’
unitTransform: no visible global function definition for ‘plogis’
unitTransform: no visible global function definition for ‘qlogis’
Undefined global functions or variables:
cor density grey lines plogis qlogis smoothScatter title
Consider adding
importFrom("graphics", "lines", "smoothScatter", "title")
importFrom("grDevices", "grey")
importFrom("stats", "cor", "density", "plogis", "qlogis")
to your NAMESPACE.
Add the suggested imports as #' importFrom x y
in the package roxygen docs (if written in NAMESPACE directly they'll be overwritten)
* checking Rd line widths ... NOTE
Rd file 'movement.Rd':
\examples lines wider than 100 characters:
locationData <- data.frame(location = net$locations, population = net$population, x = net$coordinate[,1], y = net$coordinate[,2])
Rd file 'plot.movement_predictions.Rd':
\examples lines wider than 100 characters:
locationData <- data.frame(location = net$locations, population = net$population, x = net$coordinate[,1], y = net$coordinate[,2])
Rd file 'predict.movement_model.Rd':
\examples lines wider than 100 characters:
locationData <- data.frame(location = net$locations, population = net$population, x = net$coordinate[,1], y = net$coordinate[,2])
These lines will be truncated in the PDF manual.
Add some newlines in these examples
* checking Rd cross-references ... WARNING
Missing link or links in documentation object 'travelTime.Rd':
‘[gdistance]{adjacent}’
See section 'Cross-references' in the 'Writing R Extensions' manual.
I think that just needs to be gDistance
There are also examples which take >5s to run (bad from CRAN's perspective, though weirdly not a warning):
* checking examples ... OK
Examples with CPU or elapsed time > 5s
user system elapsed
movement 106.159 0.155 106.553
plot.movement_predictions 102.005 0.111 102.178
predict.movement_model 97.064 0.099 97.268
kenya 5.396 0.102 5.502
These can be fixed quickly by adding \dontrun{}
around the examples in the roxygen blurb. See here for a working example.
Some of the tests explicitly called predict.*()
functions. Now that they're properly registered as S3 methods these won't work.
For some cases it was just a case of switching to predict()
, but for tests which use with_mock
they'll need to be re-written. I've commented them out for now to pass tests, in bf106ed
The link in the movement() method for the predict() method opens the help for the associates predict() methods in the packages 'Spatial model predictions' and 'Model Predictions', but not the relevant predict.movementmodel method
Do we need to implement a
predict <- function(object,...){
UseMethod("....")
}
similar to the showprediction() method? Or is there another way how to correctly link to the right predict() method?
Same would apply for the 'predict.optimisedmodel' method.
Make sure that the docs, parameter names, flux functions, and references all line up and that all parameters are explained in the docs
Alex to do
The old example under the movement() function is not useful anymore. Add a new example which applies the movement() function directly.
Given the new interface, I think we should define a class movementpredictions for the output of the predict method for 'optimisedmodel' objects (the output of class to movement).
This could then be the predict function for the movementpredictions class, with plotting of the underlying raster removed. I.e. only the coordinates of the locations in the object 'newdatapassed topredict.optimisedmodel`, and the lines between them.
The output of movement (essentially an optimised ‘flux' object) should be of class ‘movement_model’.
Include option to calculate great circle (or orthodromic) distances between location points.
The get.network.fromdataframe() function is used within the 'predict.movementmodel' and therefore within the various predict functions of the package.
Currently, for this function to work, the data.frame has to have the following column names: 'pop_origin' (for population), 'origin' (for locations), 'long_origin' (for long = x) and 'lat_origin' (for lat = y). If this function is currently called with the locationdataframe object (having columns 'location', 'population, 'x' and 'y'), this function will generate an error.
Either we need to clarify this restriction in the documentation, or we want to rewrite the code to accept a locationdataframe object instead? The latter required careful checking of the code where the change of the column names can have knock-on effects.
@goldingn : what do you think?
fitting a movement model (at least one using radiationWithSelection()
) to an observed connectivity matrix with 0s on the diagonal causes the optimiser to fail on the starting parameters, presumably because the radiation model evaluated on distance 0 is nonsense and return a non-finite value.
movement
should catch this and handle it nicely
The confusingly named function movementmodel
(which returns an object of class movementmodel
) uses a named model to make predictions to a RasterLayer
object, given a named flux model and object. This function is exported to the user and used in various documentation examples.
The main model-fitting function movement
(which returns an object of class optimisedmodel
) already handles prediction to raster data (by calling predict.movementmodel
).
We should unexport the function movementmodel
and functions which operate on the class movementmodel
and make sure all of its functionality is incorporated in movement. This will likely break some things and require changes to the existing examples.
R CMD check --as-cran
issues the following warning
* checking S3 generic/method consistency ... WARNING
Warning: declared S3 method 'predict.default' not found
as.locationdataframe:
function(input, ...)
as.locationdataframe.SpatialPolygonsDataFrame:
function(input, populationraster)
summary:
function(object, ...)
summary.optimisedmodel:
function(x)
predict:
function(object, ...)
predict.movementmodel:
function(predictionModel, dataframe, ...)
predict:
function(object, ...)
predict.optimisedmodel:
function(predictionModel, dataframe, ...)
predict:
function(object, ...)
predict.movementmodel:
function(predictionModel, dataframe, ...)
predict:
function(object, ...)
predict.optimisedmodel:
function(predictionModel, dataframe, ...)
See section ‘Generic functions and methods’ in the ‘Writing R
Extensions’ manual.
I'm not sure what the expected set up should be for S3 classes, but apparently this isn't it...
could you please also add a AIC.movement_model
method if you have time. There's an existing AIC function, so just the method needed
Currently the user has to format their locations data into pairs of to-from locations, just so that as.location_dataframe.dataframe
can flatten them back to one location per row.
Instead, as.location_dataframe.dataframe
should take a correctly formatted dataframe with location IDs, populations and x and y coordinates, and just apply the class label. The docs should be updated accordingly
Currently loads of functions are exported which are unlikely to ever be needed by the user. We need to identify the core functions and then hide some of the ancillary functions.
There are equations, descriptions and references for most of these flux functions here (and code here)
These should be added to the flux function docs for the corresponding models, later to be moved into the docs for flux
objects, as described in #39.
These should also list the constraints on each parameter, using the constraints info e.g. here
The sequential for loop in movement.predict
https://github.com/SEEG-Oxford/movement/blob/master/R/movement.R#L1478-L1511 should be parallelisable.
We should:
1) expose a parallel
argument to the user so that they can choose whether to run prediction in parallel:
predict(obj, parallel = TRUE)
This can be passed up through all functions that make movement predictions (movement
, predict.prediction_model
, predict.flux
, predict.movement_model
, fittingwrapper
& attemptoptimisation
)
2) Provide a simple utility function to set up the parallel backend using doSnow
if they want parallelism:
parallel(n = 20)
3) Make sure a progress bar is still provided. See here for how to include progress bars in a foreach parallel implementation: http://www.r-bloggers.com/updates-to-the-foreach-package-and-its-friends/
Here's a function that works for 2) (with three new importFrom
calls required):
parallel <- function(n = NULL) {
if (is.null(n)) {
n <- parallel::detectCores()
}
cl <- snow::makeSOCKcluster(n)
doSNOW::registerDoSNOW(cl)
message(sprintf('parallel backend registered on %i cores', n))
}
It's lopsided (only calculates i to j, not both or the average) because I wrote the code after a beer or two...
I'll add an option to either calculate the two-way movements (non-symmetric movement matrix) or one-way average movements (symmetric movement matrix). The latter is what I'll be sticking into statistical models and should be a bit faster to calculate, but contains less information.
On loading the package (Andrew's fork at least), I get this note:
The following object is masked from ‘package:raster’:
predict
The following object is masked from ‘package:stats’:
predict
it would be better not to mask these generics, perhaps predict
should not be exported (fine to export predict.movementmodel
etc.)
To write out the full equation with a dirac's delta function, and to provide an explanation of the mapping from [0, 1]
to [0, max(distance)]
Include option to use rectangular origin-destination matrices in intervening opportunities model.
For many applications movement
fails because the objective can't be calculated at the starting parameters, causing optim
to error:
Error in optim(predictionModel$modelparams, fittingwrapper, method = "Nelder-Mead", :
function cannot be evaluated at initial parameters
Some error handling here would be helpful, and more intelligent parameter transformations for the optimiser.
When running the documented example using the Kenya model:
# get location data
data(kenya)
kenya10 <- raster::aggregate(kenya, 10, sum)
net <- getNetwork(kenya10, min = 50000)
location_data <- data.frame(location = net$locations, population = net$population, x = net$coordinate[,1], y = net$coordinate[,2])
location_data <- as.location_dataframe(location_data)
# simulate movements (note the values of movementmatrix must be integer)
predicted_flux <- predict(originalRadiation(theta = 0.1), location_data, symmetric = TRUE)
movement_matrix <- predicted_flux$movement_matrix
# fit a new model to these data
movement_model <- movement(movement_matrix ~ location_data, radiationWithSelection(theta = 0.5))
the summary of the calculated model summary(movement_model )
causes an error due to the method stderrors <- sqrt(abs(diag(solve(object$optimisation_results$hessian))))
.
The hessian results for this example is:
$hessian
[,1] [,2]
[1,] 4300876 0
[2,] 0 0
and the error message printed is
Error in solve.default(movement_model$optimisation_results$hessian) :
Lapack routine dgesv: system is exactly singular: U[2,2] = 0
@goldingn : How best do you want to handle such a case? Perhaps we can add a tryCatch
statement for the line stderrors <- sqrt(abs(diag(solve(object$optimisation_results$hessian))))
and if an error occurs, set the variable stderrors
to NA
and print a warning to the user instead of causing a failure of the function?
While rewriting the movement() method, I found a redundant code section / potential bug when calling the optimization method optim().
In line 120/121 of movement.R in the movement() function body there are two different calls to the attemptoptimisation() method whereas one is comment out:
L120: #optimresults <- attemptoptimisation(predictionModel, locationdataframe, movement_matrix, progress=FALSE, hessian=TRUE, upper=upper, lower=lower, ...) #, upper=upper, lower=lower
L121: optimresults <- attemptoptimisation(predictionModel, population_data, movement_matrix, progress=FALSE, hessian=TRUE, ...) #, upper=upper, lower=lower
Using the call in line 121 (which does NOT assigns the upper and lower arguments) results in the scenario that the optim() method will use the default values for lower (=-Inf) and upper (=Inf) [see documentation of optim()] and NOT the upper / lower bounds set in the line 59..103 in the movement() method depending on the model selected. This option was in the movement.R package as I started the project.
Using the call in line 120 (which assigned the upper and lower argument to the attemptoptimisation() method) will result that the upper and lower values, as set in line 59..103 in the movement() method depending on the model selected, will be passed over to the optim() method. This will, however, cause an error, as the bounds can ONLY be used for the optimization methods = "L-BFGS-B" or "Bent" - but we use the 'BFGS" method.
Thus, if the "BFGS" method should be use for the optimization, the 'upper' and 'lower' parameters are redundant and the code should be cleaned accordingly (including the relevant tests). Furthermore, the movement() method MUST use the call
optimresults <- attemptoptimisation(predictionModel, population_data, movement_matrix, progress=FALSE, hessian=TRUE, ...)
and the other comment-out line should be removed to avoid future confusion.
If however, the upper and lower bounds should be used by the optimization model, then the code contains a bug and needs fixing.
currently the main model-fitting function is movement
. This requires many pieces of data from the user:
movement(locations, coords, population, movement_matrix, model, model.params = NULL, ...)
with required arguments:
locations
(unclear what this needs to be)coords
of coordinatespopulation
of populationsmovement_matrix
of observed movements between locationsmodel
naming a model to useAnd optional arguments:
model.params
giving parameters corresponding to model....
A simpler interface would be:
movement(movement_matrix, location_dataframe, model = gravity(), optim_options = list())
possibly with glm
-like function interface:
movement(movement_matrix ~ location_dataframe)
with required arguments:
movement_matrix
being an object of class movement_matrix
containing the observed movement between some or all locations in location_dataframe
location_dataframe
being an object of class location_dataframe
containing all of the information in locations
, coords
and population
as dataframe columns, each row being a different location.and optional parameters
model
being a function to initialise a model of the named type. E.g. gravity()
would return a list giving the name of the flux function (in this case gravity.flux
) and default arguments such as starting parameters as a named numeric vector (in this case c(alpha = 1, beta1 = 0.6, beta2 = 0.3, gamma = 3)
). This model initialiser and could take different arguments, e.g. gravity(params = c(0.5, 0.2, 0.1, 5))
.optim_options
being a named list of options to pass to optim
Existing functions could be modified to help users create the location_dataframe
and movement_matrix
objects, the *.flux
functions could be unexported and their docs replaced with more model- (rather than code-) centric docs for these initialisation functions.
Finally, we could write generic print
, plot
, and summary
methods for the objects returned by the initialisation functions (flux
objects?) and the same models with optimised parameters. e.g.:
flux <- gravity(c(0.5, 0.2, 0.1, 5))
plot(flux)
and
m <- movement(obs ~ locs)
plot(m$flux)
Thoughts on this proposed interface?
Currently the movementmodel
and predict.movementmodel
functions enables users to construct theoretical (i.e. not fitted to data) movement models and make predictions from them to RasterLayer
objects. This will be removed to solve issue #54.
predict.optimisedmodel
enables users to predict from fitted movement models to RasterLayer
s, and dataframes.
We should define a predict
method for flux
objects to make predictions to dataframes and RasterLayers
from theoretical models.
The syntax would then be:
fl1 <- gravity()
predict(fl1, some_raster)
fl2 <- radiation.with.selection()
predict(fl2, some_dataframe)
the summary method for a movement_model reports no parameter estimates:
devtools::install_github('SEEG-Oxford/movement')
library(movement)
# get location data
data(kenya)
kenya10 <- raster::aggregate(kenya, 10, sum)
net <- getNetwork(kenya10, min = 50000)
location_data <- data.frame(location = net$locations, population = net$population, x = net$coordinate[,1], y = net$coordinate[,2])
location_data <- as.location_dataframe(location_data)
# simulate movements (note the values of movementmatrix must be integer)
predicted_flux <- predict(originalRadiation(theta = 0.1), location_data, symmetric = TRUE)
movement_matrix <- predicted_flux$movement_matrix
# fit a new model to these data
movement_model <- movement(movement_matrix ~ location_data, originalRadiation())
summary(movement_model)
Model:
Deviance Residuals: 1
No coefficients
Null Deviance: 19356153.9810511 on 18906 degrees of freedom
Residual Deviance: 1191290.81130209 on 18905 degrees of freedom
AIC: 1191292.81130209
it also doesn't report the model type
e.g. the input argument of as.locationdataframe.data.frame
(ugh) reads:
input A data.frame of the format # origin destination movement pop_origin
pop_destination # 1 a b 10 100 88 # 2 a c 8 100 100 # 3 a d 10 100 113 # 4 a e 11 100 107
#5 a f 8 100 67 # lat_origin long_origin lat_destination long_destination # 0.07826932 0.13612404
0.12114115 0.58984725 # 0.07826932 0.13612404 0.07126503 0.19544754 # 0.07826932
0.13612404 0.97817937 0.22771625 # 0.07826932 0.13612404 0.87233335 0.06695538 #
0.07826932 0.13612404 0.23157835 0.19573021
I thing \tabular
is what's needed here:
https://cran.r-project.org/doc/manuals/R-exts.html#Lists-and-tables
Possibly with their examples - this for Nick
this will be more intuitive for the user
checking as CRAN results in these warnings:
* checking S3 generic/method consistency ... WARNING
print:
function(x, ...)
print.flux:
function(object)
summary:
function(object, ...)
summary.flux:
function(object)
predict:
function(object, ...)
predict.flux:
function(object, locationdataframe, min_network_pop, symmetric)
See section ‘Generic functions and methods’ in the ‘Writing R
Extensions’ manual.
This is because there should be a dots (...
) argument for all of these S3 methods. You can copy the argument documentation for this argument from those for ?print
, ?summary
and ?predict
.
@KathrinTessella I think print and summary methods are already provided for these, since they inherit from class matrix
and dataframe
respectively. Could you please check though and make sure this is true?
I will come up with some plotting code for these objects for use in the plot methods and let you know when they're available (later this week hopefully)
net <- getNetwork(nt10, min = 1)
location_data <- data.frame(location = net$locations,
population = net$population,
x = net$coordinate[,1],
y = net$coordinate[,2])
location_data <- as.location_dataframe(location_data)
predicted_flux <- predict(radiationWithSelection(theta = 0.5), location_data, symmetric = TRUE)
movement_matrix <- round(predicted_flux$movement_matrix)
movement_model <- movement(movement_matrix ~ location_data, radiationWithSelection(theta = 0.5))
print(movement_model)
predicted_movements <- predict(movement_model, nt10)
I get the following error:
Error in txtProgressBar(min = 1, max = nrow(indices), style = 3) :
must have 'max' > 'min'
I am not sure what this error means and how to fix it?
In the following we will use the following naming convention:
function names using camelCase notation
arguments, objects and class names using snake_case (underscore for separating)
In general, the 'dot' notation should be avoided.
Other elements of style should follow the Google style guide: https://google-styleguide.googlecode.com/svn/trunk/Rguide.xml
except that function documentation should go above functions, since they’re roxygen docs.
Hello, thanks for your nice package. When using the package to predict the human movement, I came across a question and hope you can give me some advices.
For each location, we have obtained the top 100 human flow among locations, rather than the human flow among all the regions. Therefore, the movement matrix is not a full matrix, there was some missing value in the matrix. For instance, we gathered the population flow between A and B as well as B and C, but we don't know the exact human flow between A and C. If we remove the location with the missing value, the number of regions remaining in the movement matrix is quite small. In this situation, how can I use the movement package to predict the human flow between A and C?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.