connordonegan / stan-iar Goto Github PK

View Code? Open in Web Editor NEW

7.0 7.0 1.0 6.65 MB

Spatial intrinsic autoregressive models in Stan

License: Other

Stan 56.33% R 36.23% TeX 7.43%

stan-iar's People

Contributors

Stargazers

Watchers

Forkers

shayeste60

stan-iar's Issues

separate intercept and scale per connected component

Implement remaining recommendations from Freni-Sterrantino et al.'s "A note on intrinsic conditional autoregressive models for
disconnected graphs", Spatial and Spatio-Temporal Epidemiology:

give each connected component its own intercept---but avoiding assigning an intercept to islands;
for singletons/islands, replace standard normal prior with normal(0, tau) where tau is the ICAR scale parameter;
add option to scale the ICAR and BYM models (not only the BYM2 specification).

Also, could give each connected component its own scale by default since they're practically separate models, and it seems to improve sampling.

use function to compute per-component ICARs

I recently coded up a version of this model which is very similar to this one - submitting here for your consideration.
putting the ICAR normal into a function makes the model easier to read.
I'm not sure if these two models encode component membership in exactly the same way, but the computation is the same.

functions {
  /**
   * Return the log probability density of the specified vector of
   * coefficients under the ICAR model with unit variance, where
   * adjacency is determined by the adjacency array and the spatial
   * structure is a disconnected graph which has at least one
   * connected component.  The spatial structure is described by
   * an array of component sizes and a corresponding 2-D array
   * where each row contains the indices of the nodes in that
   * component.  The adjacency array contains two parallel arrays
   * of adjacent element indexes (i.e. edges in the component graph).
   *
   * For example, a series of four coefficients phi[1:4] for a
   * disconnected graph containing 1 singleton would have
   * adjacency array {{1, 2}, {2, 3}}, signaling that coefficient 1
   * is adjacent to coefficient 2, and 2 is adjacent to 3,
   * component size array {3, 1}, and (zero-padded) component members
   * array of arrays { { 1, 2, 3, 0}, {4, 0, 0, 0} }.
   *
   * Each connected component has a soft sum-to-zero constraint.
   * Singleton components don't contribute to the ICAR model.
   *
   * @param phi vector of varying effects
   * @param adjacency parallel arrays of indexes of adjacent elements of phi
   * @param comp_size array of sizes of components in spatial structure graph
   * @param comp_members array of arrays of per-components coefficients.
   *
   * @return ICAR log probability density
   *
   * @reject if the the adjacency matrix does not have two rows
   * @reject if size mismatch between comp_size and comp_members
   * @reject if size mismatch between phi and dimension 2 of comp_members
   */
  real standard_icar_disconnected_lpdf(vector phi,
				       int[ , ] adjacency,
				       int[ ] comp_size,
				       int[ , ] comp_members) {
    if (size(adjacency) != 2)
      reject("require 2 rows for adjacency array;",
             " found rows = ", size(adjacency));
    if (size(comp_size) != size(comp_members))
      reject("require ", size(comp_size), " rows for members matrix;",
             " found rows = ", size(comp_members));
    if (size(phi) != dims(comp_members)[2])
      reject("require ", size(phi), " columns for members matrix;",
             " found columns = ", dims(comp_members)[2]);

    real total = 0;
    for (n in 1:size(comp_size)) {
      if (comp_size[n] > 1)
	total += -0.5 * dot_self(phi[adjacency[1, comp_members[n, 1:comp_size[n]]]] -
				 phi[adjacency[2, comp_members[n, 1:comp_size[n]]]])
	  + normal_lpdf(sum(phi[comp_members[n, 1:comp_size[n]]]) | 0, 0.001 * comp_size[n]);
    }
    return total;
  }
}
data {
  // spatial structure
  int<lower = 0> I;  // number of nodes
  int<lower = 0> J;  // number of edges
  int<lower = 1, upper = I> edges[2, J];  // node[1, j] adjacent to node[2, j]

  int<lower=0, upper=I> K;  // number of components in spatial graph
  int<lower=0, upper=K> K_size[K];   // component sizes
  int<lower=0, upper=I> K_members[K, I];  // rows contain per-component areal region indices

  vector<lower=0>[K] tau_sp; // per-component scaling factor, 0 for singletons
}
parameters {
  // spatial effects
  real<lower = 0> sigma_sp;  // scale of spatial effects
  real<lower = 0, upper = 1> rho_sp;  // proportion of spatial effect that's spatially smoothed
  vector[I] theta_sp;  // standardized heterogeneous spatial effects
  vector[I] phi_sp;  // standardized spatially smoothed spatial effects

}
transformed parameters {
  vector[I] gamma;
  // each component has its own spatial smoothing
  for (k in 1:K) {
    if (K_size[k] == 1) {
      gamma[K_members[k,1]] = theta_sp[K_members[k,1]];
    } else {
      gamma[K_members[k, 1:K_size[k]]] = 
	    (sqrt(1 - rho_sp) * theta_sp[K_members[k, 1:K_size[k]]]
	     + (sqrt(rho_sp) * sqrt(1 / tau_sp[k])
		* phi_sp[K_members[k, 1:K_size[k]]]) * sigma_sp);
    }
  }
}
model {
  // spatial hyperpriors and priors
  sigma_sp ~ normal(0, 1);
  rho_sp ~ beta(0.5, 0.5);
  theta_sp ~ normal(0, 1);
  phi_sp ~ standard_icar_disconnected(edges, K_size, K_members);
}

BYM2 island implementation flawed; fix and writeup

The BYM2 islands implementation isn't doing the right thing.
I have an implementation which seems to be working here:
https://github.com/stan-dev/example-models/tree/master/knitr/car-iar-poisson/update_2021_02

Jupyter notebook, still in progress.

further testing is required, in particular:

run the data through INLA for all 3 Scotland graphs and compare with Stan results
recreate the plots from the Freni-Sterrantino et al 2017 paper and compare

your review and comments welcome. happy to move this over to this repo so that folks can easily find it.

the pain points for the BYM2 model are:

converting GIS data from spatial polygons to R spdep's package nb objects to data arrays used in the Stan programs
computing the scaling factor

I checked a bunch of R scripts and data files into GitHub; computing the scaling factor requires magic from R package INLA. It would be nice to have corresponding Python helpers, using numpy, or SciPy and whatever GIS packages exist. not sure if they do - only INLA has the magic foo to compute the geometric mean of a sparse matrix.

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.