Giter Club home page Giter Club logo

mglmriem's Introduction

MGLMRiem: Multivariate Generalized Linear Models on Riemann Manifolds

Implementing in R (as well as augmenting and modifying) the algorithms of Kim et al. 2014 for regressing multiple symmetric positive definite matrices against real valued covariates. The original code from which this repo is based was written in Matlab, and is available from NITRC: mglm_riem. As well, the surrounding works are hosted on the University of Wisconsin website: Hyunwoo J. Kim (2014).

This R package is intended to simplify the application of regression for response variables living on the manifold of symmetric positive definite matrices, with real valued covariates.

How to Install

You will need to install from github:

library(devtools)
devtools::install_github("mrparker909/MGLMRiem")

How to Use

Generate Some Data

We will use the support package genDataMGLMRiem to generate some example data to work with.

# devtools::install_github("mrparker909/genDataMGLMRiem")
library(genDataMGLMRiem)

# set seed for reproducibility
set.seed(12345)

# 1) Generate confound data
# Confound 1: 10 observations from a binomial distribution, with probability 0.25, and maximum size 2
C1 = genDataMGLMRiem::SimConfoundData(n = 10, D = rbinom, size = 2, prob = 0.25)

# Confound 2: 10 observations from a standard normal distribution (mean 0, variance 1)
C2 = genDataMGLMRiem::SimConfoundData(n = 10, D = rnorm)

# Put them together to form our X observations:
X = rbind(C1, C2)

# 2) Generate dependent SPD data
# generate 10 3x3 SPD matrices with a covariate effect size between 1 and 2.
simData = genDataMGLMRiem::genSPDdata(N = 10, dims = 3, C = X, minDist = 1, maxDist = 2)

Fit a MGLM Model

library(MGLMRiem)
mod1 = mglm_spd(X = simData$X, Y = simData$Y, pKarcher = T)

Model Diagnostics

We should check that the model converged! If it did not, we would either increase the maximum iterations via maxiters, or implement checkpointing.

# Did the model converge?
mod1$converged
## [1] TRUE

Since the model converged, we can look at the R squared (variance explained within the SPD manifold):

# calculate the Rsquared on the manifold
calc_Rsqr_spd(Y = simData$Y, Yhat = mod1$Yhat)
## [1] 0.7797107

We can also look at the sums of squares:

Ybar = karcher_mean_spd(simData$Y, niter=200)
SSE=SSE_spd(simData$Y, mod1$Yhat)
SST=SST_spd(simData$Y, Ybar)
SSR=SSR_spd(mod1$Yhat, Ybar)

So we can easily calculate SSE=9.5953538, SSR= 33.8877806, and SST=43.5579652.

And from those we can get at the proportions of explained and unexplained variance:

  • Explained Variance = SSR/SST = 0.7779927
  • Unexplained Variance = SSE/SST = 0.2202893
  • 'Manifold' Variance = SSM/SST = 1 - SSE/SST - SSR/SST = 0.001718

Note that the 'Manifold' variance is error due to the manifold structure of the response, and it is implicitly included in the R squared calculation: R^2 = 1-SSE/SST = (SSR+SSM)/SST = 0.7797107.

An alternative, more reasonable approach, would be to consider SSM as unexplained variance, in which case you could define R^2 as R^2 = SSR/SST = 0.7779927. This alternate definition of R^2 is, in particular, more reliable when R^2 is very small.

Checkpointing

It can be difficult to estimate the number of iterations necessary for the model to converge, and it is very wasteful in terms of computing time to run an algorithm for say 1000 iterations, only to find that 1000 was not enough, then to repeat for larger and larger iterations until finding an appropriate number. It can also be infeasible to run an algorithm long enough in one sitting (for example if a system needs to be rebooted intermittently). Whatever the reason, if you need to be able to stop the algorithm, and restart it from where you left off, you can enable checkpointing very easily:

# Here we reduce the number of iterations to 100, which is not
# large enough for the algorithm to converge
mod2 = mglm_spd(X = simData$X, Y = simData$Y, pKarcher = T, 
                maxiter = 100, enableCheckpoint = T)
## [1] "CHECKPOINT ENABLED: setting checkpoint file to C:/Statistics/MGLMRiem./checkpoint.rda"

The model has not converged:

mod2$converged
## [1] FALSE

However, the checkpoint.rda file has been written, and can be used to load the checkpoint into R:

load("checkpoint.rda")

Now the checkpoint object is loaded into R, and can be used as a starting point for the mglm_spd algorithm:

mod3 = mglm_spd_checkpoint(checkpoint)
## [1] "writing checkpoint file to C:/Statistics/MGLMRiem./checkpoint.rda"

Now the model has converged:

mod3$converged
## [1] TRUE

Disclaimer

This R package is under development, and bugs can be expected as well as sudden changes to function call formats, function return values, and general package structure. Use at your own risk. Feel free to contact me through github if you have any questions or concerns!

References

Kim, H. J., Adluru, N., Collins, M. D., Chung, M. K., Bendin, B. B., Johnson, S. C., โ€ฆ Singh, V. (2014). Multivariate General Linear Models (MGLM) on Riemannian Manifolds with Applications to Statistical Analysis of Diffusion Weighted Images. 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2705โ€“2712. https://doi.org/10.1109/CVPR.2014.352

mglmriem's People

Contributors

mrparker909 avatar

Stargazers

 avatar

Forkers

zhangzjjjjjj

mglmriem's Issues

Potential convergence issue

There is some evidence that mglm_spd is not converging adequately (potentially step size reduces too quickly and the algorithm gets stuck?)

Example: run mglm_spd with pKarcher=T and pKarcher=F, and the results will be slightly different, but more different than tol should allow.

Need to debug.

Convergence

allow users to specify tolerance for delta E and delta gnorm, so that convergence conditions can be easily set

Tutorial

Write tutorial indicating how to use the package, first finish this issue: #1

Documentation

Need to create proper documentation, with example code

mglm_spd

Convergence is de-synced between checkpoint and non-checkpoint mglm_spd function.

mglm_spd()

When X is a vector with length = dim(Y)[3], should automatically convert X to an array

calc_Rsqr_spd()

When SSR is very small, returns a negative value. Should probably replace 1-SSE/SST with SSR/SST. Need to investigate.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.