xfim / ggmcmc Goto Github PK

View Code? Open in Web Editor NEW

109.0 109.0 31.0 2.37 MB

Graphical tools for analyzing Markov Chain Monte Carlo simulations from Bayesian inference

R 91.67% TeX 8.33%

bayesian-data-analysis ggplot2 graphical jags mcmc r stan

ggmcmc's People

Contributors

Stargazers

Watchers

ggmcmc's Issues

Make "ggs" understand "brmsfit" objects (feature suggestion)

Hi!

It would be wonderful if ggs would understand brmsfit objects from the package brms: Bayesian Regression Models using Stan. There's a method as.mcmc.brmsfit that turns a brmsfit object into a list of MCMC samples ("Extract posterior samples for use with the coda package").

Thanks for considering this!

Make "ggs" understand "stanreg" objects in addition to "stanfit" (feature suggestion)

It would be great if ggs could import "stanreg" objects generated by the rstanarm R package in addition to the generic "stanfit" objects. Thanks!

Add model comparison to ggs_separation()

As in ggs_caterpillar(), add the possibility to allow model comparison in ggs_separation().

Description needs BugReports and URL fields

That point to this repo

ggs_Rhat gives different answers than gelman.diag

Hi! This is a great package, but I just had a question about the ggs_Rhat function. It seems to be giving different estimates than the gelman.diag function from the coda package. I'm not sure if they are different, but I thought I'd put this out there to see if this is either an explanation, or to learn about the difference between the two functions. Here's a reproducible example:

library(coda)
data(radon)
radon.long <- radon$s.radon
g <- gelman.diag(radon.long, autoburnin = F, multivariate = F)$psrf
r <- ggs_Rhat(ggs(radon.long))$data

g[rownames(g) == "sigma.beta",]
r[r$Parameter == "sigma.beta",]

You can see that the gelman.diag estimate gives a point estimate of 1.22 for this parameter, while the ggs_Rhat function gives a point estimate of 1.09.

Fix: error in ggs_autocorrelation

Calling ggs_autocorrelation with a number of lags larger than the number of iterations actually present produces an error. Commit be37820, appended to pull-request #21 , fixes this problem, generates a warning and plot until the max possible lag instead.

Cannot include the warmup using ggs for a stanfit object

Hi there

When I run the following command, I cant seem to get out the warmup samples..

But I guess this is the idea with inc_warmup = T?

> ggs(STAN_fit, family = 'sigma', inc_warmup = T)
Source: local data frame [1,000 x 4]

   Iteration Chain          Parameter    value
1          1     1 transactions_sigma 208765.6
2          2     1 transactions_sigma 206573.2
3          3     1 transactions_sigma 198310.6
4          4     1 transactions_sigma 214327.9
5          5     1 transactions_sigma 209830.3
6          6     1 transactions_sigma 205860.9
7          7     1 transactions_sigma 199150.3
8          8     1 transactions_sigma 209000.2
9          9     1 transactions_sigma 200834.9
10        10     1 transactions_sigma 217648.9
..       ...   ...                ...      ...
Warning message:
In ggs(STAN_fit, family = "sigma", inc_warmup = T) :
  inc_warmup must be 'FALSE', so it is ignored.

Add the possibility to do a fully Bayesian version of the separation plot

This option is not currently implemented. It can be either using facets (problematic) or intensities of colors. But in any case it is problematic.

ggs_Rhat error with one chain (makes ggmcmc fail)

If you have only one chain, ggs_Rhat produces an error.

If you call ggs_Rhat individually, this is a minor problem, even if an warning "At least two chains..." would be nicer.

However, when you run ggmcmc, it stops processing and does not cleanly close the pdf file.

Take burn-in period into account

It would be nice if the gggmcmc-package (I am using version ggmcmc_0.2) would take a burn-in period into account. Let's assume a burn-in period of 10,000, then coda's traceplot() command starts at iteration 10,000, whereas ggs_traceplot() starts at iteration 0. Thanks, Bernd

ggs_crosscorrelation() gives a warning "Non Lab interpolation is deprecated"

The issue comes from https://github.com/hadley/scales/blob/master/R/pal-gradient.r

Error using ggs()

Hi Xavier,

I'm writing an R package that uses your ggmcmc routine to create MCMC output plots from JAGS. It's been working very nicely until today, when I made some changes. Now when I call ggs() on my mcmc.list object, I get the following error:

> ggmcmc(ggs(jags1.mcmc),file="diagnostics.pdf")
Error in eval(expr, envir, enclos) : object 'Parameter' not found

I've tried supplying ggs() with par_labels from varnames(jags1.mcmc), but that didn't help. Any idea what the problem could be? I can send my R workspace if that would help.

Thanks,
Brian

Suggestion: Make parallel = FALSE by default

Since setting up parallel is quite a bit of additional work (especially on Windows), I suggest to make parallel=FALSE by default to avoid many warnings (or suppress the warnings)

Error with 0-var check in ggs_geweke from sde0f

Hello,

I just submitted a pull request with a fix, but posting an issue as well. When using your ggmcmc package to look at diagnostics for a JAGS model, I encountered an error at line 40 of ggs_geweke:

> D.geweke <- D.geweke %>%
+     dplyr::group_by (Parameter, Chain, part) %>%
+     dplyr::summarize(m=mean(value), sde0f=sde0f(value), n=n())
Error in summarise_impl(.data, dots) : 
  Evaluation error: zero-variance series.

Traced this to the check for 0 variance in sde0f and believe my pull request fixes the issue: #63

Problem with dev tidyr

checking examples ... ERROR
Running examples in ‘ggmcmc-Ex.R’ failed
The error most likely occurred in:

> base::assign(".ptime", proc.time(), pos = "CheckExEnv")
> ### Name: ggmcmc
> ### Title: Wrapper function that creates a single pdf file with all plots
> ###   that ggmcmc can produce.
> ### Aliases: ggmcmc ggmcmc-package
> 
> ### ** Examples
> 
> data(linear)
> ggmcmc(ggs(s))  # Directly from a coda object
Plotting histograms
Error in factor(x, levels = 1:n, labels = labels) : 
  invalid 'labels'; length 0 should be 1 or 3
Calls: ggmcmc ... print -> ggs_histogram -> cbind -> gl_unq -> factor
Execution halted

Could you please check to see if this is a new bug in tidyr?

How was the binary dataset created?

The documentation says

in order to illustrate the functions suitable for binary outcomes a new dataset must be simulated, and samples of parameters from a simple logistic regression model must be obtained

But it does not explain how this dataset was derived or created. An actual example would be useful. I have a logistic regression model and I would like to plot a roc-curve. I would like to understand how to simulate and create a dataset suitable for creating a roc-curve.

Create visual representations of formal diagnostics

Create plots showing the output of formal diagnostics such as:

Geweke
Gelman &Rubin
Heidelberger-Welch
Raftery-Lewis

Add new diagnostic plots (Heidelberger-Welch, Raftery-Lewis)

Allow custom facet_wrap layout

Hello,

Big fan of the package so far. Could you change the facet_wrap in all the plotting from nrow = 1 to the default or as an optional setting? If I want to plot more than 5 parameters they quickly run off the edge of the plots so a grid layout would be helpful.

Thanks

Kieran

ggmcmc: allow NULL for file name

I frequently try to append other things to the pdf. I suggest that NULL can be passed as a filename, in which case pdf and dev.off() are skipped. This does not break old code, and is 2 additional lines.

Choosing layout when plotting multiple parameters

My plotting panel in Rstudio is quite small, rendering the graphs hard to read. See below. It would be very convenient to either (1) have an ncol argument to choose number of columns or (2) have an faceting argument which defaults to TRUE but could be set to FALSE so that one could do a manual +facet_wrap(~Parameter, ncol=3) or, say, a +facet_grid(chain~Parameter). I like solution 2 better.

ggmcmc is really great! I just discovered it a week ago and I am thrilled to get MCMC into the safe home of ggplot2!

ggs_caterpillar ignores thick_ci and thin_ci arguments

Calling ggs_caterpillar with values for thick_ci and thin_ci doesn't change the plot.

When I call ggs_caterpillar(D) and ggs_caterpillar(D, thick_ci = c(0.4,0.6), I get identical plots. Using ggmcmc_0.7.3 from CRAN.

I get the same result when I install the github verson with devtools::install_github('xfim/ggmcmc')

stat_bin warning

Plotting histogram gives many ugly infos:

stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

I know these are not critical, but it would be better if they were suppressed by providing a fixed binwidth.

stan and ggs

The documentation says that ggs should work with stan output, but it does not.
Quite close to the top there is

S <- as.array(S)

which will always fail for the complex stanfit output. Since no examples or test cases are provided (would be nice....) this goes unnoticed.

HPD intervals in density plots and traceplots

It took me some time to figure out how to add HPD intervals to density plots. I think it would be great to have an argument like HPD=FALSE to ggs_density (horizontal) and ggs_traceplot (vertical).

Here you go for density:

# Functions to get HPD
hpd_lower = function(x, prob=0.95) HPDinterval(as.mcmc(x), prob=prob)[1]
hpd_upper = function(x, prob=0.95) HPDinterval(as.mcmc(x), prob=prob)[2]

# Make data for geom_segment to use
hpd_data = dplyr::group_by(my_ggs_data, Parameter) %>% 
  dplyr::summarize(x=hpd_lower(value), xend=hpd_upper(value))

# Density and HPD
ggplot(plot1_ggs, aes(x=value)) + 
  geom_density(aes(fill='gray', y=..scaled..)) + scale_fill_manual(values = "gray") +
  geom_segment(data=hpd_data, aes(x=x, xend=xend, y=0.01, yend=0.01), size=3) +

Replace reshape with reshape2

ggmcmc uses package reshape. This has been deprecated years ago, and was replaced by reshape2.

Solve R CMD check in R-devel

Unexported object imported by a ‘:::’ call: ggplot2:::bin’

Otherwise in next R release, R CMD check will not pass.

ggp_ppsd and ggs_ppmean give error message

data(samples)
ggs_ppsd(ggs(S, family="y"), outcome="y")

Error in ggs_ppsd(ggs(d.stan, family = "beta"), outcome = "beta") :
The length of the outcome must be equal to the number of Parameters of the ggs object.

Version: 0.4.1

ggs_caterpillarplot must be able to plot bars against continuous variables

ggs_caterpillarplot accepts being plotted against the name of the variable, but must be able to be printed against an externally specified value.

Import burnin and thinning meta-information from stanfit and csv methods.

When importing stanfit objects and csv files with stan output the metadata on the burnin period and the thinning does not pass correctly.

setting greek=T mixes up labels

When I call ggs_caterpillar with greek=T, the parsed labels appear on the chart in alphabetical order, not in the order of the medians of the distributions, so the bars on the chart get the wrong labels.

Calling ggs_caterpillar(g) and ggs_caterpillar(g, greek=T) the bars appear in the same places, but the labels appear in different order.

g.zip

Disable/change sorting in ggs_catarpillar

I make inference on a variable where there is a meaningful order to the indices. I would like to keep that order in the plot. Could you implement either (1) a method to disable value-sorting and keep the alphabetical sorting? or (2) an option to manually specify a sorting.

Here is the plot. In this case, I would just like them in the order of the index to see if discriminability covaries with item difficulty:

ci fails when a list of dataframes is passed

The manual for ci implies a list of data.frames can be passed to the function and confidence intervals calculated for each model (that https://github.com/xfim/ggmcmc/blob/master/R/functions.R#L95).

When I try this I get:

Error in UseMethod("group_by_") : 
  no applicable method for 'group_by_' applied to an object of class "list"

A glance at the code shows that group_by is run on the input to the function - this will always fail if a list is passed.

problem with ggs()

Hi, I'm having problems trying to make the ggs() function select a "family", with ggs() returning "Error in S[[1]] : this S4 class is not subsettable".
This error does not occur when family=NA, and also does not occur for mcmc.list objects (confirmed using the radon dataset).
I'm using R 3.5.3, the newest ggmcmc and rstan package (downloaded today) under a windows 10 64 bit environment.

Below is a reproducible example using the 8school example from the Rstan "getting started" page.

// saved as test.stan
data {
int<lower=0> J; // number of schools
real y[J]; // estimated treatment effects
real<lower=0> sigma[J]; // standard error of effect estimates
}
parameters {
real mu; // population treatment effect
real<lower=0> tau; // standard deviation in treatment effects
vector[J] eta; // unscaled deviation from mu by school
}
transformed parameters {
vector[J] theta = mu + tau * eta; // school treatment effects
}
model {
target += normal_lpdf(eta | 0, 1); // prior log-density
target += normal_lpdf(y | theta, sigma); // log-likelihood
}

// data
schools_dat <- list(J = 8,
y = c(28, 8, -3, 7, -1, 1, 18, 12),
sigma = c(15, 10, 16, 11, 9, 11, 10, 18))

// stan and ggmcmc code
library(rstan)
fit <- stan(file = 'test.stan', data = schools_dat)

library(ggmcmc)
ggs(fit, family="mu")

Add a new plot for comparison of percent correctly predicted (PCP) on binary outcomes

multiple models for caterplot

I'd like to add the ability to pass a list of models (in ggs format) to the caterplot function and then use facet_grid to split up the plots horizontally or vertically. Are you interested in having this functionality in the package and if so would you prefer I have it as a separate function or integrated into the original caterplot function?

Add model comparison to ggs_rocplot()

As in ggs_caterpillar(), add the possibility to allow model comparison in ggs_rocplot().

Show Greek letters as parameter names.

As suggested by @BadWizard in #45 , instead of showing "alpha" or the names of the parameters, show the corresponding Greek letters.

Add example on how to use family in the R help files.

Suggested by Thomas Daeubler:

Add an example of the use of the double escape to the R help file (under family argument).

R CMD check fails with dev version of tidyr

checking whether package ‘ggmcmc’ can be installed ... WARNING
Found the following significant warnings:
  Warning: replacing previous import by ‘tidyr::%>%’ when loading ‘ggmcmc’
See ‘/private/tmp/Rtmp5nYH93/check_cranb673158d7feb/ggmcmc.Rcheck/00install.out’ for details.

probably because you're doing too many unqualified imports

ggmcmc() can produce only one type of plots

Right now ggmcmc() produces a pdf file with all the types of plots. It would be possible to make it produce only one type of plots, and still take advantage of the division in several pages.

Suggested by dmenne by email.

Add HPD interval to ggs_density

In ggs_density, it would be nice to have a thick line along the x-axis indicating the X% HPD interval. Kruschke's BEST package does this as do many others. For consistency, maybe use the thick-and-thin ggs_catarpillar style.

ggmcmc is great, thank you!

separate the Rhat code from ggs_rhat

Hi,

Thanks for your package, which I'm making very heavy use of! I wonder if you'd be open to separating out the rhat calculation from ggs_rhat into a separate function? Happy to push some code doing that if it makes it easier.

A big chunk of what I use it for is simply the ggs() conversion of mcmc to a data frame, which I then conduct lots of post processing of. As part of that post-processing, I often find myself wanting to calculate the Rhat, and while ggs_rhat() does that for me, it's a bit round-about to try and pull the values from the ggplot object. I've ended up simply cutting the relevant rhat calculation code out of the code body, and using that as a stand-alone function instead.

Improve importing data from other MCMC packages

Improve the facilities to convert MCMCpack, rstan or output from other R packages into a data frame that ggmcmc can process

Keeping order of parameters

Hi, I have a feature request. I would like to keep the order of parameters the same as it is from the output of the mcmc.list. I mostly use runjags package, so the summary() output of a runjags object will typically be in the order I set it in the monitor argument of run.jags().

E.g. The plots from the runjags model mod <- run.jags(..., monitor = c("gamma", "beta" )) will sort the parameters in alphabetical order, so it will appear as beta[1], beta[2],..., gamma[1], gamma[2],... etc.

I understand there is the sort=FALSE option in ggs() function, but this doesn't seem to be doing what I'd like it to.

One quick and lazy way that I found to work was to edit the lines in ggs.R where it says D$Parameter <- factor(D$Parameter) to D$Parameter <- factor(D$Parameter, levels = unique(D$Parameter)), in particular under the if statement for sort. This then allows me to run ggs() function with sort = FALSE, and it will give me plots exactly in the order as it appears from the runjags output.

The only thing is I don't know if this breaks the sort = FALSE functionality that you originally intended. I can't seem to figure out what this does?

Hope I was clear. Thanks for a great package!

library(ggmcmc)
stanmodelcode <- "
data {
int<lower=0> N;
real y[N];
} 
parameters {
real mu;
} 
model {
mu ~ normal(0, 10);
y ~ normal(mu, 1); 
} 
"

y <- rnorm(20) 
dat <- list(N = 20, y = y); 
fit <- stan(model_code = stanmodelcode, data = dat, chains = 3)
#### Another bug in ggmcmc: this dependency is missing. 
# Things should work without the next line
library(coda) # Bit this one gives an warning which should be avoided by using namespaces
s = ggs(fit)
#Error in 1:n.samples : argument of length 0

xfim / ggmcmc Goto Github PK

ggmcmc's People

Contributors

Stargazers

Watchers

Forkers

ggmcmc's Issues

Recommend Projects

Recommend Topics

Recommend Org