xfim / ggmcmc Goto Github PK
View Code? Open in Web Editor NEWGraphical tools for analyzing Markov Chain Monte Carlo simulations from Bayesian inference
Graphical tools for analyzing Markov Chain Monte Carlo simulations from Bayesian inference
Hi!
It would be wonderful if ggs
would understand brmsfit
objects from the package brms: Bayesian Regression Models using Stan. There's a method as.mcmc.brmsfit
that turns a brmsfit
object into a list of MCMC samples ("Extract posterior samples for use with the coda package").
Thanks for considering this!
It would be great if ggs
could import "stanreg" objects generated by the rstanarm R package in addition to the generic "stanfit" objects. Thanks!
As in ggs_caterpillar(), add the possibility to allow model comparison in ggs_separation().
That point to this repo
Hi! This is a great package, but I just had a question about the ggs_Rhat
function. It seems to be giving different estimates than the gelman.diag
function from the coda package. I'm not sure if they are different, but I thought I'd put this out there to see if this is either an explanation, or to learn about the difference between the two functions. Here's a reproducible example:
library(coda)
data(radon)
radon.long <- radon$s.radon
g <- gelman.diag(radon.long, autoburnin = F, multivariate = F)$psrf
r <- ggs_Rhat(ggs(radon.long))$data
g[rownames(g) == "sigma.beta",]
r[r$Parameter == "sigma.beta",]
You can see that the gelman.diag
estimate gives a point estimate of 1.22 for this parameter, while the ggs_Rhat
function gives a point estimate of 1.09.
Hi there
When I run the following command, I cant seem to get out the warmup samples..
But I guess this is the idea with inc_warmup = T?
> ggs(STAN_fit, family = 'sigma', inc_warmup = T)
Source: local data frame [1,000 x 4]
Iteration Chain Parameter value
1 1 1 transactions_sigma 208765.6
2 2 1 transactions_sigma 206573.2
3 3 1 transactions_sigma 198310.6
4 4 1 transactions_sigma 214327.9
5 5 1 transactions_sigma 209830.3
6 6 1 transactions_sigma 205860.9
7 7 1 transactions_sigma 199150.3
8 8 1 transactions_sigma 209000.2
9 9 1 transactions_sigma 200834.9
10 10 1 transactions_sigma 217648.9
.. ... ... ... ...
Warning message:
In ggs(STAN_fit, family = "sigma", inc_warmup = T) :
inc_warmup must be 'FALSE', so it is ignored.
This option is not currently implemented. It can be either using facets (problematic) or intensities of colors. But in any case it is problematic.
If you have only one chain, ggs_Rhat
produces an error.
If you call ggs_Rhat
individually, this is a minor problem, even if an warning "At least two chains..." would be nicer.
However, when you run ggmcmc
, it stops processing and does not cleanly close the pdf file.
It would be nice if the gggmcmc-package (I am using version ggmcmc_0.2) would take a burn-in period into account. Let's assume a burn-in period of 10,000, then coda's traceplot() command starts at iteration 10,000, whereas ggs_traceplot() starts at iteration 0. Thanks, Bernd
The issue comes from https://github.com/hadley/scales/blob/master/R/pal-gradient.r
Hi Xavier,
I'm writing an R package that uses your ggmcmc
routine to create MCMC output plots from JAGS. It's been working very nicely until today, when I made some changes. Now when I call ggs()
on my mcmc.list
object, I get the following error:
> ggmcmc(ggs(jags1.mcmc),file="diagnostics.pdf")
Error in eval(expr, envir, enclos) : object 'Parameter' not found
I've tried supplying ggs()
with par_labels
from varnames(jags1.mcmc)
, but that didn't help. Any idea what the problem could be? I can send my R workspace if that would help.
Thanks,
Brian
Since setting up parallel is quite a bit of additional work (especially on Windows), I suggest to make parallel=FALSE by default to avoid many warnings (or suppress the warnings)
Hello,
I just submitted a pull request with a fix, but posting an issue as well. When using your ggmcmc
package to look at diagnostics for a JAGS model, I encountered an error at line 40 of ggs_geweke
:
> D.geweke <- D.geweke %>%
+ dplyr::group_by (Parameter, Chain, part) %>%
+ dplyr::summarize(m=mean(value), sde0f=sde0f(value), n=n())
Error in summarise_impl(.data, dots) :
Evaluation error: zero-variance series.
Traced this to the check for 0 variance in sde0f
and believe my pull request fixes the issue: #63
checking examples ... ERROR
Running examples in ‘ggmcmc-Ex.R’ failed
The error most likely occurred in:
> base::assign(".ptime", proc.time(), pos = "CheckExEnv")
> ### Name: ggmcmc
> ### Title: Wrapper function that creates a single pdf file with all plots
> ### that ggmcmc can produce.
> ### Aliases: ggmcmc ggmcmc-package
>
> ### ** Examples
>
> data(linear)
> ggmcmc(ggs(s)) # Directly from a coda object
Plotting histograms
Error in factor(x, levels = 1:n, labels = labels) :
invalid 'labels'; length 0 should be 1 or 3
Calls: ggmcmc ... print -> ggs_histogram -> cbind -> gl_unq -> factor
Execution halted
Could you please check to see if this is a new bug in tidyr?
The documentation says
in order to illustrate the functions suitable for binary outcomes a new dataset must be simulated, and samples of parameters from a simple logistic regression model must be obtained
But it does not explain how this dataset was derived or created. An actual example would be useful. I have a logistic regression model and I would like to plot a roc-curve. I would like to understand how to simulate and create a dataset suitable for creating a roc-curve.
Create plots showing the output of formal diagnostics such as:
Hello,
Big fan of the package so far. Could you change the facet_wrap in all the plotting from nrow = 1 to the default or as an optional setting? If I want to plot more than 5 parameters they quickly run off the edge of the plots so a grid layout would be helpful.
Thanks
Kieran
I frequently try to append other things to the pdf. I suggest that NULL can be passed as a filename, in which case pdf and dev.off() are skipped. This does not break old code, and is 2 additional lines.
My plotting panel in Rstudio is quite small, rendering the graphs hard to read. See below. It would be very convenient to either (1) have an ncol
argument to choose number of columns or (2) have an faceting
argument which defaults to TRUE
but could be set to FALSE
so that one could do a manual +facet_wrap(~Parameter, ncol=3)
or, say, a +facet_grid(chain~Parameter)
. I like solution 2 better.
ggmcmc is really great! I just discovered it a week ago and I am thrilled to get MCMC into the safe home of ggplot2!
Calling ggs_caterpillar with values for thick_ci and thin_ci doesn't change the plot.
When I call ggs_caterpillar(D)
and ggs_caterpillar(D, thick_ci = c(0.4,0.6)
, I get identical plots. Using ggmcmc_0.7.3 from CRAN.
I get the same result when I install the github verson with devtools::install_github('xfim/ggmcmc')
Plotting histogram gives many ugly infos:
stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
I know these are not critical, but it would be better if they were suppressed by providing a fixed binwidth.
The documentation says that ggs
should work with stan
output, but it does not.
Quite close to the top there is
S <- as.array(S)
which will always fail for the complex stanfit output. Since no examples or test cases are provided (would be nice....) this goes unnoticed.
It took me some time to figure out how to add HPD intervals to density plots. I think it would be great to have an argument like HPD=FALSE
to ggs_density
(horizontal) and ggs_traceplot
(vertical).
Here you go for density:
# Functions to get HPD
hpd_lower = function(x, prob=0.95) HPDinterval(as.mcmc(x), prob=prob)[1]
hpd_upper = function(x, prob=0.95) HPDinterval(as.mcmc(x), prob=prob)[2]
# Make data for geom_segment to use
hpd_data = dplyr::group_by(my_ggs_data, Parameter) %>%
dplyr::summarize(x=hpd_lower(value), xend=hpd_upper(value))
# Density and HPD
ggplot(plot1_ggs, aes(x=value)) +
geom_density(aes(fill='gray', y=..scaled..)) + scale_fill_manual(values = "gray") +
geom_segment(data=hpd_data, aes(x=x, xend=xend, y=0.01, yend=0.01), size=3) +
ggmcmc uses package reshape. This has been deprecated years ago, and was replaced by reshape2.
Unexported object imported by a ‘:::’ call: ggplot2:::bin’
Otherwise in next R release, R CMD check will not pass.
data(samples)
ggs_ppsd(ggs(S, family="y"), outcome="y")
Error in ggs_ppsd(ggs(d.stan, family = "beta"), outcome = "beta") :
The length of the outcome must be equal to the number of Parameters of the ggs object.
Version: 0.4.1
ggs_caterpillarplot accepts being plotted against the name of the variable, but must be able to be printed against an externally specified value.
When importing stanfit objects and csv files with stan output the metadata on the burnin period and the thinning does not pass correctly.
When I call ggs_caterpillar
with greek=T
, the parsed labels appear on the chart in alphabetical order, not in the order of the medians of the distributions, so the bars on the chart get the wrong labels.
Calling ggs_caterpillar(g)
and ggs_caterpillar(g, greek=T)
the bars appear in the same places, but the labels appear in different order.
I make inference on a variable where there is a meaningful order to the indices. I would like to keep that order in the plot. Could you implement either (1) a method to disable value-sorting and keep the alphabetical sorting? or (2) an option to manually specify a sorting.
Here is the plot. In this case, I would just like them in the order of the index to see if discriminability covaries with item difficulty:
The manual for ci
implies a list of data.frame
s can be passed to the function and confidence intervals calculated for each model (that https://github.com/xfim/ggmcmc/blob/master/R/functions.R#L95).
When I try this I get:
Error in UseMethod("group_by_") :
no applicable method for 'group_by_' applied to an object of class "list"
A glance at the code shows that group_by
is run on the input to the function - this will always fail if a list is passed.
Hi, I'm having problems trying to make the ggs() function select a "family", with ggs() returning "Error in S[[1]] : this S4 class is not subsettable".
This error does not occur when family=NA, and also does not occur for mcmc.list objects (confirmed using the radon dataset).
I'm using R 3.5.3, the newest ggmcmc and rstan package (downloaded today) under a windows 10 64 bit environment.
Below is a reproducible example using the 8school example from the Rstan "getting started" page.
// saved as test.stan
data {
int<lower=0> J; // number of schools
real y[J]; // estimated treatment effects
real<lower=0> sigma[J]; // standard error of effect estimates
}
parameters {
real mu; // population treatment effect
real<lower=0> tau; // standard deviation in treatment effects
vector[J] eta; // unscaled deviation from mu by school
}
transformed parameters {
vector[J] theta = mu + tau * eta; // school treatment effects
}
model {
target += normal_lpdf(eta | 0, 1); // prior log-density
target += normal_lpdf(y | theta, sigma); // log-likelihood
}
// data
schools_dat <- list(J = 8,
y = c(28, 8, -3, 7, -1, 1, 18, 12),
sigma = c(15, 10, 16, 11, 9, 11, 10, 18))
// stan and ggmcmc code
library(rstan)
fit <- stan(file = 'test.stan', data = schools_dat)
library(ggmcmc)
ggs(fit, family="mu")
I'd like to add the ability to pass a list of models (in ggs format) to the caterplot function and then use facet_grid to split up the plots horizontally or vertically. Are you interested in having this functionality in the package and if so would you prefer I have it as a separate function or integrated into the original caterplot function?
As in ggs_caterpillar(), add the possibility to allow model comparison in ggs_rocplot().
As suggested by @BadWizard in #45 , instead of showing "alpha" or the names of the parameters, show the corresponding Greek letters.
Suggested by Thomas Daeubler:
Add an example of the use of the double escape to the R help file (under family argument).
checking whether package ‘ggmcmc’ can be installed ... WARNING
Found the following significant warnings:
Warning: replacing previous import by ‘tidyr::%>%’ when loading ‘ggmcmc’
See ‘/private/tmp/Rtmp5nYH93/check_cranb673158d7feb/ggmcmc.Rcheck/00install.out’ for details.
probably because you're doing too many unqualified imports
Right now ggmcmc() produces a pdf file with all the types of plots. It would be possible to make it produce only one type of plots, and still take advantage of the division in several pages.
Suggested by dmenne by email.
In ggs_density, it would be nice to have a thick line along the x-axis indicating the X% HPD interval. Kruschke's BEST package does this as do many others. For consistency, maybe use the thick-and-thin ggs_catarpillar style.
ggmcmc is great, thank you!
Hi,
Thanks for your package, which I'm making very heavy use of! I wonder if you'd be open to separating out the rhat calculation from ggs_rhat into a separate function? Happy to push some code doing that if it makes it easier.
A big chunk of what I use it for is simply the ggs() conversion of mcmc to a data frame, which I then conduct lots of post processing of. As part of that post-processing, I often find myself wanting to calculate the Rhat, and while ggs_rhat() does that for me, it's a bit round-about to try and pull the values from the ggplot object. I've ended up simply cutting the relevant rhat calculation code out of the code body, and using that as a stand-alone function instead.
Improve the facilities to convert MCMCpack, rstan or output from other R packages into a data frame that ggmcmc can process
Hi, I have a feature request. I would like to keep the order of parameters the same as it is from the output of the mcmc.list
. I mostly use runjags
package, so the summary()
output of a runjags
object will typically be in the order I set it in the monitor
argument of run.jags()
.
E.g. The plots from the runjags model mod <- run.jags(..., monitor = c("gamma", "beta" ))
will sort the parameters in alphabetical order, so it will appear as beta[1], beta[2],..., gamma[1], gamma[2],...
etc.
I understand there is the sort=FALSE
option in ggs()
function, but this doesn't seem to be doing what I'd like it to.
One quick and lazy way that I found to work was to edit the lines in ggs.R
where it says D$Parameter <- factor(D$Parameter)
to D$Parameter <- factor(D$Parameter, levels = unique(D$Parameter))
, in particular under the if statement for sort
. This then allows me to run ggs()
function with sort = FALSE
, and it will give me plots exactly in the order as it appears from the runjags
output.
The only thing is I don't know if this breaks the sort = FALSE
functionality that you originally intended. I can't seem to figure out what this does?
Hope I was clear. Thanks for a great package!
Add visual ways to have a way to compare the distribution of the observed data against the distribution of the posterior predicted outcomes.
In addition to improve the facilities to import MCMC samples from other packages, ggs() must be able to select some parameters from the input objects, based on their similar name.
That should point here
ggs_density as an example still required import of coda when used together with rstan. When coda is imported, traceplot(with implicit) rstan falsely dispatches to coda::traceplot, giving an error. rstan::traceplot works in this case, but is not nice.
ggmcmc github version, installed today.
library(ggmcmc)
stanmodelcode <- "
data {
int<lower=0> N;
real y[N];
}
parameters {
real mu;
}
model {
mu ~ normal(0, 10);
y ~ normal(mu, 1);
}
"
y <- rnorm(20)
dat <- list(N = 20, y = y);
fit <- stan(model_code = stanmodelcode, data = dat, chains = 3)
#### Another bug in ggmcmc: this dependency is missing.
# Things should work without the next line
library(coda) # Bit this one gives an warning which should be avoided by using namespaces
s = ggs(fit)
#Error in 1:n.samples : argument of length 0
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.