Hello everyone!
I think I might have found an inconsistency between modelbased's (estimate_means) and brms's conditional_effects when it comes to computing the EMMs with re_formula. In particular, this regards the difference between conditional and marginal effects.
Brief intro:
I found this post by Andrew Heiss very clear when it comes to describing the marginal (re_formula = NULL, default in modelbased) vs conditional (re_formula = NA) effects.
Here's a general definition:
- Conditional effect =
re_formula = NA > random offsets set to 0
- Marginal effect =
re_formula = NULL: random offsets are incorporated into the estimate
This definition seems to be shared elsewhere. In several blogs, they say that re_formula = NULL should increase the uncertainty in the predictions due to the inclusion of the random effects, whereas with re_formula = NA, the predictions are based only on fixed effects and the estimated error of those parameters. Here, Paul Buerkner is saying this. In the brms manual, it says the same thing:
re_formula: A formula containing group-level effects to be considered in the conditional predictions. If NULL, include all group-level effects; if NA (default), include no group-level effects
However, in modelbased's EMMs, the Credible Intervals with NA are way larger than those with NULL. But shouldn't it be the other way around? If NULL includes all the uncertainty coming from the random part of the model, shouldn't the CIs be larger? I confess I'm a bit confused.
Another user reported a similar doubt, but the replies are still a bit confusing to me. I noticed everyone suggests using NA as default (brms too), but in modelbased NULL seems to be the default, so I wondered why.
I'm attaching here three figures derived by three models (reproducible code below):
- scale(Reaction) ~ Days + (1+Days|Subject), data=sleepstudy
- scale(height) ~ Occasion + (1+Occasion|Subject), data = Oxboys
What I noticed is that the estimates seem more similar with NA (still not identical, though), but they vary enormously with NULL. Compared to NA, modelbased's NULL decreases the variability, whereas brms' conditional_effects increases it. Brms's results seem more logical to me though; cause, again, it makes sense that if you add the variability coming from the RE, the uncertainty should be larger. However, to complicate things even more, the NULL estimates with brms give the impression that nothing is "significant", whereas in fact everything is!
- Finally, this one comes from a more complex model with one random intercept and three random slopes (1 + x + y + z | participant). Here the discrepancies seem even larger.
Here's a reproducible code for the first two figures. I hope it's helpful for you to get to the bottom of this XD
Also, I apologise if I made some mistakes in describing what I found. Maybe it's just me not getting it.
### Marginal / Conditional Effects in B-GLMMs (brms & modelbased) ###
library(pacman)
pacman::p_load(brms, modelbased, lme4)
#### Sleepstudy dataset (slope) ####
data("sleepstudy", package = "lme4")
m <- brm(scale(Reaction) ~ Days + (1+Days|Subject),
family = gaussian(),
data = sleepstudy,
prior = c(
set_prior("normal(0,1)", class = "Intercept"),
set_prior("normal(0,1)", class = "b"),
set_prior("exponential(1)", class = "sd"),
set_prior("exponential(1)", class = "sigma")
),
cores= 4,
control = list(adapt_delta = 0.95),
save_pars = save_pars(all = TRUE),
backend = "cmdstanr",
seed = 1234)
## Marginal effect ##
emm_marginal <- estimate_means(m, by="Days", centrality="mean", ci=.95, ci_method="hdi", re_formula=NULL)
## Conditional effect ##
emm_conditional <- estimate_means(m, by="Days", centrality="mean", ci=.95, ci_method="hdi", re_formula=NA)
ce1 <- conditional_effects(m, effects = "Days", re_formula = NULL)
ce2 <- conditional_effects(m, effects = "Days", re_formula = NA)
p1 <- plot(ce1, plot = FALSE)[[1]]
p2 <- plot(ce2, plot = FALSE)[[1]]
# Merge all graphs
plot(emm_marginal)+labs(title="NULL | modelbased", y="Reaction")+
plot(emm_conditional)+labs(title="NA | modelbased", y="Reaction")+
p1 + labs(title="NULL | brms conditional_effects", y="Reaction")+
p2 + labs(title="NA | brms conditional_effects", y="Reaction")&
coord_cartesian(ylim=c(-2.5,3))&
theme_bw()
#### Oxboys dataset (mean difference) ####
data("Oxboys", package = "nlme")
Oxboys <- subset(Oxboys, Occasion <= 4)
Oxboys$Occasion <- factor(Oxboys$Occasion, ordered = FALSE)
m3 <- brm(scale(height) ~ Occasion + (1+Occasion|Subject),
family = gaussian(),
data = Oxboys,
prior = NULL,
cores= 4,
control = list(adapt_delta = 0.95),
save_pars = save_pars(all = TRUE),
backend = "cmdstanr",
seed = 1234)
## Marginal effect ##
emm_marginal3 <- estimate_means(m3, by="Occasion", centrality="mean", ci=.95, ci_method="hdi", re_formula=NULL)
## Conditional effect ##
emm_conditional3 <- estimate_means(m3, by="Occasion", centrality="mean", ci=.95, ci_method="hdi", re_formula=NA)
ce13 <- conditional_effects(m3, effects = "Occasion", re_formula = NULL)
ce23 <- conditional_effects(m3, effects = "Occasion", re_formula = NA)
p13 <- plot(ce13, plot = FALSE)[[1]]
p23 <- plot(ce23, plot = FALSE)[[1]]
# Merge all graphs
plot(emm_marginal3)+labs(title="NULL | modelbased", y="Height (z-score)")+
plot(emm_conditional3)+labs(title="NA | modelbased", y="Height (z-score)")+
p13 + labs(title="NULL | brms conditional_effects", y="Height (z-score)")+
p23 + labs(title="NA | brms conditional_effects", y="Height (z-score)")&
coord_cartesian(ylim=c(-3,3.5))&
theme_bw()
Hello everyone!
I think I might have found an inconsistency between modelbased's (
estimate_means) and brms'sconditional_effectswhen it comes to computing the EMMs withre_formula. In particular, this regards the difference between conditional and marginal effects.Brief intro:
I found this post by Andrew Heiss very clear when it comes to describing the marginal (
re_formula = NULL, default inmodelbased) vs conditional (re_formula = NA) effects.Here's a general definition:
re_formula = NA> random offsets set to 0re_formula = NULL: random offsets are incorporated into the estimateThis definition seems to be shared elsewhere. In several blogs, they say that
re_formula = NULLshould increase the uncertainty in the predictions due to the inclusion of the random effects, whereas withre_formula = NA, the predictions are based only on fixed effects and the estimated error of those parameters. Here, Paul Buerkner is saying this. In thebrmsmanual, it says the same thing:However, in modelbased's EMMs, the Credible Intervals with NA are way larger than those with
NULL. But shouldn't it be the other way around? IfNULLincludes all the uncertainty coming from the random part of the model, shouldn't the CIs be larger? I confess I'm a bit confused.Another user reported a similar doubt, but the replies are still a bit confusing to me. I noticed everyone suggests using
NAas default (brmstoo), but in modelbasedNULLseems to be the default, so I wondered why.I'm attaching here three figures derived by three models (reproducible code below):
What I noticed is that the estimates seem more similar with
NA(still not identical, though), but they vary enormously withNULL. Compared toNA, modelbased'sNULLdecreases the variability, whereas brms' conditional_effects increases it. Brms's results seem more logical to me though; cause, again, it makes sense that if you add the variability coming from the RE, the uncertainty should be larger. However, to complicate things even more, theNULLestimates with brms give the impression that nothing is "significant", whereas in fact everything is!Here's a reproducible code for the first two figures. I hope it's helpful for you to get to the bottom of this XD
Also, I apologise if I made some mistakes in describing what I found. Maybe it's just me not getting it.