# load and install packages
library(pacman)
p_load(readxl, tidyverse, metafor, ape, dplyr,
here, clubSandwich, corpcor, xtable, kableExtra)
Case studies for: Modelling approaches for meta-analyses with dependent effect sizes in ecology and evolution
This webpage is supporting information for the case studies in the following manuscript:
Williams, C., Yang, Y., Warton, D. I., & Nakagawa, S. (2025). Modelling approaches for meta-analyses with dependent effect sizes in ecology and evolution: A simulation study. Methods in Ecology and Evolution. DOI: 10.1111/2041-210x.70156
Overview
Here we provide code to reproduce the results from the case studies in our manuscript “Modelling approaches for meta-analyses with dependent effect sizes in ecology and evolution: a simulation study”. This code can be used to produce similar analyses to find an appropriate sampling variance covariance matrix assuming a constant within-study correlation.
We note three important and cautionary points.
For the purpose of illustrating the application of modelling approaches to handle dependent sampling errors, we reanalysed two published meta-analyses. The models have been simplified compared with those in the original studies, so the results are for illustration only and should not be used to draw substantive conclusions.
If the primary studies provided sufficient information to compute covariances and correlations among effect sizes, then the sampling \(\mathbf{V^*}\) structure can be more complex (i.e. non-constant correlation within-studies) and specified using a combination of known and arbitrary chosen values depending how much information is available.
The usual model specification and model selection should be undertaken before comparing model fits with different sampling \(\mathbf{V^*}\) matrix specification; adding a random effect at the level of the individual estimate (very important and is often overlooked), accounting for potential heterogeneity at each level of hierarchical or correlated clustering (this can be difficult as it can be unclear what consists of a `level’ in certain datasets), and finally consider if a level should be modelled as a random effect or fixed effect (read more in section 6 in Gelman (2025)).
For any comments or questions please contact: Coralie Williams (coraliewilliams@outlook.com)
Before starting let’s load and install the necessary R packages.
1. Case study: Meta-analysis on plant coexistence
Crawford et al. (2019) used a large meta-analysis dataset of pairwise plant-soil feedback measures to investigate whether these feedbacks contribute to plant species coexistence. First we want to load the meta-analytical dataset provided in Crawford et al. (2019).
# link to download data file: "https://figshare.com/ndownloader/files/14874749"
<- read_excel(here("data/case_studies/Supplementary Table 1.xlsx"), sheet="Data")
dat <- dat |> filter(`Mycorrhizal status`=="different") dat
Now, we want to set up a vector of within-study correlations \(\rho\) to test (including \(\rho=0\)). Then we specify the sampling error variances (the diagonal of the assumed sampling error \(\mathbf{V^*}\)) and an empty dataframe to store the results from all models.
# corr values to test
<- seq(0, 0.9, 0.1)
cor_values
# set up sampling error variances and effect size id
<- dat$`Variance(rrIs)`
vi <- dat$`Pairwise comparison`
es.id
# empty dataframe to store results
<- data.frame(rho = numeric(), beta = numeric(),
results se = numeric(), se.cr1 = numeric(),
pval = numeric(), pval.cr1 = numeric(),
s2.study = numeric(), s2.es = numeric(),
s2.phylogeny = numeric(), s2.species = numeric(),
loglik = numeric())
Using a for loop we generate a model per each within study constant \(\rho\) value. First, we derive the assumed sampling error \(\mathbf{V^*}\). Second, we fit a multilevel model with a t-distribution and using the adjusted degrees of freedom as recommended by (see more recommended practices here). Third, we obtain the cluster robust CR2 model estimates. Finally, we store the results within the specified dataframe.
for (rho in cor_values) {
# get VCV matrix
<- vcalc(vi,
V cluster=Study,
obs=es.id,
data=dat,
rho=rho)
# fit the multilevel meta-analysis model with this VCV
<- rma.mv(yi = rrIs,
model V = V,
random = list(~1 | Study,
~1 | es.id),
data = dat,
test = "t",
dfs = "contain",
method = "REML")
# get CR2 estimates
<- coef_test(model, vcov="CR2", cluster = dat$Study) ## default test is Satterthwaite
mod.cr2
# store model output
<- rbind(results, data.frame(rho = rho,
results beta = round(model$beta[1], 2),
se = round(model$se[1], 3),
se.cr2 = round(mod.cr2$SE, 3),
pval = round(model$pval[1], 4),
pval.cr2 = round(mod.cr2$p_Satt, 4),
s2.es = round(model$sigma2[2], 3),
s2.study = round(model$sigma2[1], 3),
loglik = round(logLik(model), 3)))
}
kable(results)
rho | beta | se | se.cr2 | pval | pval.cr2 | s2.es | s2.study | loglik |
---|---|---|---|---|---|---|---|---|
0.0 | -0.04 | 0.155 | 0.154 | 0.7857 | 0.7852 | 0.190 | 0.229 | -56.290 |
0.1 | -0.03 | 0.152 | 0.152 | 0.8413 | 0.8409 | 0.193 | 0.211 | -55.794 |
0.2 | -0.02 | 0.151 | 0.150 | 0.8895 | 0.8891 | 0.198 | 0.198 | -55.384 |
0.3 | -0.01 | 0.150 | 0.149 | 0.9299 | 0.9296 | 0.204 | 0.187 | -55.043 |
0.4 | -0.01 | 0.150 | 0.149 | 0.9628 | 0.9626 | 0.211 | 0.178 | -54.757 |
0.5 | 0.00 | 0.150 | 0.149 | 0.9887 | 0.9886 | 0.219 | 0.170 | -54.517 |
0.6 | 0.00 | 0.150 | 0.148 | 0.9918 | 0.9917 | 0.229 | 0.163 | -54.316 |
0.7 | 0.00 | 0.150 | 0.148 | 0.9783 | 0.9781 | 0.239 | 0.156 | -54.151 |
0.8 | 0.01 | 0.150 | 0.149 | 0.9705 | 0.9703 | 0.251 | 0.150 | -54.020 |
0.9 | 0.01 | 0.150 | 0.149 | 0.9682 | 0.9679 | 0.264 | 0.143 | -53.923 |
2. Case study: Phylogenetic multilevel meta-analysis on behavioural predictability
Horváth et al. (2023) investigated whether behavioural type (mean behaviour) and behavioural predictability (within-individual variation) evolve independently or under system-specific constraints across multiple species. Here we load and prepare the meta-analytical dataset and phylogenetic tree provided in Horváth et al. (2023).
### code from supplementary HTML file of Horváth et al. (2023)
library(ape); library(Matrix);library(rotl); library(rncl);
library(Hmisc); library(lattice); library(HDInterval)
# Set up meta-analysis data
<- read_excel(here("data/case_studies/Metadata_ALL_NEWER.xlsx"))
xdata $N_OBS<-as.numeric(xdata$N_OBS)
xdata$N_ID<-as.numeric(xdata$N_ID)
xdata$N_rep_mean<-as.numeric(xdata$N_rep_mean)
xdata$ES.ID<-as.factor(xdata$ES.ID)
xdata$scientific.name <- tolower(as.factor(xdata$scientific.name))
xdata$species<-as.factor(xdata$species)
xdata$taxa <- as.factor(xdata$taxa)
xdata$study <- as.factor(xdata$study)
xdata$behaviour <-as.factor(xdata$behaviour)
xdata$environment <-as.factor(xdata$environment)
xdata$measures.use<-as.factor(xdata$measures.use)
xdata$age <- as.factor(xdata$age)
xdata$sex <- as.factor(xdata$sex)
xdata$temporal.context <- as.factor(xdata$temporal.context)
xdata
## transform correlation coefficients to Fisher’s normalized correlation coefficients (Zr)
<-function(r){
Zr.corr<-0.5*log((1+r)/(1-r))
Zr
}# get Zr
$yi.z <- Zr.corr(xdata$Corr)
xdata# multiply these Zr values by -1 to make positive correlations equate risk-prone animals being more variable
$yi.z[xdata$measures.use == 'latency']<-xdata$yi.z[xdata$measures.use == 'latency']*-1
xdata# get sampling variances
<-xdata$N_rep_mean/(2*(xdata$N_ID-2)*(xdata$N_rep_mean-1))
vi<-data.frame(xdata,vi)
xdata
##### Phylogenetic tree
<- unique(xdata$scientific.name)
species <- as.data.frame(species)
species 1] <- as.character(species[,1])
species[,
<- tnrs_match_names(unique(xdata$scientific.name), context="Animals")
taxa <- structure(taxa$search_string, names=taxa$unique_name)
taxon_map <- tol_induced_subtree(ott_id(taxa)[is_in_tree(ott_id(taxa))])
tr <- readRDS(here("data/case_studies/phylotree.rds"))
tr
# clean up tip labels
<- strip_ott_ids(tr$tip.label, remove_underscores=TRUE)
otl_tips $tip.label <- taxon_map[ otl_tips ]
trany(duplicated(tr$node.label))
$node.label <- NULL
tr<- xdata[xdata$scientific.name %in% tr$tip.label, ]
xdata
# create VCV phylogenetic matrix
$animal <- xdata$scientific.name
xdata<- vcv(compute.brlen(tr, corr = T)) phylo_matrix
Now, let’s set up the vector of within-study correlations \(\rho\) that we want to test (including \(\rho=0\)) and an empty dataframe to store the results from all models.
# corr values to test
<- seq(0, 0.9, 0.1)
cor_values
# empty dataframe to store results
<- data.frame(rho = numeric(), beta = numeric(),
results se = numeric(), se.cr1 = numeric(),
pval = numeric(), pval.cr1 = numeric(),
s2.study = numeric(), s2.es = numeric(),
s2.phylogeny = numeric(), s2.species = numeric(),
loglik = numeric())
For each \(\rho\) value we derive the assumed sampling error \(\mathbf{V^*}\) and fit the phylogenetic meta-analytical model with a t-distribution and using the adjusted degrees of freedom as recommended by (see more recommended for metafor practices here). Additionally, we derive the cluster robust CR1 model estimates.
for (rho in cor_values) {
# get VCV matrix
<- vcalc(vi,
V cluster=study,
obs=ES.ID,
data=xdata,
rho=rho)
# fit the multilevel meta-analysis model with this VCV
<- rma.mv(yi = yi.z,
model V = V,
random = list(~1 | study,
~1 | ES.ID,
~1 | animal, #phylogenetic component
~1 | species), #non phylogenetic component
data = xdata,
R = list(animal = phylo_matrix),
test = "t",
dfs = "contain",
method = "REML")
# get CR1 estimates
<- robust(model, cluster = study, adjust=TRUE)
mod.cr1
# store model output
<- rbind(results, data.frame(rho = rho,
results beta = round(model$beta[1], 2),
se = round(model$se[1], 3),
se.cr1 = round(mod.cr1$se, 3),
pval = round(model$pval[1], 4),
pval.cr1 = round(mod.cr1$pval, 4),
s2.es = round(model$sigma2[2], 3),
s2.study = round(model$sigma2[1], 3),
s2.phylogeny = round(model$sigma2[3], 3),
s2.species = round(model$sigma2[4], 3),
loglik = round(logLik(model), 3)))
}
kable(results)
rho | beta | se | se.cr1 | pval | pval.cr1 | s2.es | s2.study | s2.phylogeny | s2.species | loglik |
---|---|---|---|---|---|---|---|---|---|---|
0.0 | -0.05 | 0.207 | 0.083 | 0.7953 | 0.5199 | 0.381 | 0.133 | 0.115 | 0 | -102.696 |
0.1 | -0.05 | 0.207 | 0.083 | 0.7946 | 0.5183 | 0.382 | 0.132 | 0.115 | 0 | -102.703 |
0.2 | -0.05 | 0.207 | 0.083 | 0.7940 | 0.5168 | 0.384 | 0.130 | 0.115 | 0 | -102.710 |
0.3 | -0.05 | 0.207 | 0.083 | 0.7933 | 0.5153 | 0.385 | 0.129 | 0.116 | 0 | -102.718 |
0.4 | -0.05 | 0.207 | 0.083 | 0.7927 | 0.5138 | 0.386 | 0.128 | 0.116 | 0 | -102.725 |
0.5 | -0.06 | 0.208 | 0.084 | 0.7920 | 0.5122 | 0.388 | 0.127 | 0.116 | 0 | -102.733 |
0.6 | -0.06 | 0.208 | 0.084 | 0.7914 | 0.5107 | 0.389 | 0.126 | 0.117 | 0 | -102.741 |
0.7 | -0.06 | 0.208 | 0.084 | 0.7908 | 0.5093 | 0.390 | 0.125 | 0.117 | 0 | -102.749 |
0.8 | -0.06 | 0.208 | 0.084 | 0.7901 | 0.5078 | 0.391 | 0.124 | 0.117 | 0 | -102.757 |
0.9 | -0.06 | 0.208 | 0.084 | 0.7895 | 0.5063 | 0.393 | 0.123 | 0.117 | 0 | -102.765 |
References
Crawford, K. M., Bauer, J. T., Comita, L. S., Eppinga, M. B., Johnson, D. J., Mangan, S. A., Queenborough, S. A., Strand, A. E., Suding, K. N., Umbanhowar, J., & Bever, J. D. (2019). When and where plant-soil feedback may promote plant coexistence: a meta-analysis. Ecology letters, 22(8), 1274–1284. https://doi.org/10.1111/ele.13278
Gelman, A. (2005). Analysis of variance: Why it is more important than ever. Annals of Statistics, 33(1), 1–53. https://doi.org/10.1214/009053604000001048
Horváth, G., Garamszegi, L. Z., & Herczeg, G. (2023). Phylogenetic meta-analysis reveals system-specific behavioural type–behavioural predictability correlations. Royal Society Open Science, 10(9). https://doi.org/10.1098/rsos.230303
Session information
library(sessioninfo)
library(details)
<- session_info()
si $packages <- si$packages %>%
sifilter(package %in% c("metafor", "ape", "clubSandwich", "Matrix", "corpcor", "dplyr", "kableExtra", "xtable", "rotl", "Hmisc", "lattice"))
details(si, summary = 'Current session info', open = FALSE)
Current session info
─ Session info ───────────────────────────────────────────────────────────────
setting value4.4.2 (2024-10-31 ucrt)
version R version 11 x64 (build 26100)
os Windows
system x86_64, mingw32
ui RTermlanguage (EN)
collate English_Australia.utf8
ctype English_Australia.utf8/Denver
tz America2025-09-04
date 3.4 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
pandoc
─ Packages ───────────────────────────────────────────────────────────────────* version date (UTC) lib source
package * 5.8-1 2024-12-16 [1] CRAN (R 4.4.2)
ape * 0.5.11 2024-06-20 [1] CRAN (R 4.4.1)
clubSandwich * 1.6.10 2021-09-16 [1] CRAN (R 4.4.0)
corpcor * 1.1.4 2023-11-17 [1] CRAN (R 4.4.1)
dplyr * 5.2-2 2025-01-10 [1] CRAN (R 4.4.2)
Hmisc * 1.4.0 2024-01-24 [1] CRAN (R 4.4.3)
kableExtra * 0.22-6 2024-03-20 [2] CRAN (R 4.4.2)
lattice * 1.7-1 2024-10-18 [1] CRAN (R 4.4.2)
Matrix * 4.8-0 2025-01-28 [1] CRAN (R 4.4.3)
metafor * 3.1.0 2023-06-15 [1] CRAN (R 4.4.2)
rotl * 1.8-4 2019-04-21 [1] CRAN (R 4.4.2)
xtable
1] C:/Users/z5394590/AppData/Local/R/win-library/4.4
[2] C:/Program Files/R/R-4.4.2/library
[
──────────────────────────────────────────────────────────────────────────────