Linear multilevel (or mixed) models are useful when there are dependencies (hierarchies) in the data:
Example 1: Measuring the impact of a new treatment on patient outcomes provided by different doctors/clinics. Patient-level observations are not independent: within a given doctor, patient outcomes are more similar to one another. But across doctors there is also a treatment effect.
Example 2: Determining what features predict admittance into college. Predictors include a student’s high school GPA, extracurricular activities, and SAT scores. Predictors also include whether the school attended was public or private, current student-to-teacher ratios, and the school’s rank against other schools. So we have student-level and school-level predictors, and students are nested within schools.
Example 3: A psychologist measures changes in cognitive performance over time (i.e., learning) among individuals, and is interested in whether there are differences among individuals (or groups of individuals). There are multiple test occasions (i = 1…I) for each individual, and multiple individuals (j = 1…J), in multiple groups (k = 1…K). So the number of test occasions I can vary across the J individuals, and the number of individuals in each group can vary across the K groups.
Other resources
Prof. Matthew Clapman from UC Santa Cruz provides a brief and approachable introduction to linear multilevel models:
The UCLA Office of Advanced Research Computing provides expert resources on statistical methods and data analytics. Below is a link to their introduction to linear multilevel models:
To understand the model, we simulated an effect with known (i.e., true) parameters and compared the model results to the true parameters.
The effect of student grades on their subsequent starting salary after graduation was simulated. Students were clustered by degree/major (e.g., architecture, biology, computer science, dentistry, engineering).
We chose simulation parameters such that as the average grade in a major got higher, salary got worse; that is, easier majors paid worse. However, the true effect of grade on salary within each major was 2.
sim <- function(N = 100, g = 5, mu = c(0, 2), sigma = c(1, 1, 1)) {
  require(purrr)
  require(dplyr)
  n <- round(N / g)  # students per major
  N <- n * g         # total sample size, after rounding
  # all combinations of student id and major
  purrr::cross_df(
    list(ids = 1:n, major = 1:g)
  ) %>%
    dplyr::mutate(
      grade = rnorm(N),
      major = as.factor(letters[major])
    ) %>%
    dplyr::group_by(major) %>%
    dplyr::mutate(
      average_grade = mean(grade),
      # true major-level intercept depends on the major's average grade
      intercept = rnorm(1, mu[1] * average_grade, sigma[1]),
      # true major-level slope of grade on salary
      slope = rnorm(1, mu[2], sigma[2]),
      salary = rnorm(n, intercept + slope * grade, sigma[3])
    ) %>%
    dplyr::ungroup()
}
set.seed(5) # for reproducibility
sim_data <- sim(N = 25, mu = c(-10, 2), sigma = c(6, 0.1, 3))
contrasts(sim_data$major) <- contr.sum
We have unique IDs for each person (`ids`), as well as the `major`, `grade`, and `salary` for each person. We also have the `average_grade` for each major, and the true `intercept` and true `slope` for each major.
## # A tibble: 20 × 7
## ids major grade average_grade intercept slope salary
## <int> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 a -0.841 0.214 -3.90 2.03 -2.75
## 2 2 a 1.38 0.214 -3.90 2.03 -4.12
## 3 3 a -1.26 0.214 -3.90 2.03 -12.5
## 4 4 a 0.0701 0.214 -3.90 2.03 -9.04
## 5 5 a 1.71 0.214 -3.90 2.03 -0.851
## 6 1 b -0.603 -0.372 12.2 2.11 15.6
## 7 2 b -0.472 -0.372 12.2 2.11 8.82
## 8 3 b -0.635 -0.372 12.2 2.11 10.7
## 9 4 b -0.286 -0.372 12.2 2.11 17.3
## 10 5 b 0.138 -0.372 12.2 2.11 11.1
## 11 1 c 1.23 -0.377 12.8 2.22 17.2
## 12 2 c -0.802 -0.377 12.8 2.22 8.32
## 13 3 c -1.08 -0.377 12.8 2.22 8.98
## 14 4 c -0.158 -0.377 12.8 2.22 10.2
## 15 5 c -1.07 -0.377 12.8 2.22 10.2
## 16 1 d -0.139 -0.588 1.94 2.12 6.03
## 17 2 d -0.597 -0.588 1.94 2.12 1.23
## 18 3 d -2.18 -0.588 1.94 2.12 0.367
## 19 4 d 0.241 -0.588 1.94 2.12 0.671
## 20 5 d -0.259 -0.588 1.94 2.12 1.05
This model ignores major entirely (complete pooling across majors).
\[
\begin{align}
salary_i &= \beta_0 + \beta_1 grade_i + \epsilon_i \\
\\
\epsilon_i &\sim N(0, \sigma)
\end{align}
\]
For \(i = 1...25\)
The model has 3 parameters: the intercept \(\beta_0\), the slope \(\beta_1\), and the residual standard deviation \(\sigma\).
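This complete-pooling model can be fit with `lm()`; the object name `lm.pooled` below is just illustrative. Its `summary()` output follows:

```r
# complete pooling: a single regression ignoring major
lm.pooled <- lm(salary ~ 1 + grade, data = sim_data)
summary(lm.pooled)
```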
##
## Call:
## lm(formula = salary ~ 1 + grade, data = sim_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.3119 -8.3384 0.2771 5.8256 20.3858
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.273 1.997 0.638 0.5300
## grade -3.653 2.081 -1.755 0.0925 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.979 on 23 degrees of freedom
## Multiple R-squared: 0.1181, Adjusted R-squared: 0.07978
## F-statistic: 3.081 on 1 and 23 DF, p-value: 0.09254
Major-specific models
\(salary_i = \beta_0 + \beta_1 grade_i + \epsilon_i\)
For \(i = 1...5\)
lm(formula = salary ~ 1 + grade,
   data = sim_data[sim_data$major == "a", ])  # repeat for "b", "c", "d", "e"
| model/major | intercept | slope | sigma | p.value |
|---|---|---|---|---|
| a | -6.35 | 2.38 | 4.18 | 0.23 |
| b | 12.58 | -0.35 | 4.13 | 0.96 |
| c | 12.26 | 3.41 | 1.51 | 0.02 |
| d | 2.37 | 0.85 | 2.55 | 0.58 |
| e | -14.59 | 1.87 | 2.11 | 0.64 |
| average | 1.25 | 1.63 | 2.90 | 0.49 |
\[
salary_i = \beta_0 + \beta_1 grade_i + \beta_{2[j]} major_j +
\beta_{3[j]} grade_i \times major_j + \epsilon_i
\]
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 1.25 | 1.27 | 0.99 | 0.34 |
| grade | 1.63 | 1.52 | 1.07 | 0.30 |
| sigma | 3.09 | | | |
\[ \begin{align} \text{Level 1}: y_i &= \beta_0 + \beta_1 x_i + \epsilon_i \\ \\ \\ \text{Level 2}: \epsilon_i &\sim N(0, \sigma) \end{align} \]
\[ \begin{align} \text{Level 1}: y_{ij} &= \alpha_{j} + \beta_1 x_{i} + \epsilon_i \\ \alpha_{j} &= \beta_0 + \beta_{0j} \\ \\ \text{Level 2}: \beta_{0j} &\sim N(0, \ \tau) \\ \epsilon_i &\sim N(0, \ \sigma) \end{align} \]
First we will need to install and load the two packages we will use to run LMMs in R.
If you do not have a package installed, run: install.packages("packagename")
We then load these packages into the working environment.
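For example:

```r
# install once (uncomment if needed), then load each session
# install.packages(c("lme4", "lmerTest"))
library(lme4)
library(lmerTest)
```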
lme4 stands for linear mixed-effects models (developed by Douglas Bates, Ben Bolker, and colleagues). It is the most common package used for running LMMs in R. The main functions are `lmer` for running models with continuous outcome variables and `glmer` for models with other kinds of outcome variables (e.g., Poisson or binary/logistic regression).
Because the lme4 package does not produce p-values, it is common for researchers more accustomed to working with p-values to use the lmerTest package. lmerTest doesn’t have any functions that we will use but, when loaded, it overrides the standard output of lme4 in order to provide p-values, approximated by the Satterthwaite approach. Be aware that while these p-values are provided automagically, they are approximations and may be very poor in some scenarios (e.g., unbalanced or crossed designs).
Confidence intervals are provided very easily by the lme4 package using `confint()`.
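For example (`lmm.intercept` stands in for whatever name you gave your fitted model):

```r
# profile-likelihood confidence intervals for all parameters
confint(lmm.intercept)
```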
\[ \begin{align} \text{Level 1}: y_{ij} &= \alpha_{j} + \beta_1 x_{i} + \epsilon_i \\ \alpha_{j} &= \beta_0 + \beta_{0j} \\ \\ \text{Level 2}: \beta_{0j} &\sim N(0, \ \tau) \\ \epsilon_i &\sim N(0, \ \sigma) \end{align} \]
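This random-intercept model corresponds to the following `lmer` call (the object name `lmm.intercept` is our choice); its `summary()` is shown in the output below:

```r
# random intercept for each major, common (fixed) slope for grade
lmm.intercept <- lmer(salary ~ 1 + grade + (1 | major), data = sim_data)
summary(lmm.intercept)
```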
The model formula for `lmer` uses three symbols:

- `~` separates the outcome (left) from the predictors (right)
- `+` separates terms in the model
- `|`, a symbol read as “given” or “by”; the variable to the right of `|` is evaluated as a factor, which we call the grouping factor for that term

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: salary ~ 1 + grade + (1 | major)
## Data: sim_data
##
## REML criterion at convergence: 137.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.2577 -0.5724 -0.1475 0.6408 1.8026
##
## Random effects:
## Groups Name Variance Std.Dev.
## major (Intercept) 143.03 11.960
## Residual 8.33 2.886
## Number of obs: 25, groups: major, 5
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 1.4512 5.3796 3.9624 0.270 0.8008
## grade 2.0877 0.7435 19.2071 2.808 0.0112 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## grade 0.004
The estimated effect of grade on salary is 2.09 (much closer to the “true” effect of 2!)
We can quantify the benefit of including the random intercept by the ICC:
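A minimal sketch of the ICC calculation from the variance components in the summary above (`lmm.intercept` is an illustrative name for the fitted random-intercept model):

```r
# ICC = between-major variance / (between-major + residual variance)
vc <- as.data.frame(VarCorr(lmm.intercept))
icc <- vc$vcov[1] / sum(vc$vcov)
icc  # here: 143.03 / (143.03 + 8.33), roughly 0.95
```

An ICC this high means most of the variance in salary is between majors rather than within them, so accounting for major matters a great deal.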
\[ \begin{align} \text{Level 1}: y_{ij} &= \alpha_{j} + \beta_{j} x_{i} + \epsilon_i \\ \alpha_{j} &= \beta_0 + \beta_{0[j]} \\ \beta_{j} &= \beta_1 + \beta_{1[j]} \\ \\ \text{Level 2}: \beta_{0[j]} &\sim N(0, \ \tau) \\ \beta_{1[j]} &\sim N(0, \ \gamma) \\ \epsilon_i &\sim N(0, \ \sigma) \end{align} \]
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: salary ~ 1 + grade + (1 + grade | major)
## Data: sim_data
##
## REML criterion at convergence: 137.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.3735 -0.5334 -0.1515 0.7178 1.7200
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## major (Intercept) 140.8177 11.8667
## grade 0.1208 0.3475 1.00
## Residual 8.3145 2.8835
## Number of obs: 25, groups: major, 5
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 1.6042 5.3383 3.9464 0.301 0.7789
## grade 2.1043 0.7572 12.0758 2.779 0.0166 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## grade 0.210
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
The estimated effect of grade on salary is 2.10 (close to the “true” effect of 2!)
We can quantify the benefit of including the random slope by the ICC:
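Once a random slope is included, a single variance ratio no longer summarizes the clustering on its own. One convenient option (assuming the performance package is installed) is its adjusted ICC, applied to the document's `lmm.full` model:

```r
# adjusted ICC for the random intercept + random slope model
performance::icc(lmm.full)
```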
A linear mixed-effects model was fitted to predict salary based on grade, with random intercepts and slopes for grade across major. The model included 25 observations across 5 majors. The model converged successfully (REML = 137.7), though a boundary (singular) fit was detected, suggesting potential overparameterization or perfect correlation among random effects.
The fixed effect of grade was statistically significant, indicating that higher grades were associated with higher salaries, b = 2.10, SE = 0.76, t(12.08) = 2.78, p = .017. The intercept was not significant, b = 1.60, SE = 5.34, t(3.95) = 0.30, p = .779.
Random effects indicated substantial variability in intercepts across majors (SD = 11.87), and a smaller but perfectly correlated variability in slopes for grade (SD = 0.35). The residual standard deviation was 2.88.
Just like a linear model (OLS), LMMs assume the residuals are normal with constant variance, and collinearity between independent variables will reduce the power/precision of estimates.
An additional assumption of LMMs is that there is no heteroscedasticity across levels of the random effects. Again, we can check this using plots of the residuals, simply split by levels of the random effect of interest (e.g., major):
# standardised residuals vs fitted values, panelled by major
plot(lmm.full,
     resid(., scaled = TRUE) ~ fitted(.) | major,
     abline = 0, pch = 16,
     xlab = "Fitted values",
     ylab = "Standardised residuals")

# distribution of standardised residuals by major
plot(lmm.full, as.factor(major) ~ resid(., scaled = TRUE),
     abline = 0, pch = 16,
     xlab = "Standardised residuals",
     ylab = "Major")
Eventually you will receive a warning about model convergence. This can have many causes, which may need to be investigated and tested. A common first step is to try a different optimizer, e.g.:
control = lmerControl(optimizer = "bobyqa")
Another option is to fit the same model in a Bayesian framework, e.g., with brms:
brms::brm(salary ~ 1 + grade + (1 + grade | major), data = sim_data)
A major advantage of multilevel models is how they handle missing data. Other methods can give the same results, but they depend on balance: the design must be completely balanced, and the resulting data must also be completely balanced. Balance is fragile; even if the design is balanced, a single missing or questionable observation destroys it. Observational studies, as opposed to, say, laboratory experiments, cannot be expected to yield balanced data sets.
Random effects can be nested (e.g., see above) or crossed.
For instance, in a clinical research design, we could have repeated measures on individuals in different treatment groups. Each treatment is provided by different therapists, so subjects are nested within therapists (and therapists are nested within treatments).
lmer(y ~ time * tx + (time | therapist/subjects))
To fit a nested effect we use `therapist/subjects`, which specifies nesting.
Alternatively, we could have a design with subject-level randomization to treatment; so each therapist delivers both treatment arms. Therapist now represents a crossed effect:
lmer(y ~ time * tx + (time | therapist:subjects) + (time * tx | therapist))
The crossed specification `therapist:subjects` is equivalent to `subjects:therapist` (order does not matter). Also note that crossed is simply “not nested”. Experimental situations with crossed random factors, such as “subject” and “stimulus”, are common.
Important note: Nesting or crossing is a property of the data, or rather the experimental design, not the model. However correct specification is necessary when the nested variable (e.g., subject) is not coded uniquely across the higher factor (e.g., therapists). You can avoid confusion about nested and crossed factors by following one simple rule: ensure that different levels of a factor in the experiment correspond to different labels of the factor in the data.
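That rule can be sketched as follows; the data frame `df` and its `subject` and `therapist` columns are hypothetical names:

```r
# make subject labels globally unique, so subject "1" seen by therapist A
# and subject "1" seen by therapist B get distinct labels
# (df, subject, therapist are hypothetical names)
df$subject_uid <- interaction(df$therapist, df$subject, drop = TRUE)
```

With unique labels, nested and crossed specifications of the random effects give the same fit, so mislabeling can no longer silently change the model.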
Dr Kristoffer Magnusson has a helpful page on different multilevel designs and how to estimate them using lme4:
Multilevel models are a method for: modelling data with dependencies (hierarchies), such as patients within doctors or students within majors.
Advantages are: estimates that pool information across groups, and graceful handling of missing or unbalanced data.
But: they require careful specification of the random-effects structure, and can run into convergence or singular-fit issues.
Random effect formula:

- `(1 | g)`: a single random intercept per level of `g`
- `(1 | g1/g2)`: random intercepts for `g1` and for `g2` nested within `g1`
- `(x | g)`: a random intercept and a random slope for `x` per level of `g` (correlated)
- `(1 | g) + (0 + x | g)`, or `(x || g)`: a random intercept and slope that are uncorrelated