Linear multilevel models

Linear multilevel (or mixed) models are useful when there are dependencies (hierarchies) in the data:

Example 1: Measuring the impact of a new treatment on patient outcomes, where care is provided by different doctors/clinics. Patient-level observations are not independent: outcomes are more similar within a given doctor than across doctors. There is also a treatment effect of interest that operates across doctors.

Example 2: Determining what features predict admission into college. Predictors include a student’s high school GPA, extracurricular activities, and SAT scores. Predictors also include whether the school attended was public or private, current student-to-teacher ratios, and the school’s rank against other schools. So we have student-level and school-level predictors, and students are nested within schools.

Example 3: A psychologist measures changes in cognitive performance over time (i.e., learning) among individuals, and is interested in whether there are differences among individuals (or groups of individuals). There are multiple test occasions (i = 1…I) for each individual, multiple individuals (j = 1…J), and multiple groups (k = 1…K). The number of test occasions I can vary between individuals, and the number of individuals J can vary between groups.


Other methods

  • Ordinary Least Squares (OLS) linear regression. This ignores dependencies within the data
  • Summarise each cluster and perform regression on the averages. This underrepresents the variability in the data (since the averages have variability around them)
  • OLS regression with clustered standard errors. These can adjust for non-independence, but do not handle multiple sources of clustering well, and missing data is handled by listwise deletion (causing bias and reducing power)
  • Fixed effect models (with clustered standard errors). These control for the dependency but don’t let you explore the random effects (i.e., conditional means evaluated at the estimated parameters) or their variance
  • Repeated measures ANOVA (RM ANOVA). These handle the dependency (e.g., over time), but time is treated categorically (so the spacing of time intervals is ignored), and all observations must have the same number of repeats. Violations of sphericity (e.g., differences in change over time) are handled by the Greenhouse-Geisser correction, which is suboptimal.
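As a point of comparison, two of these alternatives can be sketched in R, applied to the sim_data dataset simulated later in this document (the sandwich and lmtest packages are an assumption here; install them if needed):

```r
# Plain OLS, ignoring the clustering by major
fit <- lm(salary ~ 1 + grade, data = sim_data)

# OLS with standard errors clustered by major
lmtest::coeftest(fit, vcov = sandwich::vcovCL(fit, cluster = ~ major))

# Fixed-effects (dummy-variable) model: one intercept per major
lm(salary ~ 1 + grade + major, data = sim_data)
```

Note that the clustered-SE model adjusts the inference but not the point estimates, while the fixed-effects model changes the estimates but treats the majors as fixed, not as draws from a population.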



Overview

Prof. Matthew Clapman from UC Santa Cruz provides a brief and approachable introduction to linear multilevel models:



Theory

The UCLA Office of Advanced Research Computing provides expert resources on statistical methods and data analytics. Below is a link to their introduction to linear multilevel models:



Simulation

To understand the model, we simulated an effect with known (i.e., true) parameters and compared the model results to the true parameters.

The effect of student grades on their subsequent starting salary after graduation was simulated. Students were clustered by degree/major (e.g., architecture, biology, computer science, dentistry, engineering).

We chose simulation parameters such that majors with a higher average grade had lower salaries (so easier majors paid worse). However, the true effect of grade on salary within each major was 2.


  • Intercepts for each major \(\sim N(mean = -10 \times \overline{grade}_j, \ sd = 6)\), where \(\overline{grade}_j\) is the average grade in major \(j\)
  • Slopes for each major \(\sim N(mean = 2, sd = 0.1)\)
  • Easier majors get paid worse
  • True effect of grade on salary is 2
sim <- function(N = 100, g = 5, mu = c(0, 2), sigma = c(1, 1, 1)) {
  
  require(purrr)
  require(dplyr)
  
  n = round(N / g)  # students per major
  
  N = n * g         # total N, rounded to a multiple of g
  
  purrr::cross_df(
    list(ids = 1:n, major = 1:g)
  ) %>%
    dplyr::mutate(
      grade = rnorm(N),
      major = as.factor(letters[major])
    ) %>%
    dplyr::group_by(major) %>%
    dplyr::mutate(
      average_grade = mean(grade),
      intercept = rnorm(1, mu[1]*average_grade, sigma[1]),  # true intercept per major
      slope = rnorm(1, mu[2], sigma[2]),                    # true slope per major
      salary = rnorm(n, intercept + slope*grade, sigma[3])  # observed salary
    ) %>%
    ungroup()
}

set.seed(5)
sim_data <- sim(N = 25, mu = c(-10, 2), sigma = c(6, 0.1, 3))
contrasts(sim_data$major) <- contr.sum


Description of the data

We have unique IDs for each person (ids), as well as the major, grade and salary for each person. We also have the average grade for each major, and the true intercept and true slope for each major.

head(sim_data, 20)
## # A tibble: 20 × 7
##      ids major   grade average_grade intercept slope  salary
##    <int> <fct>   <dbl>         <dbl>     <dbl> <dbl>   <dbl>
##  1     1 a     -0.841          0.214     -3.90  2.03  -2.75 
##  2     2 a      1.38           0.214     -3.90  2.03  -4.12 
##  3     3 a     -1.26           0.214     -3.90  2.03 -12.5  
##  4     4 a      0.0701         0.214     -3.90  2.03  -9.04 
##  5     5 a      1.71           0.214     -3.90  2.03  -0.851
##  6     1 b     -0.603         -0.372     12.2   2.11  15.6  
##  7     2 b     -0.472         -0.372     12.2   2.11   8.82 
##  8     3 b     -0.635         -0.372     12.2   2.11  10.7  
##  9     4 b     -0.286         -0.372     12.2   2.11  17.3  
## 10     5 b      0.138         -0.372     12.2   2.11  11.1  
## 11     1 c      1.23          -0.377     12.8   2.22  17.2  
## 12     2 c     -0.802         -0.377     12.8   2.22   8.32 
## 13     3 c     -1.08          -0.377     12.8   2.22   8.98 
## 14     4 c     -0.158         -0.377     12.8   2.22  10.2  
## 15     5 c     -1.07          -0.377     12.8   2.22  10.2  
## 16     1 d     -0.139         -0.588      1.94  2.12   6.03 
## 17     2 d     -0.597         -0.588      1.94  2.12   1.23 
## 18     3 d     -2.18          -0.588      1.94  2.12   0.367
## 19     4 d      0.241         -0.588      1.94  2.12   0.671
## 20     5 d     -0.259         -0.588      1.94  2.12   1.05



Complete pooling model: lm()

Ignores major.

\[ \begin{align} salary_i &= \beta_0 + \beta_1 grade_i + \epsilon_i \\ \\ \epsilon_i &\sim N(0, \sigma) \end{align} \]

For \(i = 1...25\)

Model has 3 parameters:

  • \(\beta_0\) (intercept)
  • \(\beta_1\) (slope)
  • \(\sigma\) (standard deviation of the error)

fit <- lm(formula = salary ~ 1 + grade, 
          data = sim_data)


summary(fit)
## 
## Call:
## lm(formula = salary ~ 1 + grade, data = sim_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18.3119  -8.3384   0.2771   5.8256  20.3858 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)    1.273      1.997   0.638   0.5300  
## grade         -3.653      2.081  -1.755   0.0925 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.979 on 23 degrees of freedom
## Multiple R-squared:  0.1181, Adjusted R-squared:  0.07978 
## F-statistic: 3.081 on 1 and 23 DF,  p-value: 0.09254
  • The estimated overall effect of grade on salary is negative (opposite in sign to the true within-major effect of 2)


Unpooled model: lm() for each major

Major-specific models

   \(salary_i = \beta_0 + \beta_1 grade_i + \epsilon_i\)

For \(i = 1...5\)

lm(formula = salary ~ 1 + grade, 
   data = sim_data[sim_data$major=="a", ])  # for "a", "b", "c", "d", "e"
model/major   intercept   slope   sigma   p.value
a                 -6.35    2.38    4.18      0.23
b                 12.58   -0.35    4.13      0.96
c                 12.26    3.41    1.51      0.02
d                  2.37    0.85    2.55      0.58
e                -14.59    1.87    2.11      0.64
average            1.25    1.63    2.90      0.49


  • Treats majors as completely independent
  • Residual standard errors (\(\sigma\)) are much lower
  • On average, grade has a positive effect on salary (but no valid inferential estimate)
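The five separate fits summarised above can be produced in one pass (a sketch; the columns of the table were assembled from each model's summary):

```r
# Fit salary ~ grade separately within each major
fits <- lapply(split(sim_data, sim_data$major), function(d) {
  lm(salary ~ 1 + grade, data = d)
})
sapply(fits, coef)  # intercept and slope for each major
```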



Interaction model


\[ salary_i = \beta_0 + \beta_1 grade_i + \beta_{2[j]} major_j + \beta_{3[j]} grade_i \times major_j + \epsilon_i \]

contrasts(sim_data$major) <- contr.sum

lm.full <- lm(salary ~ 1 + grade * major, data = sim_data)
term          estimate   std.error   statistic   p.value
(Intercept)       1.25        1.27        0.99      0.34
grade             1.63        1.52        1.07      0.30
sigma             3.09
  • the intercept and slope match the averages from the unpooled models (in a balanced dataset)
  • inference is possible



Linear model vs Linear multilevel model

Linear model
  • all effects (intercepts & slopes) are fixed and non-random
  • all, that is, except the error term \(\epsilon_i\), which is the model’s one random component


\[ \begin{align} \text{Level 1}: y_i &= \beta_0 + \beta_1 x_i + \epsilon_i \\ \\ \\ \text{Level 2}: \epsilon_i &\sim N(0, \sigma) \end{align} \]


Linear multilevel model
  • observations [i] belong to subgroups [j]
  • subgroups [j] are explicitly treated as random draws from a larger population


\[ \begin{align} \text{Level 1}: y_{ij} &= \alpha_{j} + \beta_1 x_{ij} + \epsilon_{ij} \\ \alpha_{j} &= \beta_0 + \beta_{0j} \\ \\ \text{Level 2}: \beta_{0j} &\sim N(0, \ \tau) \\ \epsilon_{ij} &\sim N(0, \ \sigma) \end{align} \]

lme4::lmer()



Loading the packages

First we will need to install and load the two packages we will use to run LMMs in R.

If you do not have a package installed, run: install.packages("packagename")

install.packages("lme4")
install.packages("lmerTest")

We now load these packages into the working environment

library(lme4)
library(lmerTest)

lme4 stands for linear mixed-effects models (developed by Douglas Bates, Martin Mächler, Ben Bolker, and Steve Walker). It is the most common package used for running LMMs in R. The main functions are lmer for models with continuous outcome variables and glmer for generalized models with other kinds of outcome variables (e.g., Poisson regression for counts, logistic regression for binary outcomes).

Because the lme4 package does not produce p-values, it is common for researchers more accustomed to working with p-values to use the lmerTest package. lmerTest doesn’t have any functions that we will use but, when loaded, it overrides the standard output of lme4 in order to provide p-values, approximated by the Satterthwaite approach. Be aware that while these p-values are provided automagically, they are approximations and may be very poor in some scenarios (e.g., unbalanced or crossed designs).

Confidence intervals for a fitted model are provided very easily by the lme4 package using confint().
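For example (a sketch; lmm.ri is the random-intercept model fitted later in this document):

```r
# Confidence intervals for a fitted lmer model
confint(lmm.ri)                   # profile-likelihood intervals (the default)
confint(lmm.ri, method = "Wald")  # faster Wald approximation (fixed effects only)
```

The profile method also gives intervals for the variance components (labelled .sig01, .sigma, etc.), which the Wald method does not.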



Random intercept models

\[ \begin{align} \text{Level 1}: y_{ij} &= \alpha_{j} + \beta_1 x_{ij} + \epsilon_{ij} \\ \alpha_{j} &= \beta_0 + \beta_{0j} \\ \\ \text{Level 2}: \beta_{0j} &\sim N(0, \ \tau) \\ \epsilon_{ij} &\sim N(0, \ \sigma) \end{align} \]


  • \(\alpha_{j}\) indicates that each cluster \(j\) (e.g., major) is given a unique intercept: the grand intercept \(\beta_0\) plus a cluster deviation \(\beta_{0j}\) drawn from a normal distribution centered on zero (equivalently, \(\alpha_j \sim N(\beta_0, \ \tau)\)), meaning there might be different mean scores for each cluster \(j\)
  • In addition to the residual standard deviation \(\sigma\), we are now estimating one more variance component \(\tau\), the standard deviation of the varying intercepts
  • \(\text{ICC} = \tau^2 / (\tau^2 + \sigma^2)\) goes to zero if there is no systematic variation explained by clustering (so a linear model will do just as well), and goes to 1 if clusters perfectly explain the residual variance (i.e., there is no variation within each cluster \(j\))
lmm.ri <- lmer(
  formula = salary ~ 1 + grade + (1|major),
  data = sim_data
  )
  • In lmer the model is specified by the formula argument. As in most R model-fitting functions, this is the first argument
  • The model formula consists of two expressions separated by the ~ symbol
  • The expression on the left, typically the name of a variable, is evaluated as the response
  • The right-hand side consists of one or more terms separated by + symbols
  • A random-effects term consists of two expressions separated by the vertical bar, |, a symbol read as “given” or “by”
  • Typically, such terms are enclosed in parentheses
  • The expression on the right of the | is evaluated as a factor, which we call the grouping factor for that term


summary(lmm.ri)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: salary ~ 1 + grade + (1 | major)
##    Data: sim_data
## 
## REML criterion at convergence: 137.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.2577 -0.5724 -0.1475  0.6408  1.8026 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  major    (Intercept) 143.03   11.960  
##  Residual               8.33    2.886  
## Number of obs: 25, groups:  major, 5
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)  
## (Intercept)   1.4512     5.3796  3.9624   0.270   0.8008  
## grade         2.0877     0.7435 19.2071   2.808   0.0112 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##       (Intr)
## grade 0.004

The estimated effect of grade on salary is:

  • \(\hat \beta_1\) = 2.088

(much closer to the “true” effect!)


We can quantify the benefit of including the random intercept by the ICC:

  • \(\tau^2\) = 143.031
  • \(\sigma^2\) = 8.33
  • \(ICC\) = 0.945
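The variance components can be pulled from the fitted model to compute the ICC directly (assuming lmm.ri has been fitted as above):

```r
# Extract the variance components and compute the ICC
vc <- as.data.frame(VarCorr(lmm.ri))
tau2   <- vc$vcov[vc$grp == "major"]     # between-major (intercept) variance
sigma2 <- vc$vcov[vc$grp == "Residual"]  # residual variance
tau2 / (tau2 + sigma2)                   # ICC; ~0.945 for the data above
```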



Random intercept + slope model

\[ \begin{align} \text{Level 1}: y_{ij} &= \alpha_{j} + \beta_{1j} x_{ij} + \epsilon_{ij} \\ \alpha_{j} &= \beta_0 + \beta_{0[j]} \\ \beta_{1j} &= \beta_1 + \beta_{1[j]} \\ \\ \text{Level 2}: \beta_{0[j]} &\sim N(0, \ \tau) \\ \beta_{1[j]} &\sim N(0, \ \gamma) \\ \epsilon_{ij} &\sim N(0, \ \sigma) \end{align} \]


  • In addition to \(\alpha_{j}\), \(\beta_{1j}\) indicates that each cluster \(j\) (e.g., major) is given a unique slope: the grand slope \(\beta_1\) plus a cluster deviation \(\beta_{1[j]}\) drawn from a normal distribution centered on zero (equivalently, \(\beta_{1j} \sim N(\beta_1, \ \gamma)\)), meaning the change can differ for each cluster \(j\)
  • In addition to the residual standard deviation \(\sigma\) and the intercept standard deviation \(\tau\), we are now estimating one more variance component \(\gamma\), which is the standard deviation of the varying slopes
lmm.full <- lmer(salary ~ 1 + grade + (1 + grade|major), 
            data = sim_data)

summary(lmm.full)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: salary ~ 1 + grade + (1 + grade | major)
##    Data: sim_data
## 
## REML criterion at convergence: 137.7
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.3735 -0.5334 -0.1515  0.7178  1.7200 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  major    (Intercept) 140.8177 11.8667      
##           grade         0.1208  0.3475  1.00
##  Residual               8.3145  2.8835      
## Number of obs: 25, groups:  major, 5
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)  
## (Intercept)   1.6042     5.3383  3.9464   0.301   0.7789  
## grade         2.1043     0.7572 12.0758   2.779   0.0166 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##       (Intr)
## grade 0.210 
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

The estimated effect of grade on salary is:

  • \(\hat \beta_1\) = 2.104

(close to the “true” effect!)


We can quantify the (small) benefit of including the random slope by the analogous variance ratio:

  • \(\gamma^2\) = 0.121
  • \(\sigma^2\) = 8.314
  • \(\gamma^2 / (\gamma^2 + \sigma^2)\) = 0.014



Example of APA format results summary

A linear mixed-effects model was fitted to predict salary based on grade, with random intercepts and slopes for grade across major. The model included 25 observations across 5 majors. The model converged successfully (REML = 137.7), though a boundary (singular) fit was detected, suggesting potential overparameterization or perfect correlation among random effects.

The fixed effect of grade was statistically significant, indicating that higher grades were associated with higher salaries, b = 2.10, SE = 0.76, t(12.08) = 2.78, p = .017. The intercept was not significant, b = 1.60, SE = 5.34, t(3.95) = 0.30, p = .779.

Random effects indicated substantial variability in intercepts across majors (SD = 11.87), and a smaller but perfectly correlated variability in slopes for grade (SD = 0.35). The residual standard deviation was 2.88.



Linear model vs Linear multilevel model

  • The interaction linear model pools the variation in grade within each major separately (similar to the unpooled models)
  • After fitting the interaction, the average (and fixed) effect of grade does not explain sufficient variation to reach significance
  • The linear multilevel model shares, or partially pools, variation between fixed (average) and random (major) effects
  • Variation is attributed to each parameter according to how well it explains each effect (i.e., its own error relative to residual error)
  • After partial pooling, the random intercept of major soaks up sufficient variation to leave the fixed (average) effect of grade significant
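Partial pooling can be seen directly in the fitted model (assuming lmm.full from above):

```r
# Per-major estimates after shrinkage toward the fixed (average) effect
coef(lmm.full)$major   # shrunken intercept and slope for each major
fixef(lmm.full)        # the grand (fixed) intercept and slope
```

Comparing these against the unpooled per-major fits shows each major's estimates pulled toward the grand intercept and slope, with the amount of shrinkage depending on how precisely each major is estimated.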





Model diagnostics

Just like a linear model (OLS), LMMs assume the residuals are normal with constant variance, and collinearity between independent variables will reduce the power/precision of estimates.

plot(lmm.full)

  • The residuals-vs-fitted plot should show residuals that are normally distributed, with no trends and equal (homogeneous) variance


An additional assumption of LMMs is that there is no heteroscedasticity across the levels of the random effects. Again, we can check this using residual plots, simply split by the levels of the random effect of interest (e.g., major)

plot(lmm.full, 
     resid(., scaled=TRUE) ~ fitted(.) | major, 
     abline=0, pch=16, 
     xlab="Fitted values", 
     ylab="Standardised residuals")

  • Plots of the residuals against each independent variable should likewise be normal and homogeneous, with no trends
  • This can be hard to judge when the sample size in each level is low (hence rules of thumb about minimum sample sizes per level, e.g., > 5)


plot(lmm.full, as.factor(major) ~ resid(., scaled=TRUE), 
     abline=0, pch=16,
     xlab="Standardised residuals", 
     ylab="Major")

  • You can also compare the difference in residuals between random effect levels


lattice::dotplot(ranef(lmm.full, condVar=TRUE))
## $major

  • Most of the variability is accounted for by the random intercept
  • Little variability in random slope (grade)
  • Maybe some positive relationship between intercept and slope



Trouble-shooting

Eventually you will receive a warning about model convergence. This can have many causes, which may need to be investigated and tested.

  • Try fitting a simpler model, remove parameters or terms from the model, or simplify the random-effects structure
  • Centre and standardize your independent variables (e.g., z-scores). Also helps compare magnitude of effects
  • Try a different convergence algorithm control = lmerControl(optimizer = "bobyqa")
  • Try a Bayesian model brms::brm(salary ~ 1 + grade + (1 + grade | major), data = sim_data)



Missing data

A major advantage of multilevel models is how they handle missing data. Other methods can give the same results, but only when the data are balanced: the design must be completely balanced and the resulting data must also be completely balanced. Balance is fragile. Even if the design is balanced, a single missing or questionable observation destroys the balance, and observational studies (as opposed to, say, laboratory experiments) cannot be expected to yield balanced data sets.
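As a quick sketch (using sim_data and lme4 from above), deleting a few observations at random breaks the balance, but the multilevel model still fits without resorting to listwise deletion of whole cases:

```r
# Drop three observations at random and refit -- lmer() copes with the
# resulting unbalanced data, whereas RM ANOVA would require complete,
# balanced cases.
set.seed(1)
unbalanced <- sim_data[-sample(nrow(sim_data), 3), ]
lmer(salary ~ 1 + grade + (1 | major), data = unbalanced)
```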



Multilevel experimental design

Random effects can be nested (e.g., see above) or crossed.

For instance, in a clinical research design we could have repeated measures on individuals in different treatment groups. Each treatment is provided by different therapists, so subjects are nested within therapists (and therapists are nested within treatments).

lmer(y ~ time * tx + (time | therapist/subjects))

To fit a nested effect we use therapist/subjects, which specifies nesting.
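In lme4, the / shorthand expands into one random-effects term per level of nesting:

```r
# (time | therapist/subjects) is shorthand for:
#   (time | therapist) + (time | therapist:subjects)
```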


Alternatively, we could have a design with subject-level randomization to treatment; so each therapist delivers both treatment arms. Therapist now represents a crossed effect:


lmer(y ~ time * tx + (time | therapist:subjects) + (time * tx | therapist))

The crossed specification therapist:subjects is equivalent to subjects:therapist (order does not matter). Also note that “crossed” simply means “not nested”. Experimental situations with crossed random factors, such as “subject” and “stimulus”, are common.

Important note: Nesting or crossing is a property of the data, or rather the experimental design, not the model. However correct specification is necessary when the nested variable (e.g., subject) is not coded uniquely across the higher factor (e.g., therapists). You can avoid confusion about nested and crossed factors by following one simple rule: ensure that different levels of a factor in the experiment correspond to different labels of the factor in the data.
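For example, in sim_data the ids 1–5 repeat within every major, so ids alone does not uniquely label a student. A sketch of making the labels unique (uid is a new, hypothetical column):

```r
# Combine major and ids into globally unique student labels, so the
# grouping structure is unambiguous whichever formula syntax is used
sim_data$uid <- interaction(sim_data$major, sim_data$ids, drop = TRUE)
head(levels(sim_data$uid))
```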


Dr Kristoffer Magnusson has a helpful page on different multilevel designs and how to estimate them using lme4:



Summary

Multilevel models are a method for:

  • estimating variation among groups
  • a compromise between complete pooling (no variance between groups) and no pooling (fixed effects, with all variance between groups)
  • handling groups selected at random from a larger population
  • sharing information among groups (shrinkage estimation)
  • allowing predictions for unobserved instances (e.g., missing data)


Advantages are:

  • accounting for non-independence because your observations are grouped
  • improving predictions from linear models by taking advantage of shrinkage to get better estimates

But:

  • estimates are biased (shrunk towards the fixed effect or zero), which is good for prediction but bad for estimating the true causal effect
  • get complicated quickly


Random effect formula:

  • simplest: (1 | g), a single intercept per level of g
  • nested: (1 | g1/g2)
  • random slopes: (x | g), correlated random intercept and slope
  • independent terms: (1 | g) + (0 + x | g), or equivalently (x || g), uncorrelated intercept and slope



Other Resources