Multivariable regressions and the impact of mediators and confounders

fbbriggs · July 24, 2025, 9:24pm

Interpreting Regression Models: Why It Matters Whether You Include a Mediator

In Parkinson’s research, and really in all biomedical science, we often use multivariable regression to estimate the association between an exposure (e.g., physical activity, sleep, social isolation) and an outcome (e.g., cognitive decline, quality of life, motor symptoms).

But how we interpret those models depends critically on whether covariates are confounders or mediators.

Confounder: A variable that influences both the exposure and the outcome but is not part of the causal pathway.

Failing to adjust for a confounder leaves an open “backdoor path” between the exposure and outcome, producing a biased or even entirely spurious association.

Example 1: Matches and lung cancer.
Carrying matches (exposure) is associated with lung cancer (outcome)—but only because of their shared relationship with smoking. Smoking is the true confounder. Once you adjust for smoking, the matches-lung cancer association disappears.
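A small simulation makes the matches example concrete. It is a sketch under assumed numbers: the probabilities are hypothetical, not estimates from real data. It also uses a linear probability model fit with NumPy least squares instead of the logistic regression you would typically use for a binary outcome, only to keep the example short.

```python
import numpy as np

def ols(y, *covariates):
    """Least-squares fit with an intercept; returns the slope coefficients only."""
    X = np.column_stack([np.ones(len(y)), *covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

rng = np.random.default_rng(42)
n = 100_000

# Hypothetical data-generating process: smoking raises the chance of carrying
# matches AND the risk of lung cancer; matches have no causal effect on cancer.
smoking = rng.binomial(1, 0.3, n)
matches = rng.binomial(1, 0.1 + 0.6 * smoking)   # 70% of smokers vs 10% of non-smokers
cancer = rng.binomial(1, 0.01 + 0.09 * smoking)  # 10% vs 1% risk, driven by smoking only

crude = ols(cancer, matches)[0]              # matches alone
adjusted = ols(cancer, matches, smoking)[0]  # matches, adjusted for smoking
print(f"matches coefficient, crude:    {crude:.3f}")
print(f"matches coefficient, adjusted: {adjusted:.3f}")
```

Because smoking is the only cause of cancer in this simulated world, the crude matches coefficient (about 0.056 in expectation) is pure confounding, and adjusting for smoking drives it to roughly zero.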

Example 2: Physical activity → cognitive decline.
Age is a classic confounder: older adults tend to be less active and experience more cognitive decline. Age affects both variables but isn’t on the pathway from activity to cognition. If you observe an association between activity and cognition without adjusting for age, you can’t tell whether it reflects a real effect of activity or simply the influence of age. Once you adjust for age, you’re estimating the effect of physical activity independent of age (other confounders may still remain).

So what happens when you include a confounder in your model?

Without the confounder: you’re estimating a biased effect
With the confounder: you’re estimating an effect that is no longer biased by that confounder (other, unmeasured confounders can still bias it)
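The same idea with a continuous confounder and a real (simulated) effect: a minimal sketch in which age, activity, and decline follow hypothetical linear relationships, with a true activity coefficient of -0.3 on decline.

```python
import numpy as np

def ols(y, *covariates):
    """Least-squares fit with an intercept; returns the slope coefficients only."""
    X = np.column_stack([np.ones(len(y)), *covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical linear relationships: older age lowers activity and raises decline;
# activity itself reduces decline with a true coefficient of -0.3.
age = rng.normal(70, 8, n)
activity = 10 - 0.1 * age + rng.normal(0, 1, n)
decline = 0.05 * age - 0.3 * activity + rng.normal(0, 1, n)

without_age = ols(decline, activity)[0]    # confounded estimate
with_age = ols(decline, activity, age)[0]  # adjusted estimate
print(f"activity coefficient without age: {without_age:.3f}")
print(f"activity coefficient with age:    {with_age:.3f}")
```

Omitting age pushes the activity coefficient to about -0.5 in this setup, overstating the protective effect; adding age recovers roughly -0.3.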

Mediator: A variable that lies on the causal pathway between the exposure and outcome.

Example: Physical activity may improve sleep quality, which in turn benefits cognitive health. Here, sleep is a mediator of the activity–cognition relationship. Physical activity may also affect cognition through other mechanisms (e.g., cardiovascular health, neuroplasticity, epigenetics).

So what happens when you include a mediator?

Without the mediator: you’re estimating the total effect of the exposure
With the mediator: you’re estimating the direct effect—the effect of the exposure excluding the path through the mediator

In our example, including both physical activity and sleep in the model will “block” part of the true impact of activity on cognition. The coefficient for activity will then reflect only the part of its influence that does not run through sleep, potentially underestimating its total effect.
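A simulated version of the sleep example shows the total and direct effects side by side. The path coefficients below are hypothetical, and the direct path (0.2) stands in for all the other mechanisms mentioned above.

```python
import numpy as np

def ols(y, *covariates):
    """Least-squares fit with an intercept; returns the slope coefficients only."""
    X = np.column_stack([np.ones(len(y)), *covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

rng = np.random.default_rng(2)
n = 100_000

# Hypothetical paths: activity -> sleep (0.5), sleep -> cognition (0.4),
# plus a direct activity -> cognition path (0.2).
activity = rng.normal(0, 1, n)
sleep = 0.5 * activity + rng.normal(0, 1, n)
cognition = 0.2 * activity + 0.4 * sleep + rng.normal(0, 1, n)

total = ols(cognition, activity)[0]          # expected about 0.2 + 0.5 * 0.4 = 0.4
direct = ols(cognition, activity, sleep)[0]  # expected about 0.2
print(f"without sleep (total effect): {total:.3f}")
print(f"with sleep (direct effect):   {direct:.3f}")
```

The model without sleep recovers the total effect (0.2 direct plus 0.5 × 0.4 through sleep); adding sleep leaves only the direct 0.2.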

Why this matters

If your goal is to understand the overall benefit of an exposure (like exercise), avoid adjusting for mediators.

If your goal is to understand how the effect happens, mediation analysis can help estimate the direct and indirect components.
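One common way to split the effect is the product-of-coefficients approach, sketched here with a percentile bootstrap for the indirect effect. This is a simplified illustration that assumes linear models, no exposure-mediator interaction, and no unmeasured confounding of the mediator-outcome relationship; formal counterfactual mediation methods relax some of these assumptions. The data and effect sizes are hypothetical.

```python
import numpy as np

def ols(y, *covariates):
    """Least-squares fit with an intercept; returns the slope coefficients only."""
    X = np.column_stack([np.ones(len(y)), *covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

def mediation_estimates(x, m, y):
    """Return (direct, indirect) using the product-of-coefficients method."""
    a = ols(m, x)[0]          # exposure -> mediator
    direct, b = ols(y, x, m)  # direct effect of x, and mediator -> outcome
    return direct, a * b

def bootstrap_indirect(x, m, y, n_boot=500, seed=0):
    """Percentile bootstrap 95% interval for the indirect effect a * b."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)  # resample rows with replacement
        draws[i] = mediation_estimates(x[idx], m[idx], y[idx])[1]
    return np.percentile(draws, [2.5, 97.5])

rng = np.random.default_rng(3)
n = 2_000
activity = rng.normal(0, 1, n)
sleep = 0.5 * activity + rng.normal(0, 1, n)
cognition = 0.2 * activity + 0.4 * sleep + rng.normal(0, 1, n)

total = ols(cognition, activity)[0]
direct, indirect = mediation_estimates(activity, sleep, cognition)
lo, hi = bootstrap_indirect(activity, sleep, cognition)
print(f"total {total:.3f} = direct {direct:.3f} + indirect {indirect:.3f}")
print(f"indirect 95% CI: ({lo:.3f}, {hi:.3f})")
```

With ordinary least squares on the same sample, the total-effect coefficient equals direct + a·b exactly, so the "difference" and "product" methods agree here; they can diverge in non-linear models such as logistic regression.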

But wait—how do you know whether something is a confounder or mediator?

Here’s the catch: statistically, they can look the same in a model.
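To see why, here is a sketch with two hypothetical data-generating processes: in one, Z is a confounder (Z causes both X and Y); in the other, Z is a mediator (X causes Z, which causes Y). The coefficients are chosen so both produce the same joint distribution of X, Z, and Y, so no regression on these data alone can tell them apart.

```python
import numpy as np

def ols(y, *covariates):
    """Least-squares fit with an intercept; returns the slope coefficients only."""
    X = np.column_stack([np.ones(len(y)), *covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

rng = np.random.default_rng(4)
n = 100_000

def simulate(role):
    """Generate (x, z, y) where z is either a confounder or a mediator."""
    if role == "confounder":
        z = rng.normal(0, 1, n)
        x = 0.5 * z + rng.normal(0, np.sqrt(0.75), n)  # z -> x
    else:
        x = rng.normal(0, 1, n)
        z = 0.5 * x + rng.normal(0, np.sqrt(0.75), n)  # x -> z
    y = 0.2 * x + 0.4 * z + rng.normal(0, 1, n)        # same outcome equation in both
    return x, z, y

results = {}
for role in ("confounder", "mediator"):
    x, z, y = simulate(role)
    results[role] = (ols(y, x)[0], ols(y, x, z)[0])
    print(f"{role:>10}: without Z = {results[role][0]:.3f}, with Z = {results[role][1]:.3f}")
```

Both rows print roughly 0.4 without Z and 0.2 with Z. Only the assumed causal structure decides whether 0.4 is a biased estimate (confounder) or the total effect (mediator).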

This is why black-box or “kitchen-sink” approaches (throwing in all possible covariates) can backfire. While this might help reduce confounding, it can also remove meaningful mediated effects—and potentially distort your findings.

Instead: think causally.

Build a conceptual model or draw a directed acyclic graph (DAG) to map the relationships among your exposure, outcome, and covariates. Knowing the role of each variable allows you to build better models and tell a clearer scientific story.
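A DAG can also be written down in code, which makes each variable's role explicit. The sketch below encodes the running example as a hypothetical adjacency dictionary and applies a simplified rule: a variable downstream of the exposure and upstream of the outcome is a mediator, and a common ancestor of exposure and outcome is a confounder. This is not the full backdoor criterion (tools such as DAGitty implement that); it only illustrates the logic.

```python
from typing import Dict, Set

# Hypothetical DAG for the running example; edges point from cause to effect.
dag: Dict[str, Set[str]] = {
    "age": {"activity", "cognition"},
    "activity": {"sleep", "cognition"},
    "sleep": {"cognition"},
    "cognition": set(),
}

def descendants(graph: Dict[str, Set[str]], node: str) -> Set[str]:
    """All nodes reachable from `node` by following directed edges."""
    seen: Set[str] = set()
    stack = list(graph.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, ()))
    return seen

def classify(graph: Dict[str, Set[str]], exposure: str, outcome: str) -> Dict[str, str]:
    """Label every other variable with a simplified structural role."""
    exposure_desc = descendants(graph, exposure)
    roles = {}
    for var in graph:
        if var in (exposure, outcome):
            continue
        var_desc = descendants(graph, var)
        if var in exposure_desc and outcome in var_desc:
            roles[var] = "mediator"    # on a path exposure -> var -> outcome
        elif exposure in var_desc and outcome in var_desc:
            roles[var] = "confounder"  # common cause of exposure and outcome
        else:
            roles[var] = "other"
    return roles

print(classify(dag, "activity", "cognition"))
```

For this graph the call labels age as a confounder and sleep as a mediator, which tells you to adjust for age and leave sleep out when the goal is the total effect.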

Would love to hear how others in Parkinson’s research think about this—especially for topics like motor vs cognitive symptoms, exercise, inflammation, or neuropsychiatric symptoms. Let’s discuss!

AmgadDroby · July 27, 2025, 4:31am

@fbbriggs : thanks for the clear explanation. This can be very helpful in PD, AD, and even normal aging studies investigating the relationship between brain changes and motor performance, where cognition/cognitive reserve plays a key role in mediating the associations between these variables.

vcatterson · August 6, 2025, 6:16pm

Really great article, thanks for so clearly laying out the choices and implications!

When the causal relationship is unclear, and you can’t easily draw a DAG, what other techniques might you use?

For example, I have tried leave-one-out modelling and comparing effect sizes. Let’s say you don’t know that sleep is a mediator between exercise and cognitive performance: you can build three models (exercise alone, sleep alone, and exercise and sleep together, each along with your other confounders) and look at the relative effect sizes in each model. Since the effect will be split between exercise and sleep in the combined model, you can conclude there is a mediator relationship (although directionality is unknown). However, this has the disadvantage of requiring a combinatorial number of models if you truly cannot make any inference about causal relationships between your attributes.
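The three-model comparison generalizes to fitting every non-empty subset of candidate predictors, which is where the combinatorial cost comes from (2^k - 1 models for k candidates). A sketch under stated assumptions: the variable names and effect sizes are simulated and hypothetical, and age is treated as a known confounder included in every model.

```python
import itertools
import numpy as np

def ols(y, *covariates):
    """Least-squares fit with an intercept; returns the slope coefficients only."""
    X = np.column_stack([np.ones(len(y)), *covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

def fit_all_subsets(y, predictors, adjust=()):
    """Fit y on every non-empty subset of `predictors` plus fixed `adjust` covariates.

    Returns {subset_names: {name: coefficient}} for the predictors in each model.
    """
    names = list(predictors)
    results = {}
    for k in range(1, len(names) + 1):
        for subset in itertools.combinations(names, k):
            coefs = ols(y, *(predictors[p] for p in subset), *adjust)
            results[subset] = dict(zip(subset, coefs[:k]))
    return results

rng = np.random.default_rng(5)
n = 50_000
age = rng.normal(0, 1, n)
exercise = -0.4 * age + rng.normal(0, 1, n)
sleep = 0.5 * exercise + rng.normal(0, 1, n)
cognition = -0.3 * age + 0.2 * exercise + 0.4 * sleep + rng.normal(0, 1, n)

fits = fit_all_subsets(cognition, {"exercise": exercise, "sleep": sleep}, adjust=(age,))
for subset, coefs in fits.items():
    print(subset, {name: round(float(c), 3) for name, c in coefs.items()})
```

Exercise alone gives about 0.4 (its total effect); with sleep added it drops to about 0.2, while sleep's coefficient falls from about 0.48 to 0.4. The same shrinkage would also appear if sleep were a confounder, so the pattern flags a relationship without revealing its direction.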

I’ve also calculated correlations between pairs of attributes before modelling, and more closely examined those that appear to have some relationship. In this case, you would expect a higher correlation between sleep and exercise than between pairs of attributes with no such relationship. This gives you a prompt to consider hypotheses for causal links between those attributes.
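A pairwise correlation screen like this takes only a few lines. The 0.3 threshold and the simulated variables (including an "unrelated" attribute with no links to the others) are hypothetical choices; a flagged pair indicates association only, not direction or mediation.

```python
import itertools
import numpy as np

def flag_correlated_pairs(data, threshold=0.3):
    """Return (name_a, name_b, r) for each pair with |Pearson r| >= threshold, strongest first."""
    names = list(data)
    corr = np.corrcoef(np.vstack([data[name] for name in names]))  # rows are variables
    flagged = []
    for i, j in itertools.combinations(range(len(names)), 2):
        r = float(corr[i, j])
        if abs(r) >= threshold:
            flagged.append((names[i], names[j], round(r, 2)))
    return sorted(flagged, key=lambda pair: -abs(pair[2]))

rng = np.random.default_rng(6)
n = 10_000
exercise = rng.normal(0, 1, n)
sleep = 0.5 * exercise + rng.normal(0, 1, n)
cognition = 0.2 * exercise + 0.4 * sleep + rng.normal(0, 1, n)
unrelated = rng.normal(0, 1, n)

flagged = flag_correlated_pairs(
    {"exercise": exercise, "sleep": sleep, "cognition": cognition, "unrelated": unrelated}
)
for a, b, r in flagged:
    print(f"{a} ~ {b}: r = {r}")
```

The exercise–cognition pair is flagged too, even though part of that correlation runs through sleep; correlations alone cannot separate direct from mediated paths.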

What do you think? Are there other ways to derive mediators more reliably?

fbbriggs · August 26, 2025, 9:10pm

Great question: what to do when the directionality of the relationship is unknown. One note is that a mediator and a confounder will behave the same way in a model (both change the effect size of the independent variable).

LOO is a great strategy, but again it doesn’t help with directionality, nor does it rule out confounding or collider bias. Still, it’s a great tool for exploration.
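Collider bias is the mirror image of confounding: adjusting for a variable that is caused by both the exposure and the outcome opens a path that was previously closed. A minimal sketch with hypothetical variables, in which x and y are independent by construction:

```python
import numpy as np

def ols(y, *covariates):
    """Least-squares fit with an intercept; returns the slope coefficients only."""
    X = np.column_stack([np.ones(len(y)), *covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

rng = np.random.default_rng(7)
n = 100_000

# Hypothetical: x and y are generated independently (no causal link),
# and the collider is caused by both of them.
x = rng.normal(0, 1, n)
y = rng.normal(0, 1, n)
collider = x + y + rng.normal(0, 1, n)

crude = ols(y, x)[0]               # no association
adjusted = ols(y, x, collider)[0]  # association created by the adjustment
print(f"crude: {crude:.3f}, adjusted for collider: {adjusted:.3f}")
```

The crude coefficient is near zero, but adjusting for the collider produces a spurious coefficient of about -0.5 in this setup.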

If temporal ordering is not clear, instrumental variables might be useful, as they can help isolate causal effects (the challenge is identifying a suitable IV). This is the premise of Mendelian Randomization (@ecebayram FYI).
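The instrumental-variable idea can be sketched with the Wald ratio estimator, which with a single instrument gives the same point estimate as two-stage least squares. Everything below is simulated under assumed values: an unmeasured confounder u biases the ordinary regression, while a hypothetical allele count g (the Mendelian Randomization setting) affects the outcome only through the exposure. The estimate is only valid if g is associated with the exposure, independent of u, and has no other path to the outcome.

```python
import numpy as np

def ols(y, *covariates):
    """Least-squares fit with an intercept; returns the slope coefficients only."""
    X = np.column_stack([np.ones(len(y)), *covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

rng = np.random.default_rng(8)
n = 100_000

u = rng.normal(0, 1, n)                      # unmeasured confounder
g = rng.binomial(2, 0.3, n).astype(float)    # hypothetical instrument: allele count 0/1/2
x = 0.5 * g + 0.8 * u + rng.normal(0, 1, n)  # exposure
y = 0.3 * x + 0.8 * u + rng.normal(0, 1, n)  # outcome; true causal effect of x is 0.3

naive = ols(y, x)[0]  # biased upward because u drives both x and y
# Wald ratio: instrument-outcome covariance divided by instrument-exposure covariance.
iv = np.cov(g, y)[0, 1] / np.cov(g, x)[0, 1]
print(f"naive OLS: {naive:.3f}, IV (Wald ratio): {iv:.3f}")
```

The naive regression lands near 0.67 because u inflates it, while the ratio recovers roughly the true 0.3. With weak instruments the ratio becomes unstable, which is why finding a suitable IV is the hard part.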

Lastly, there are formal mediation analyses, both cross-sectional and over time. I can’t recall it offhand, but I’m nearly certain there is a sensitivity test for this (in MR studies there are directionality tests). This just highlights the importance of stating assumptions and hypotheses.

vcatterson · August 27, 2025, 8:44pm

Great point that mediators and confounders will look the same under these various tests! These are helpful suggestions for further analyses to perform. Thanks for your insights!
