Interpreting Regression Models: Why It Matters Whether You Include a Mediator
In Parkinson’s research, and really in all biomedical science, we often use multivariable regression to estimate the association between an exposure (e.g., physical activity, sleep, social isolation) and an outcome (e.g., cognitive decline, quality of life, motor symptoms).
But how we interpret those models depends critically on whether covariates are confounders or mediators.
Confounder: A variable that influences both the exposure and the outcome but is not part of the causal pathway.
Failing to adjust for a confounder leaves an open “backdoor path” between the exposure and outcome, which can create a spurious association or bias a real one in either direction.
Example 1: Matches and lung cancer.
Carrying matches (exposure) is associated with lung cancer (outcome), but only because smoking drives both: smokers are more likely to carry matches and more likely to develop lung cancer. Smoking is the confounder. Once you adjust for smoking, the association between matches and lung cancer disappears.
Example 2: Physical activity → cognitive decline.
Age is a classic confounder: older adults tend to be less active and experience more cognitive decline. Age affects both variables but isn’t on the pathway from activity to cognition. If you observe an association between activity and cognition without adjusting for age, you can’t distinguish whether it reflects a real effect or just age’s influence. Once you adjust for age, your estimate for physical activity is no longer distorted by age, although other unmeasured confounders can still bias it.
So what happens when you include a confounder in your model?
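- Without the confounder: you’re estimating an effect that is biased by that confounder
- With the confounder: you’re estimating an effect with that source of bias removed (it is fully unbiased only if there is no other confounding and the model is correctly specified)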
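To make the two bullets above concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the effect sizes, the standardized variables, and the helper `ols_coefs` are hypothetical and not taken from any real Parkinson's dataset. It generates data where age lowers activity and raises cognitive decline, then compares the activity coefficient with and without age in the model.

```python
import numpy as np

def ols_coefs(y, *predictors):
    """Return OLS coefficients [intercept, b1, b2, ...] (hypothetical helper)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

rng = np.random.default_rng(0)
n = 50_000

# Made-up data-generating process, chosen only for illustration.
age = rng.normal(0, 1, n)                      # standardized age
activity = -0.5 * age + rng.normal(0, 1, n)    # older people are less active
true_effect = -0.3                             # assumed effect of activity on decline
decline = true_effect * activity + 0.6 * age + rng.normal(0, 1, n)

crude = ols_coefs(decline, activity)[1]          # age left out of the model
adjusted = ols_coefs(decline, activity, age)[1]  # age included

print(f"crude coefficient:    {crude:.2f}")
print(f"adjusted coefficient: {adjusted:.2f}")
print(f"assumed true effect:  {true_effect:.2f}")
```

Under these assumed parameters the crude coefficient overstates the protective association, because part of what it picks up is simply that older people are both less active and declining faster. The adjusted coefficient should land close to the assumed true effect.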
Mediator: A variable that lies on the causal pathway between the exposure and outcome.
Example: Physical activity may improve sleep quality, which in turn benefits cognitive health. Here, sleep is a mediator of the activity–cognition relationship. Physical activity may also affect cognition through other mechanisms (e.g., cardiovascular health, neuroplasticity, epigenetics).
So what happens when you include a mediator?
- Without the mediator: you’re estimating the total effect of the exposure
- With the mediator: you’re estimating the direct effect, meaning the effect of the exposure that does not run through that mediator (this interpretation assumes the mediator–outcome relationship is itself unconfounded)
In our example, including both physical activity and sleep in the model will “block” the part of activity’s impact on cognition that runs through sleep. The coefficient for activity then reflects only the pathways that bypass sleep, so reading it as the full effect of activity would underestimate it.
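A small simulation can show the “blocking” described above. This is a sketch under stated assumptions: the path strengths (0.5 from activity to sleep, 0.4 from sleep to cognition, 0.2 directly from activity to cognition) are invented, there is no confounding in the simulated data, and `ols_coefs` is a hypothetical helper.

```python
import numpy as np

def ols_coefs(y, *predictors):
    """Return OLS coefficients [intercept, b1, b2, ...] (hypothetical helper)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

rng = np.random.default_rng(1)
n = 50_000

# Made-up causal structure: activity -> sleep -> cognition, plus activity -> cognition.
activity = rng.normal(0, 1, n)
sleep = 0.5 * activity + rng.normal(0, 1, n)
cognition = 0.2 * activity + 0.4 * sleep + rng.normal(0, 1, n)

total = ols_coefs(cognition, activity)[1]           # mediator left out
direct = ols_coefs(cognition, activity, sleep)[1]   # mediator included

print(f"activity coefficient without sleep (total effect): {total:.2f}")
print(f"activity coefficient with sleep (direct effect):   {direct:.2f}")
```

With these assumed values the total effect is 0.2 + 0.5 × 0.4 = 0.4, while the model that includes sleep recovers only the 0.2 direct path. Neither number is wrong; they answer different questions.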
Why this matters
If your goal is to understand the overall benefit of an exposure (like exercise), avoid adjusting for mediators.
If your goal is to understand how the effect happens, mediation analysis can help estimate the direct and indirect components.
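One simple way to split an effect into direct and indirect parts is the product-of-coefficients approach for linear models, sketched below. This is only one of several mediation methods, and it relies on strong assumptions (linear relationships, no exposure–mediator interaction, no unmeasured confounding). The simulated data, the path strengths, and the helper `ols_coefs` are all hypothetical.

```python
import numpy as np

def ols_coefs(y, *predictors):
    """Return OLS coefficients [intercept, b1, b2, ...] (hypothetical helper)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

rng = np.random.default_rng(2)
n = 50_000

# Made-up data: sleep partly mediates the effect of activity on cognition.
activity = rng.normal(0, 1, n)
sleep = 0.5 * activity + rng.normal(0, 1, n)
cognition = 0.2 * activity + 0.4 * sleep + rng.normal(0, 1, n)

a = ols_coefs(sleep, activity)[1]                         # exposure -> mediator
_, direct, b = ols_coefs(cognition, activity, sleep)      # direct path and mediator -> outcome
total = ols_coefs(cognition, activity)[1]                 # exposure -> outcome, mediator omitted

indirect = a * b   # effect transmitted through sleep
print(f"direct:   {direct:.2f}")
print(f"indirect: {indirect:.2f}")
print(f"total:    {total:.2f}  (direct + indirect = {direct + indirect:.2f})")
```

For ordinary least squares fitted on the same sample, total = direct + indirect holds exactly, which makes this a handy sanity check. In real analyses you would also want confidence intervals for the indirect effect, for example via bootstrapping.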
But wait—how do you know whether something is a confounder or mediator?
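Here’s the catch: statistically, they can look the same in a model. Adding either a confounder or a mediator shifts the exposure coefficient, and nothing in the regression output tells you which one you added.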
This is why black-box or “kitchen-sink” approaches (throwing in all possible covariates) can backfire. While this might help reduce confounding, it can also adjust away meaningful mediated effects and distort your findings.
Instead: think causally.
Build a conceptual model or draw a directed acyclic graph (DAG) to map the relationships among your exposure, outcome, and covariates. Knowing the role of each variable allows you to build better models and tell a clearer scientific story.
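As a rough illustration of “knowing the role of each variable,” the sketch below encodes a DAG for the running example as a list of cause-to-effect edges and labels a covariate as a confounder or mediator. The edge list reflects the assumptions made in this post, not established fact, and the `classify` rule is a deliberate simplification: it is not a full implementation of the backdoor criterion and ignores colliders and more complex structures.

```python
from collections import defaultdict

# Hypothetical DAG for the running example; each edge points from cause to effect.
EDGES = [
    ("age", "activity"),
    ("age", "cognition"),
    ("activity", "sleep"),
    ("sleep", "cognition"),
    ("activity", "cognition"),
]

def descendants(edges, node):
    """Return every variable reachable from `node` by following edges forward."""
    children = defaultdict(set)
    for cause, effect in edges:
        children[cause].add(effect)
    seen, stack = set(), [node]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def classify(edges, covariate, exposure, outcome):
    """Simplified role labelling for a single covariate."""
    # Mediator: caused by the exposure and itself a cause of the outcome.
    if covariate in descendants(edges, exposure) and outcome in descendants(edges, covariate):
        return "mediator"
    # Confounder: causes the exposure, and causes the outcome by a route
    # that does not pass through the exposure.
    edges_without_exposure = [(c, e) for c, e in edges if exposure not in (c, e)]
    if (exposure in descendants(edges, covariate)
            and outcome in descendants(edges_without_exposure, covariate)):
        return "confounder"
    return "other"

for covariate in ("age", "sleep"):
    print(covariate, "->", classify(EDGES, covariate, "activity", "cognition"))
```

The value of an exercise like this is less the code than the act of writing down the edges: once your assumptions are explicit, collaborators can challenge them before the model is fit.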
Would love to hear how others in Parkinson’s research think about this—especially for topics like motor vs cognitive symptoms, exercise, inflammation, or neuropsychiatric symptoms. Let’s discuss!