ML in Imaging: Overfitting, Underfitting, and Best Practices

While reviewing several imaging studies that apply machine learning over the past couple of weeks, I keep seeing the same pattern: proposed models are often trained without independent validation cohorts, datasets are small, and performance claims can be inflated by overfitting or, conversely, blunted by underfitting that obscures meaningful biological signals. These challenges aren’t just technical; they shape the conclusions we draw and how reproducible our findings are.

This got me wondering: what principles should guide us when applying ML to imaging data? Below is a set of statements reflecting common issues and approaches. Where do you stand?

Which of these statements best reflects your perspective?

  • Model performance is driven more by label quality than model architecture.
  • Cross-validation alone is not enough to assess generalizability in imaging studies.
  • Exploratory data analysis often reveals more than model training does.
  • High variance (overfitting) is more common than reported in small imaging datasets.
  • Underfitting is just as damaging in imaging ML as overfitting; it hides real biology.
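On the cross-validation statement: one concrete way plain cross-validation misleads is when any step that sees the labels, such as feature selection, runs before the folds are split. A minimal sketch with scikit-learn and pure-noise synthetic data (all sizes and names here are illustrative, not from any real study):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# Pure-noise data: 50 "subjects", 2000 "imaging features", random labels.
X = rng.standard_normal((50, 2000))
y = rng.integers(0, 2, 50)

# LEAKY: selecting features on the full dataset before cross-validation
# lets label information from the test folds influence the features.
X_sel = SelectKBest(f_classif, k=10).fit_transform(X, y)
leaky = cross_val_score(LogisticRegression(), X_sel, y, cv=5).mean()

# HONEST: the pipeline re-runs feature selection inside each training fold.
pipe = make_pipeline(SelectKBest(f_classif, k=10), LogisticRegression())
honest = cross_val_score(pipe, X, y, cv=5).mean()

print(f"leaky CV accuracy:  {leaky:.2f}")   # typically well above chance
print(f"honest CV accuracy: {honest:.2f}")  # typically near 0.5, as it should be
```

On data that is pure noise, the leaky version usually reports impressive accuracy while the honest version correctly hovers around chance, which is exactly the failure mode small imaging datasets are prone to.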

Feel free to share examples from your own work, or from papers you’ve read, where overfitting, underfitting, or limited validation affected the conclusions, and what lessons you took away. And if there are any ML experts among us, we’d be glad if you could share some basic know-how and best practices that can help us navigate these issues.

1 Like

Thanks for this post, @AmgadDroby! It’s critical we understand the limitations of these models so as to avoid, as you say, the obfuscation of meaningful biological signals.

Wondering if other community members with interest / experience in ML have any tips for @AmgadDroby on best practices to avoid these model-fitting issues?

@nisha @Stapapou @rashmi @prabeshk @Roberta_Repossi @VidyadharaDJ @braah @ece.kocagoncu @jbmchls @namburin @mchaparro @roussos

Hi @gginnan,

I haven’t conducted any ML projects using imaging data; however, I have been developing models based on clinical and genetic information. I should mention that I’m not an ML expert; I’m a medical geneticist currently learning and applying ML methods during my PhD.

That said, one of the biggest challenges I’ve encountered is obtaining an external validation cohort that truly resembles the dataset used to train the model. This requires having exactly the same variables available, properly harmonized and scaled when needed, among other considerations.
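To illustrate the harmonization and scaling point: whatever transformation is used, it should be fitted on the training cohort and only applied to the external cohort, never re-fitted. A small sketch, assuming scikit-learn and synthetic data (cohort sizes and variable names are purely illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X_train = rng.normal(loc=0.0, scale=1.0, size=(100, 5))
X_external = rng.normal(loc=0.5, scale=1.2, size=(40, 5))  # shifted cohort

scaler = StandardScaler().fit(X_train)            # fit on training data only
X_external_scaled = scaler.transform(X_external)  # reuse training statistics

# Re-fitting the scaler on the external cohort would silently erase the
# distribution shift that external validation is supposed to expose.
```

Here the external cohort keeps its shifted mean after scaling, which is the honest picture: the model is being tested on data that genuinely differs from what it was trained on.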

In practice, external validation datasets are often similar but not identical to the training data. This, in turn, highlights an important limitation for real-world applicability of even the best-performing models, since the ultimate goal of these projects is successful deployment.

At the same time, as mentioned by @AmgadDroby, cross-validation remains a key strategy when external validation is not feasible. Hopefully, ongoing data-collection consortia such as GP2 and PPMI will allow us to access larger sample sizes and more harmonized datasets, helping to overcome some of these challenges in the future and, ideally, enabling external validation of many of the ML models already published.
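As a small addendum on making cross-validation less optimistic when hyperparameters are tuned: nested cross-validation keeps the tuning itself inside the outer folds, so the reported score reflects the whole procedure rather than a cherry-picked configuration. A minimal sketch with scikit-learn on synthetic data (the grid, model, and sizes are just placeholders):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=120, n_features=20, random_state=0)

# Inner loop: tune C on each outer training fold.
inner = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=3)

# Outer loop: estimate how well the tuned procedure generalizes.
scores = cross_val_score(inner, X, y, cv=5)
print(f"nested CV accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```

It is slower than a single loop, but when no external cohort is available it is one of the more defensible ways to report performance.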

2 Likes