Introduction
Have you ever heard of the UK Biobank? Although I haven’t had the opportunity to access it yet - since it requires a paid subscription or access through an institution, which I have not yet requested - I must confess my profound interest in this database, particularly for its extensive range of information. In this post, my aim is to highlight the existence of this remarkable data resource and introduce some key concepts about it, all based on information from the UK Biobank’s own website. As I haven’t analyzed any data from this database myself, I highly encourage those who have to enrich this post with valuable information, if possible!
1 - What are biobanks, and why is the UK Biobank significant?
Biobanks are repositories that store and manage biological information, such as genetic and health data, from large populations. The list of existing biobanks is vast and continually growing (see the Wikipedia article on the subject), but the UK Biobank stands out as one of the largest and most comprehensive, providing an invaluable platform for research in genetics, epidemiology, and other medical fields.
A quick PubMed search yielded, to date, 94 published articles mentioning the UK Biobank, with several from very high-impact journals (such as Brain, Alzheimer’s Research and Therapy, Annals of Neurology, and BMC Neurology). It’s unquestionable that the data in the UK Biobank are of significant importance.
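For anyone who wants to repeat a count like this programmatically, here is a minimal sketch using NCBI's E-utilities ESearch endpoint. The helper names (`build_count_url`, `parse_count`, `pubmed_count`) are my own, and the query string in the usage note is only an assumption, since I did not record the exact search terms I used.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities ESearch endpoint
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_count_url(term: str) -> str:
    """Build an ESearch URL that asks PubMed only for the number of hits."""
    params = {
        "db": "pubmed",      # search the PubMed database
        "term": term,        # the query, same syntax as the website search box
        "retmode": "json",   # ask for a JSON response
        "rettype": "count",  # return only the hit count, not the ID list
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_count(payload: str) -> int:
    """Extract the hit count from an ESearch JSON response body."""
    return int(json.loads(payload)["esearchresult"]["count"])


def pubmed_count(term: str) -> int:
    """Query PubMed (needs network access) and return the number of hits."""
    with urlopen(build_count_url(term), timeout=30) as response:
        return parse_count(response.read().decode("utf-8"))
```

Calling `pubmed_count('"UK Biobank"[Title/Abstract]')` would return the current number of matching records. The result depends heavily on the query and on the date of the search, so it will not necessarily match the figure quoted above.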
2 - What is the structure and what types of data are present in the UK Biobank?
The UK Biobank is a longitudinal cohort with half a million participants and more than 10 years of follow-up for a significant number of those individuals. It possesses a complex structure that includes comprehensive medical records and diagnosis data, environmental and exposure data, detailed genetic information (including whole genome sequencing), medical images, as well as both common and rare biomarkers.
Regarding the specific data the UK Biobank possesses, there is an online tool called the UK Biobank Data Showcase where researchers can check whether their data of interest are present in the biobank. For example, a search for “Parkinson’s disease” revealed a variable called “Source of Parkinson’s disease report”, with the following values:
As I understand it, although this variable doesn’t directly record PD status, the number of reports should roughly correspond to the number of PD patients in the dataset.
As another example, a search for air pollution found the following data for particulate matter 2.5 (PM2.5, one of the air pollutants most hazardous to human health):
3 - How can one gain access to the UK Biobank data?
Access to the UK Biobank data is typically granted to researchers and institutions through a formal application process. This involves submitting a research proposal and obtaining ethical approval, ensuring responsible and ethical use of the data. Unlike other common data access applications, the UK Biobank application also includes a background check and is open only to researchers who have previously published high-quality studies. All necessary information can be found here.
Regarding costs, they vary depending on the type of data you want to access, whether you are pursuing an MSc or a PhD, and whether you are from a low- or middle-income country. Fortunately, the UK Biobank clearly explains this with a figure on their website:
For those who have used the UK Biobank, I am curious about your experience. Was the access process straightforward? What insights did you gain, and what are your thoughts on the data it provides?
Looking forward to your experiences and insights on the UK Biobank!