Many of the cohorts in the PD space have multiple types of omic data available, including genetic, transcriptomic, and proteomic data. Using AMP-PD as an example:
And even within each category, there can be multiple types of -omic data:
- transcriptomic data (whole blood)
- exRNA transcriptomic pilot data (plasma and CSF)
- genomic data (whole blood)
- untargeted proteomics (plasma and CSF)
- targeted proteomics (plasma and CSF)
- WGS/snRNA-seq (postmortem brain)
Has anyone here looked into ways to analyze multi-omic datasets to see how these datasets complement each other? I have played around with the mixOmics tool a bit, specifically the DIABLO method, which is described in this paper. The goal of this tool is to focus on variable selection:
mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection
Multivariate methods are well suited to large ‘omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing properties of reducing the dimension of the data by using instrumental variables (‘components’), which are defined as combination of all variables. Those components are then used to produce useful graphical outputs that enable better understanding of the relationships and correlation structure between the different data sets that are integrated. We have developed several sparse multivariate models to identify the key variables that are highly correlated, and/or explain the biological outcome of interest (e.g. disease status). The identified variables are then more amenable to statistical inference and to posit novel biological hypotheses to be further validated in the laboratory.
Interested to hear what you all think!
I haven’t used this tool but now I’m going to go try it out; thanks for posting!
I have no specific experience with this kind of variable or these tools, so my response to your post may be a bit off. That said, my working group is analysing this data from PPMI, PDBP and AMP-PD using unsupervised machine learning techniques. The idea is that we first analyse each dataset individually to reduce dimensionality and prioritize important data (for example, perform an MLM on array data and do a gene ontology analysis on transcriptomics), and then combine those variables in our models. It seems the method you are sharing does something along these lines, but instead of dealing with the datasets individually and then combining them, it does it all at the same time. Am I correct?
Yes - in the past I’ve compared different types of omic data at the pathway analysis level - looking for similar gene ontologies or pathways.
This tool is a bit different in that you use your target genes or proteins of interest from whatever -omic assay, normalize the data, and look at them all together - to help with feature selection across multiple -omic types.
Looking forward to seeing what your working group comes up with!
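For anyone who wants to picture the "normalize the data, and look at them all together" step, here is a minimal sketch in plain Python. It is not mixOmics or DIABLO code (mixOmics is an R package); it only illustrates the preparation I would assume before any joint analysis: keep the samples measured in every -omic block, put them in one shared order, and z-score each feature within its block. The function names and the toy values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column (feature) to mean 0 and unit variance.

    Uses the population standard deviation; swap in a sample SD if preferred.
    """
    scaled_cols = []
    for col in zip(*rows):
        mu = mean(col)
        sd = pstdev(col)
        # A constant feature carries no information; map it to zeros.
        scaled_cols.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    return [list(r) for r in zip(*scaled_cols)]


def align_and_scale(blocks):
    """blocks: {block_name: {sample_id: [feature values]}}.

    Keep only samples present in every block, in one shared order,
    then z-score each feature within its own block.
    """
    shared = sorted(set.intersection(*(set(b) for b in blocks.values())))
    scaled = {}
    for name, block in blocks.items():
        rows = [block[s] for s in shared]
        scaled[name] = zscore_columns(rows)
    return shared, scaled


# Hypothetical toy values, for illustration only.
blocks = {
    "rna": {"s1": [1.0, 10.0], "s2": [3.0, 10.0], "s3": [5.0, 10.0]},
    "protein": {"s2": [2.0], "s3": [4.0], "s4": [9.0]},
}
samples, scaled = align_and_scale(blocks)
print(samples)            # sample IDs shared by both blocks
print(scaled["rna"])      # standardized RNA features for those samples
print(scaled["protein"])  # standardized protein feature for those samples
```

Only samples s2 and s3 survive here because they are the only ones present in both blocks. Real cohorts would need more care (batch effects, missing values, assay-specific normalization), which this sketch ignores.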
mixOmics is one of the best tools for multi-omics analysis. Before using DIABLO for multi-omics integration, I would recommend doing some groundwork with the sparse Partial Least Squares Discriminant Analysis (sPLS-DA) tool in each *-omics domain.
You do not have to go in with a list of genes/proteins/biologics/etc. (variables) of interest. In a typical setup, one selects variables in each *-omics domain based on entropy (this is referred to as variable importance, vip). One has to formulate a meaningful contrast though, e.g. neurotypical vs. PD or multiple hypothesized PD sub-types. With regard to that contrast, mixOmics sets up a predictor and computes its performance, expressed as the balanced error rate (BER).
The online case studies are very useful and the tool copes well with (sparse) microbiome data.
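To make the BER concrete: as I understand it, it is the average of the per-class error rates, so a small class counts as much as a large one. Below is a minimal sketch in plain Python. This is not mixOmics code; the function name and the toy labels are hypothetical.

```python
from collections import Counter


def balanced_error_rate(y_true, y_pred):
    """Average of the per-class misclassification rates."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    totals = Counter(y_true)
    # Count misclassified samples, keyed by their true class.
    errors = Counter(t for t, p in zip(y_true, y_pred) if t != p)
    per_class = [errors[c] / totals[c] for c in totals]
    return sum(per_class) / len(per_class)


# Hypothetical, deliberately imbalanced example: 8 PD, 2 controls.
y_true = ["PD"] * 8 + ["control"] * 2
y_pred = ["PD"] * 8 + ["PD", "control"]  # one control misclassified as PD

overall_error = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
ber = balanced_error_rate(y_true, y_pred)
print(overall_error)  # 0.1
print(ber)            # 0.25
```

The overall error looks low because most samples are PD, while the BER reflects that half of the controls were misclassified. That is why it is the more useful number when cases and controls are unbalanced.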