Asking for help understanding SomaLogic proteomics data

Hello everyone!

I’m using the CSF SomaLogic proteomic dataset from project 151. While doing some sanity checks, I found that neither the MAPT nor the SNCA values correlate strongly with the corresponding baseline values for (p)tau and alpha-synuclein (a-syn) available in the curated dataset from projects 124, 125, and 159.
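For anyone wanting to reproduce this kind of sanity check, here is a minimal sketch of a rank (Spearman) correlation restricted to participants present in both datasets. It assumes each dataset has already been reduced to one baseline value per participant; the participant IDs, the dictionary layout, and all numbers are made-up toy values, not PPMI data, and the choice of Spearman rather than Pearson is also an assumption.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; fails with ZeroDivisionError if an input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_on_shared_ids(soma, elisa):
    """soma, elisa: dicts mapping participant ID -> baseline value."""
    shared = sorted(set(soma) & set(elisa))
    x = [soma[p] for p in shared]
    y = [elisa[p] for p in shared]
    return pearson(average_ranks(x), average_ranks(y)), len(shared)


# Toy values only (hypothetical IDs and measurements).
soma_snca = {"P01": 812.4, "P02": 640.1, "P03": 955.0, "P04": 701.3, "P05": 1020.8}
elisa_asyn = {"P01": 1450.0, "P02": 1210.0, "P03": 1890.0, "P04": 1620.0, "P06": 1300.0}

rho, n = spearman_on_shared_ids(soma_snca, elisa_asyn)
print(f"Spearman rho = {rho:.2f} on {n} shared participants")
```

Reporting the number of shared participants alongside the coefficient helps, because the overlap between a project-specific dataset and the curated dataset can be much smaller than either one alone.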

I was wondering whether anyone has a hypothesis for why this is the case? I’ve noticed that information about the aptamers used by SomaScan is not available in LONI, so I’m not sure what specific protein form their aptamers are reading. For example, for MAPT it would be useful to understand whether their aptamer is specific for pTau, 4R-Tau, … Does anyone know how to get this information?

I’d appreciate any help here, thanks! :slight_smile:

Hi @tiago.azevedo! I have never done that, but I found this post from @mcbrumm that might be of some help: “Newly Released PPMI Proteomics Curated Datasets”.

Hope it sheds some light!

Paula

Hi Paula, thanks for sending that link, which I had indeed checked before. Unfortunately, as far as I can tell, it “just” has the information I already knew about this dataset, and not the specific information about the aptamers.

Tiago.

I also have this question! Hoping that someone with direct knowledge of the PPMI Project 151 will see the thread here (does anyone know who to tag??)

I tried searching the SomaLogic website for technical details, but didn’t get very far. Project 151 is a 4k panel, but SomaLogic no longer offers this panel (only 7k or 11k). I found a white paper called “Characterization of the Binding Specificity of SOMAmer® Reagents used in the SomaScan® Assay” (available here: https://somalogic.com/tech-note-download-characterization-of-the-binding-specificity-of-somamer-reagents-used-in-the-somascan-assay/). This is the right sort of information, as it explains why one SOMAmer (proteomic probe) may map to multiple proteins, and states that SomaLogic investigated all proteins with significant homology to the target (defined as >40% amino acid sequence identity with the target protein).

This is my personal inference, but I wonder if the mapping table of proteomic probe to gene(s) included with Project 151 reflects which of these significantly homologous proteins were also found to bind to the reagent. This isn’t super clear from the documentation available, and I don’t know if the table is produced by SomaLogic or by PPMI curators.

Anyway, even if this is the case, I still have the question of what is the specific sequence that a given SOMAmer reagent binds to? There are a couple of probes in particular that I would like to look up the sequence for, if possible.

Finally, @tiago.azevedo , to your specific question: I have also noticed a discrepancy between transcriptional expression of SNCA and levels of alpha synuclein in the clinical data of PPMI (as measured using the ELISA assay). Of course there can be many reasons why transcript abundance doesn’t correlate with CSF abundance, but the manufacturer of the ELISA kit used in PPMI confirmed for us that they do NOT do any epitope mapping of their assay, and couldn’t say how much misfolded alpha synuclein was being measured by their assay. We saw a clear trend where median levels for the HC group were higher than for Prodromal, which were higher than for the PD cohort, and therefore concluded that the ELISA assay was only measuring correctly-folded a-syn.
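The cohort trend described above can be checked with a per-group median. This is a minimal sketch under stated assumptions: the `(cohort, value)` record layout is hypothetical, missing measurements are represented as `None`, and the numbers are toy values chosen only to illustrate the comparison, not PPMI results.

```python
from collections import defaultdict
from statistics import median


def median_by_cohort(records):
    """records: iterable of (cohort_label, value); None values are skipped."""
    groups = defaultdict(list)
    for cohort, value in records:
        if value is not None:
            groups[cohort].append(value)
    return {cohort: median(values) for cohort, values in groups.items()}


# Toy values only (hypothetical ELISA a-syn readings).
records = [
    ("HC", 1700.0), ("HC", 1500.0), ("HC", 1900.0),
    ("Prodromal", 1400.0), ("Prodromal", 1600.0),
    ("PD", 1300.0), ("PD", None), ("PD", 1500.0), ("PD", 1200.0),
]

medians = median_by_cohort(records)
# True when the medians follow the ordering HC > Prodromal > PD.
ordered = medians["HC"] > medians["Prodromal"] > medians["PD"]
print(medians, ordered)
```

An ordering of medians on its own says nothing about significance, so on real data it would need a proper test on top.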

Without knowing the precise sequence that the SOMAmer reagents bind we can only guess, but it’s possible that the assay in one or more of the proteomic projects binds more or less misfolded a-syn, thus decorrelating the results across projects.

Hope this helps, although I am also hoping for a better answer from someone else!

@gginnan @jgottesman do you happen to know the right person to tag maybe?

That’s a great question, @tiago.azevedo! Wonder if @jodiefm or @dalonso would have any insight to offer?

@vcatterson I can confirm that all the Project 151 files are as provided by the PI’s lab; PPMI almost never further curates or amends individual project data returns. The Curated Data Cut, Working Group, and derived project data returns are the only places you should expect further curation beyond what was submitted by the project team.

@tiago.azevedo If the information you’re looking for was not in the Methods document, and no one here has an answer, I would suggest reaching out to the author of the document with questions. Every Methods document should have a direct contact listed at the bottom. This certainly feels like a pretty technical question re: a legacy SomaLogic panel!

Thanks everybody for all the tags and ideas. For completeness, I’m noting here that I contacted project 151’s PI by email and got confirmation that they also don’t know the answer to this question; only SomaLogic could provide it. Unfortunately, SomaLogic hasn’t answered my direct email so far :slight_smile:

Thanks for this further information about curation, @lkirsch! Very helpful to know more.

In light of @tiago.azevedo’s response from the 151 PI, I think we all need to pester SomaLogic until they provide us with the answer :wink:

Hi. We are also analysing this dataset. I’m not sure about the 4K, but in the larger panels (11K) there are multiple aptamers to a-syn that do not correlate with each other or show the same direction of change in our cohort. I’m not sure that is unusual. A lot of different ELISA platforms using antibodies also don’t correlate well with each other. The world of cytokines is a great example of this. I just hope that when people report the SomaLogic data they can report which aptamers are giving the results rather than just which protein (although maybe for the 4K there is only 1 aptamer per protein of interest). Then it is a matter of trying to determine if the aptamer is specific or what kind of a-syn conformation it is detecting. We are also doing some comparisons between the NULISA, SomaScan and SIMOA platforms.

Thanks for your perspective on this, @nicdzam! You’re right: there are multiple aptamers tagging certain proteins of interest, even in the 4K, and the count numbers do not exactly correlate with each other. As I understand it, this is to be expected because the aptamers are tagging different sections of the protein. If there is a mis-folding or mis-translation of a section of protein, it may bind to one aptamer and not the other.

However, the information we seem to be missing is: what sequence does each aptamer bind? Without this information, we can’t reason more deeply about why we see discrepant counts across multiple aptamers which ought to tag the same protein. We have the SomaLogic id numbers, and a mapping table to genes, but for a case like a-syn where there are multiple aptamers in the dataset, we can’t hypothesize about why we see counts differing.
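To quantify how far aptamers for the same gene disagree, one option is to group aptamer IDs by gene using the mapping table and compute pairwise correlations within each group. The sketch below assumes the mapping is a plain dict from aptamer ID to a single gene, that values are positive counts aligned by sample, and that a log10 transform is appropriate; the aptamer IDs (`apt_1`, ...) and all numbers are hypothetical toy values, not entries from the Project 151 files.

```python
from itertools import combinations
from math import log10, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pairwise_by_gene(aptamer_to_gene, values):
    """Correlate log10 values for every pair of aptamers mapped to the same gene.

    aptamer_to_gene: dict of aptamer ID -> gene symbol
    values: dict of aptamer ID -> list of positive values, one per sample,
            in the same sample order for every aptamer
    """
    by_gene = {}
    for aptamer, gene in aptamer_to_gene.items():
        by_gene.setdefault(gene, []).append(aptamer)

    results = {}
    for gene, aptamers in by_gene.items():
        # Genes with a single aptamer yield no pairs and are skipped.
        for a, b in combinations(sorted(aptamers), 2):
            log_a = [log10(v) for v in values[a]]
            log_b = [log10(v) for v in values[b]]
            results[(gene, a, b)] = pearson(log_a, log_b)
    return results


# Toy values only (hypothetical aptamer IDs and counts).
aptamer_to_gene = {"apt_1": "SNCA", "apt_2": "SNCA", "apt_3": "MAPT"}
values = {
    "apt_1": [10.0, 100.0, 1000.0, 10000.0],
    "apt_2": [10000.0, 1000.0, 100.0, 10.0],
    "apt_3": [50.0, 60.0, 55.0, 70.0],
}

result = pairwise_by_gene(aptamer_to_gene, values)
for (gene, a, b), r in result.items():
    print(f"{gene}: {a} vs {b}: r = {r:.2f}")
```

This only measures the disagreement. It cannot explain it, since that still requires knowing what each aptamer binds.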

Have you been able to work around this problem in a creative way, either for this data or for your other investigations?

I suspect that they do not know where the aptamers bind on the proteins of interest, and this is not readily determinable from the aptamer sequence. It would require some kind of epitope mapping project. Or maybe there are in silico ways to map aptamer binding if the protein structure is known and the aptamer sequence is provided. I believe the aptamers are available to buy individually, so you can purchase them and try to work out where they bind, which would be very informative for proteins like a-syn. A bit of work though. Different aptamers can also have different binding affinities, which is another reason why counts may differ even if they are binding to a similar region. But you are right in that all we have at the moment are ideas as to why they differ, but not proof.