We want to introduce the Data Wrangling Guide Task Force, led by @malosco and me, to the @DCoP_Innovators and invite anyone to join. We would like the task force to be collaborative: we have met and identified goals, and we are open to input from the community. Here is a link to the task force guidelines.
The purpose of this task force is to create a data wrangling guide for PD datasets, a “Don’t panic!” button if you will, that gives users a starting point when approaching these data. This can be a combination of best practices and practical information. Our thought was to include a general section about approaches and best practices, along with tips and tricks specific to the quirks of different datasets. It would be helpful to have people join from a mix of backgrounds: people at different levels of data analysis experience and with familiarity with various datasets (Fox Insight, PPMI, PDBP, BioFIND, GP2, AMP PD, and any others you may want to include). The goal is an end product describing the nuts and bolts of accessing and using the data that we can share with the community.
We are looking for around 5 people to join the task force. There will be monthly meetings to check on progress. Our ideal timeline looks like:
end of October: team identified
late Nov/early Dec.: first meeting, roles assigned to team members
Jan. - March: work on end product
March/April: present to DCI meeting for feedback
We will discuss this at Friday’s meeting as well. If you are interested, please respond to this post and/or reach out to me or @malosco by Friday, October 27th.
As always we’re open to input from the community and any suggestions you all may have!
Hello! This initiative sounds very good, and I might be able to contribute to it, but I’m not sure whether my idea falls within its scope. Does your idea also involve producing content that details the types of data available in these databases? Something to help people understand what’s in them even before learning how to access them?
This was the main idea I had in mind when considering writing an article highlighting datasets in PD. In this article, I envision writing a summary guide based on a table containing information such as: the institution that created or maintains the dataset, the year data collection started, the number of PD patients, the number of controls, the number of prodromal participants, years of longitudinal follow-up, and the types of data available (clinical: motor, quality of life, sleep, cognition, autonomic, etc.; genomics, transcriptomics, proteomics, lipidomics, magnetic resonance imaging, DaTscan, serum biomarkers, urinary biomarkers, etc.).
Would this be within the scope of this Task Force? Or would it be a topic for a separate Task Force?
Thank you @ehutchins! I also wanted to chime in and share my enthusiasm and thoughts on this new Task Force with the @DCoP_Innovators. Our goal is for this to be an accessible and usable resource for the DCI members and the wider research community. We aren’t doing this task force for the sake of doing it! It will be a guide that someone can pick up and then get to work with little or no additional handholding. It will be an evolving resource, and we by no means expect to have a polished, finished product. We will create the first template, present the resource at a monthly DCI meeting, and then ideally create some type of living document that can be disseminated and refined moving forward.
To accomplish our goals, we really need others who have background in the various data sets. Elizabeth and I have experience with Fox Insight and AMP PD. It is also possible that we select one data set to focus on for this initiative and this will serve as the template for additional data sets. We can decide this at our first meeting in Nov/Dec. To this end, @danieltds, yes, one objective is to delineate the nuances and details of the various data sets. It sounds like we can count you in here??
Who else can join us?? Let us know! Feedback from all is welcome!
I couldn’t join the meeting, but this is such a great effort. Thank you for taking the initiative! I am happy to join or to be the first user of the guide!
Today, we reviewed the goals, assigned roles, and assigned tasks. The goal is to develop a quick how-to guide for various datasets relevant to the PD community. The guide is intended to be the starting point for potential users: is this the right resource for my question, and what do I need to know about this resource?
We identified the following data sets to focus on: PPMI, AMP-PD, GP2, and Fox Insight.
A potential end product is to create an evolving written guide that describes the following for each data set: Description (cohort, recruitment goals), Features (types of data available), Data set access: How to?, Intended use of data set, Strengths of data set, Limitations of data set, Links to preexisting documentation, Data set quirks, FAQs (e.g., questions you think others will have).
We are still deciding on what the end product will be and the exact information we want to include for each data set. But, we are well underway. We will reconvene end of January and hope to present our resource to the community in late Spring 2024.
As an update:
During the second data wrangling guide meeting (January 24th, 2024), we discussed the format of the template that will be applied across datasets. Over the next three weeks, members will apply the template to their assigned dataset (AMP-PD, Fox Insight, PPMI, GP2).
We will meet in February to show our work and discuss any questions/issues that arose. More to come!