Data Wrangling Guide Task Force

We want to introduce the Data Wrangling Guide Task Force, led by @malosco and me, to the @DCoP_Innovators and invite anyone to join. We would like the task force to be collaborative: we have met and identified goals, and we are open to input from the community. Here is a link to the task force guidelines.

The purpose of this task force is to create a data wrangling guide for PD datasets, a “Don’t panic!” button if you will, to give users a starting point when approaching these data. This can be a combination of best practices and practical information. Our thought was to include a general section about approaches and best practices, along with some tips and tricks specific to the quirks of different datasets. It would be helpful to have people join from a mix of backgrounds: people at different levels of data analysis experience and familiarity with various datasets (Fox Insight, PPMI, PDBP, BioFIND, GP2, AMP PD, and any others you may want to include). The goal is to have an end product describing the nuts and bolts of accessing and using the data that we can share with the community.

We are looking for around 5 people to join the task force. There will be monthly meetings to check on progress. Our ideal timeline looks like:

  • end of October: team identified
  • late Nov/early Dec.: first meeting, roles assigned to team members
  • Jan. - March: work on end product
  • March/April: present to DCI meeting for feedback

We will discuss this at Friday’s meeting as well. If you are interested, please respond to this post and/or reach out to me or @malosco by Friday, October 27th.

As always, we’re open to input from the community and any suggestions you all may have!

Hello! I’m not sure if this falls within the scope of this initiative (which I believe to be very good and something I might be able to contribute to), but does your idea also involve producing content that details the types of data available in these databases? Something to help people understand what’s in them even before learning how to access them?

This was the main idea I had in mind when considering writing an article highlighting datasets in PD. In this article, I envision writing a summary guide based on a table containing information such as: the institution that created or maintains the dataset, the year data collection started, the number of PD patients, the number of controls, the number of prodromal participants, years of longitudinal follow-up, and types of data available (clinical: motor, quality of life, sleep, cognition, autonomic, etc.; genomics; transcriptomics; proteomics; lipidomics; magnetic resonance imaging; DaTscan; serum biomarkers; urinary biomarkers; etc.).

Would this be within the scope of this Task Force? Or would it be a topic for a separate Task Force?

Congratulations on the initiative!

Thank you @ehutchins! I also wanted to chime in and share my enthusiasm and thoughts on this new Task Force with the @DCoP_Innovators. Our goal is for this to be an accessible and useable resource for the DCI members and the wider research community. We aren’t doing this task force for the sake of doing it! It will be a guide that someone can pick up and then get to work without (or with limited) additional handholding. It will be an evolving resource and by no means will we have a polished, finished product. We will create the first template, present the resource at a monthly DCI meeting, and then ideally create some type of living document that can be disseminated and refined moving forward.

To accomplish our goals, we really need others who have a background in the various data sets. Elizabeth and I have experience with Fox Insight and AMP PD. It is also possible that we select one data set to focus on for this initiative, which would then serve as the template for additional data sets. We can decide this at our first meeting in Nov/Dec. To this end, @danieltds, yes, one objective is to delineate the nuances and details of the various data sets. It sounds like we can count you in here??

Who else can join us?? Let us know! Feedback from all is welcome!

Very excited you are doing this! There is a lot of data out there, so it would be great to have this type of resource. I can join!

Yay! Very excited about this idea and about it joining the one @danieltds described previously. Please count me in.

Thank you @vdardov and @paularp! You are in!

Thanks for the explanation, @malosco. I want to join!

Awesome, thanks @danieltds ! Happy to have you.

Hi, great initiative!
I could participate if needed. My experience is in Terra analysis of GP2 and AMP-PD.
Thanks for the opportunity.

Wonderful. Thank you @psaffie !

I couldn’t join the meeting, but this is such a great thing. Thank you for taking the initiative! I am happy to join or to be the first user of the guide!

Awesome. Thanks @hirotaka , you’re a great addition!

Thanks all for commenting! The next step is to schedule the first meeting. @ehutchins and @malosco sent me their availability.

@danieltds @vdardov @paularp @psaffie @hirotaka Please vote in the poll below:

  • Monday, 20 November at 17:00 UTC (9am PST/12pm EST/11am Mexico City/2pm Brasilia)
  • Wednesday, 29 November at 18:00 UTC (10am PST/1pm EST/12pm Mexico City/3pm Brasilia)
  • Monday, 4 December at 16:00 UTC (8am PST/11am EST/10am Mexico City/1pm Brasilia)
  • Wednesday, 6 December at 18:00 UTC (10am PST/1pm EST/12pm Mexico City/3pm Brasilia)
  • Wednesday, 13 December at 18:00 UTC (10am PST/1pm EST/12pm Mexico City/3pm Brasilia)
  • None of these options work for me

@malosco @ehutchins @danieltds @vdardov @paularp @psaffie @hirotaka Thanks for voting! I just sent a calendar invitation to hold the time on Wednesday, December 13th at 18:00 UTC.

Hi @DCoP_Innovators

I wanted to provide an update on the progress of our task force. @ehutchins and I have met, and we held our first meeting with the entire task force today, which includes @danieltds @lmackenzie @paularp @psaffie @hirotaka @vdardov @ykfarhan @jgottesman.

Today, we reviewed the goals, assigned roles, and assigned tasks. The goal is to develop a quick how-to guide for various data sets relevant to the PD community. The guide is intended to be a starting point for potential users: is this the right resource for my question, and what do I need to know about this resource?

We identified the following data sets to focus on: PPMI, AMP-PD, GP2, and Fox Insight.

A potential end product is an evolving written guide that describes the following for each data set:

  • Description (cohort, recruitment goals)
  • Features (types of data available)
  • Data set access: How to?
  • Intended use of data set
  • Strengths of data set
  • Limitations of data set
  • Links to preexisting documentation
  • Data set quirks
  • FAQs (e.g., questions you think others will have)
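
As a rough illustration only, the per-data-set sections above could be captured as a small fill-in template, so every data set is described with the same fields and gaps are easy to spot. This is a minimal sketch, not the task force's actual template: the class name `DatasetGuideEntry`, its field names, and the `missing_sections` helper are all hypothetical, and the entries are deliberately left empty.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DatasetGuideEntry:
    """One guide entry per data set (hypothetical field names mirroring the sections above)."""

    name: str
    description: str = ""                                  # cohort, recruitment goals
    features: list[str] = field(default_factory=list)      # types of data available
    access_how_to: str = ""                                # how to get access
    intended_use: str = ""
    strengths: list[str] = field(default_factory=list)
    limitations: list[str] = field(default_factory=list)
    documentation_links: list[str] = field(default_factory=list)
    quirks: list[str] = field(default_factory=list)
    faqs: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# The four data sets the task force chose to focus on; content is intentionally blank.
FOCUS_DATASETS = ["PPMI", "AMP-PD", "GP2", "Fox Insight"]
entries = {name: DatasetGuideEntry(name=name) for name in FOCUS_DATASETS}

for name, entry in entries.items():
    print(f"{name}: {len(entry.missing_sections())} sections still to write")
```

A structure like this could later be exported to a table or a document, but a shared written template would serve the same purpose.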

We are still deciding what the end product will be and the exact information we want to include for each data set, but we are well underway. We will reconvene at the end of January and hope to present our resource to the community in late spring 2024.

Would love your feedback and suggestions!

Thanks, Mike! It was great to see this get underway.

@malosco @ehutchins @paularp @psaffie @hirotaka Please fill out this doodle poll so that I can schedule a follow-up meeting for late January.

I hope everyone’s year is wrapping up nicely! Happy holidays! - Laura

As an update:
During the second data wrangling guide meeting (January 24th, 2024), we discussed the format of the template that will be applied across datasets. Over the next three weeks, members will work on applying the template to their assigned dataset (AMP-PD, Fox Insight, PPMI, GP2).

We will meet in February to show our work and discuss any questions/issues that arose. More to come!

-Laura

Hey all! Scheduling our next meeting.

@hirotaka @paularp @psaffie @danieltds @vdardov, please vote on a date for the third meeting below.

  • February 20th at 4pm EST
  • February 27th at 3pm EST
  • February 28th at 12pm EST
  • February 28th at 1:30pm EST
  • None of these times work for me

Also linking the Data Wrangling Guide document here for ease of access.

I am only available on Feb 28, so, if possible, please vote for that date.

Hi all, I’ve scheduled the meeting for February 28th at 1:30pm Eastern Time. I will update the calendar invitation with an agenda + action items.

See you then here: Join conversation!
