Question on analysis for some Fox Insight data

Hello everyone,

We are currently working with the raw CSV exports from Fox Insight, and I was hoping to get some support on how to handle certain questionnaire variables in the data files.

1. Sometimes variables appear to be absent from the CSV export even though the corresponding response options are listed in the questionnaire documentation (the PDF questionnaire file). Specifically, if a response option receives no selections from participants, is the corresponding variable excluded from the CSV entirely? We are trying to determine whether a missing column indicates that the option was never selected or that the variable was removed during export. Are variables with zero responses excluded from the export, or do these items appear under different variable names?

2. Structure of “Select All That Apply” (SATA) variables: For questions marked SATA, the CSV files appear to include one column per response option, but there is no column representing the question itself. Is there a recommended way to aggregate these variables back to the original question level? Additionally, is there a data dictionary or mapping resource for SATA questions that links each variable to its questionnaire item and response option?

Any clarification or support anyone can provide would be much appreciated.

Thanks,

Roberta

Hi @betamaro,

  1. Variables are not excluded or removed, and they do not appear under a different variable name. Variables for unanswered questions have a blank value, which means the question was not answered (i.e., because the participant didn’t finish the visit, because survey logic meant the condition for serving the question to the participant was not met, or because they skipped it where that was an option).
  2. I’m not aware of a specific method for aggregating these variables, but if you let us know which specific ones you’re working on, we can try to connect you with other researchers in the community who may have done this before. There are two versions of the data dictionary (machine readable, and annotated with a bit of guidance) as well as the questionnaire forms themselves on the Resources page on the Fox DEN website, which will hopefully be helpful! Another resource would be this data wrangling guide.
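
For what it's worth, here is a minimal sketch of one way the per-option columns could be rolled back up to the question level. It is not an official Fox Insight method. The column names, the participant_id field, and the hand-built mapping are all hypothetical, and it assumes a selected option is stored as "1", an unselected one as "0", and a question that was never answered as all blanks.

```python
import csv
import io

# Hypothetical mapping, built by hand from the data dictionary:
# column name -> (parent question id, response option label).
SATA_MAP = {
    "ExampleQ1_OptA": ("ExampleQ1", "Option A"),
    "ExampleQ1_OptB": ("ExampleQ1", "Option B"),
    "ExampleQ1_OptC": ("ExampleQ1", "Option C"),
}

def aggregate_sata(row, sata_map):
    """Collapse per-option columns into {question: [selected labels]}.

    A question whose option columns are all blank maps to None, so
    "no response recorded" stays distinct from an empty selection.
    """
    selected = {}
    answered = {}
    for col, (question, label) in sata_map.items():
        value = (row.get(col) or "").strip()
        selected.setdefault(question, [])
        answered.setdefault(question, False)
        if value != "":
            answered[question] = True
        if value == "1":  # assumption: "1" marks a selected option
            selected[question].append(label)
    return {q: (selected[q] if answered[q] else None) for q in selected}

# Tiny made-up export, for illustration only.
sample = io.StringIO(
    "participant_id,ExampleQ1_OptA,ExampleQ1_OptB,ExampleQ1_OptC\n"
    "p1,1,0,1\n"
    "p2,,,\n"
)
rows = list(csv.DictReader(sample))
result = {r["participant_id"]: aggregate_sata(r, SATA_MAP) for r in rows}
```

Here p1 ends up with Option A and Option C under ExampleQ1, while p2 gets None because every option column is blank. If the real export codes selections differently, the check against "1" is the line to change.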

Hope these help, definitely let me know if you have any follow-up questions!

Josh

Hi Josh,

Thank you for the response.

  1. In the PPPM survey, the annotated dictionary lists values 1-7; however, in the CSV, values 3 (Neutral), 6 (Not sure), and 7 (PNTA) do not appear at all in FoxInsightValue.csv.

  2. As another example, there are 11 values listed here for the question “Which type of birth control have you used since your diagnosis of Parkinson’s”; however, only 7 of the 11 values appear, not just in FoxInsightValues.csv but also in the MJFF DEN.

  3. I have found the Annotated Dictionaries. Is there a way they can be saved as a .csv?
    However, the Experiences of Women Living with PD - Fertility, Pregnancy, and Childbirth questionnaire is not listed there. The “Select All That Apply” (SATA) questions exported from the MJFF DEN contain only the individual response-option variables, and we have not found a resource (in CSV format, excluding the Annotated Dictionaries) that maps those variables back to the parent questionnaire item. As a result, reconstructing the original questionnaire structure requires manually linking each response-option variable to its corresponding question.

There are many more occurrences of values not showing up. Is that because they were simply not answered? How would you handle this in data analysis?

Thank you for your help.

Aleksandra Zelatis

In addition, I am running into an issue with how some of the SATA variables are structured.

For example, there are cases where answer choices such as “Tampons”, “Pads/Feminine Napkins”, etc. share the same prefix as the variable below that belongs to a separate single-answer question (FemMenoPreProDif). Because they are grouped under the same prefix, it is unclear how these responses should be coded and reconstructed into their original survey question.

A second issue involves the FemMenoPrePD symptom variables (and the corresponding Peri and Post menopause sections). The symptoms appear to be split across three separate SATA questions (movement symptoms, thinking/feeling and sleeping symptoms, and related symptoms), yet all variables share the same prefix (e.g., FemMenoPrePro). As a result, there is no obvious way to determine which variables belong to which symptom group based solely on the variable names.

How would you recommend separating these variables into their appropriate question groups when the naming convention does not distinguish between them?
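
The only workaround we can think of so far is an explicit lookup table filled in by hand from the questionnaire PDF, roughly as sketched below. The variable suffixes are placeholders rather than real Fox Insight variable names, and the group labels simply follow the three symptom questions described above.

```python
from collections import defaultdict

# Hypothetical lookup table, filled in by hand from the questionnaire PDF.
# The "Example*" suffixes are placeholders, not real variable names.
VARIABLE_TO_GROUP = {
    "FemMenoPreProExampleA": "movement symptoms",
    "FemMenoPreProExampleB": "movement symptoms",
    "FemMenoPreProExampleC": "thinking/feeling and sleeping symptoms",
    "FemMenoPreProExampleD": "related symptoms",
}

def group_columns(columns, variable_to_group):
    """Return ({group: [columns]}, [columns with no mapping yet])."""
    groups = defaultdict(list)
    unmapped = []
    for col in columns:
        group = variable_to_group.get(col)
        if group is None:
            unmapped.append(col)  # flag for manual review
        else:
            groups[group].append(col)
    return dict(groups), unmapped

# Made-up CSV header, for illustration only.
header = [
    "participant_id",
    "FemMenoPreProExampleA",
    "FemMenoPreProExampleC",
    "FemMenoPreProExampleD",
    "FemMenoPreProExampleZ",
]
candidates = [c for c in header if c.startswith("FemMenoPrePro")]
groups, unmapped = group_columns(candidates, VARIABLE_TO_GROUP)
```

Anything that lands in unmapped (FemMenoPreProExampleZ in this made-up header) still has to be assigned by hand, which is exactly the manual step we were hoping to avoid.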

Furthermore, is there a way to distinguish between a response that is missing because the participant was not presented with the question (due to survey skip logic) versus a response that is missing because the participant chose not to answer or left the question blank?

In other words, how can we identify true missing values in the dataset? Are there variables, metadata fields, or survey logic documentation available that would allow us to differentiate between “not asked,” “not applicable,” and genuinely missing responses?

How would you recommend handling these cases during data analysis?

Additionally, some response/value options appear to be absent from both the dictionaries and the raw data. In these cases, should we assume that the response option received zero selections across the entire survey population, or zero selections among only those participants who were eligible for and presented with that question?

More generally, is there a way to determine the denominator associated with a response option? For example, if a particular answer choice does not appear in the dataset, how can we determine whether:

  1. No participants selected that response,

  2. The response option was never presented due to survey logic, or

  3. The response option was excluded because it received zero responses?

Understanding this distinction is important for accurately calculating frequencies and percentages during analysis.
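
To make the question concrete, this is roughly the calculation we would like to be able to do. It is only a sketch under assumptions we cannot yet verify: that eligibility for a question can be reconstructed from a gating variable (here a hypothetical "Gate" column with the value "1"), that selected options are stored as "1" and unselected ones as "0", and that blank cells mean no response was recorded. All column names are made up.

```python
def is_blank(value):
    """Treat None and whitespace-only strings as blank cells."""
    return value is None or value.strip() == ""

def classify(row, gate_col, gate_value, option_cols):
    """Label one participant's status for a gated SATA question."""
    if (row.get(gate_col) or "").strip() != gate_value:
        return "not_asked"    # skip logic: question never presented
    if all(is_blank(row.get(c)) for c in option_cols):
        return "asked_blank"  # presented, but no response recorded
    return "answered"

def summarize(rows, gate_col, gate_value, option_cols):
    """Count selections per option alongside the candidate denominators."""
    statuses = [classify(r, gate_col, gate_value, option_cols) for r in rows]
    answered_rows = [r for r, s in zip(rows, statuses) if s == "answered"]
    counts = {
        c: sum((r.get(c) or "").strip() == "1" for r in answered_rows)
        for c in option_cols
    }
    return {
        "n_total": len(rows),
        "n_eligible": sum(s != "not_asked" for s in statuses),
        "n_answered": len(answered_rows),
        "counts": counts,
    }

# Made-up rows, for illustration only.
rows = [
    {"participant_id": "p1", "Gate": "1", "OptA": "1", "OptB": "0"},
    {"participant_id": "p2", "Gate": "1", "OptA": "", "OptB": ""},
    {"participant_id": "p3", "Gate": "0", "OptA": "", "OptB": ""},
    {"participant_id": "p4", "Gate": "1", "OptA": "0", "OptB": "0"},
]
summary = summarize(rows, "Gate", "1", ["OptA", "OptB"])
```

In this toy data, OptB has zero selections but still has a well-defined denominator (3 eligible, 2 answered). Note that the sketch cannot tell a deliberate skip from an unfinished visit, since both leave blanks, and it breaks down entirely if unselected options are also exported as blanks.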

Thank you,

Aleksandra Zelatis