How can ChatGPT help with research?

Have you been using ChatGPT, or LLMs (large language models) more generally? I was really surprised by how natural its responses were when I first tried it. Although an LLM doesn’t “understand” what it is doing, its output makes a lot of sense, and it is much better than I am at some (or actually many :sweat_smile:) tasks.

Apparently LLMs’ abilities soar as model size grows. Newer, larger, and better models keep coming.

I’d like to discuss how we can use LLMs in our research activities. To start, I’d like to share my experience using one for data processing/coding:
ChatGPT4 can read data and do various things with it, such as summarizing, creating tables, and plotting. In one example, I had data on education levels in Germany. I asked ChatGPT to create a mapping dictionary to the equivalent US education levels in our data dictionary and to generate a cross-tab of the original and recoded values. The outputs are below, and I’m very impressed.
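For anyone curious what that recoding looks like in code, here is a minimal pandas sketch of the same idea. The German categories, US labels, and mapping below are made up for illustration, not the actual dictionary ChatGPT produced:

```python
import pandas as pd

# Hypothetical survey responses with German education levels
df = pd.DataFrame(
    {"edu_de": ["Hauptschule", "Abitur", "Bachelor", "Abitur", "Hauptschule"]}
)

# A mapping dictionary like the one ChatGPT was asked to draft
# (illustrative labels, not the real study mapping)
edu_map = {
    "Hauptschule": "Less than high school",
    "Abitur": "High school diploma",
    "Bachelor": "Bachelor's degree",
}

# Recode to the US-equivalent levels
df["edu_us"] = df["edu_de"].map(edu_map)

# Cross-tab of original vs. recoded values to verify the recoding
ct = pd.crosstab(df["edu_de"], df["edu_us"])
print(ct)
```

Any category missing from the dictionary would show up as NaN in `edu_us`, so the cross-tab doubles as a quick check that the mapping covers every original value.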


Now, I am wondering if it is possible to load Fox Insight, PPMI, and AMP-PD data into an LLM and get deeper insights by “communicating” with the LLM about these data.

Another area where I’ve been considering whether LLMs may be useful is data collection. They could ask participants for their medical history, etc., and convert the answers into structured data.

Anyway, I am really interested in your thoughts on how we can use AI/LLM and also your experiences if you have been using them already.

4 Likes

I really like this example of wrangling and organizing data. Thanks @hirotaka !

I’ve found that the utility really depends on the prompt and the topic. I’ve played around a bit with having ChatGPT3 provide snippets of code, and that has been very hit or miss, but that may change in the future. It’s been great for helping me generate regular expressions (sometimes I forget the syntax going back and forth between bash, Python, and R). And for specific data-wrangling tasks like the example above, it’s great at restructuring and organizing data.
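To make the regex use case concrete, here’s the sort of small pattern I mean; the sample-ID format below is hypothetical, just to show the shape of the task:

```python
import re

# Hypothetical free text containing sample IDs of the form "PPMI-12345"
text = "Samples PPMI-12345 and PPMI-67890 passed QC; PPMI-00001 failed."

# The kind of pattern ChatGPT is handy for drafting:
# literal "PPMI-" followed by exactly five digits
ids = re.findall(r"PPMI-\d{5}", text)
print(ids)  # ['PPMI-12345', 'PPMI-67890', 'PPMI-00001']
```

The same pattern would need slightly different syntax in bash (`grep -E`) or R (`stringr`), which is exactly where switching between languages trips me up.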

One other use case we’ve been discussing is using ChatGPT to help outline and draft written documents. For papers, we’ve used it to help generate titles, and we’re exploring using ChatGPT to write a discussion draft for a paper based on the paper’s contents. We had an intern test this use case with ChatGPT4 last summer, and the ChatGPT4-written discussion sections matched pretty closely with what the authors wrote on her test set of papers. She also tested summarizing scientific data at a 5th-8th grade level for sharing with the public, say a press release summarizing a paper or research topic that you want to highlight, and it worked fairly well for that too. Of course, in both cases it required a human to edit the final draft and check for factual errors.

This brings up another question… are there any data use policies in place at Fox or others around LLM usage? @jgottesman do you know? There is a lot of utility but there are evolving discussions around IP and clinical/patient data since it collects your prompts.

1 Like

I’ve used ChatGPT (especially ChatGPT4) for a variety of reasons, and I think it enhances my data analysis and research capabilities, although we have to be cautious with its output, and we need to know what we are doing in order to make sense of what we are receiving. Some people say that “prompting is an ability of the future,” and I agree. To best use this tool, we need to know what to ask of it.

The first reason is related to coding in a programming language I’m less knowledgeable about. For example, I know a lot more Python than R, and when a package I need is missing from Python, I have to use R, so asking ChatGPT what to do helps a lot in this matter.

I also use it to improve my scientific writing. Being a non-native English speaker sometimes makes me think twice about what I’m writing and about any possible mistakes and typos I’m making. Additionally, I use it to check whether my phrases are clear and concise.

Regarding the question of whether we can feed ChatGPT tables and have it analyze them: yes, it can. In a recent update, they added the “Code Interpreter” or “Advanced Data Analysis” tool, as presented below.

I think you need a paid subscription and to enable this feature in the settings. There, you can give ChatGPT a table and ask it to make something out of it. Even though I could have done it myself, I tried this feature for the first time in my post introducing PPMI, and it did a very good job. I gave ChatGPT a table extracted from PubMed containing the number of PPMI-related publications in each year and asked it to generate the graphs you see in that post (including the one based on a prediction). It shows you the code it generates to respond to your query, so everything is transparent.
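The prediction part of that exercise is essentially a trend fit. Here is a small numpy sketch of the idea with made-up yearly counts (the real numbers came from PubMed, so don’t read anything into these):

```python
import numpy as np

# Hypothetical counts of PPMI-related publications per year
years = np.array([2018, 2019, 2020, 2021, 2022])
pubs = np.array([40, 55, 63, 78, 90])

# Fit a straight line to the counts and extrapolate one year ahead
slope, intercept = np.polyfit(years, pubs, 1)
pred_2023 = slope * 2023 + intercept
print(f"Predicted 2023 count: {pred_2023:.1f}")
```

ChatGPT’s Advanced Data Analysis shows you code along these lines as it runs, which is what makes its results easy to sanity-check.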

3 Likes

Thank you @danieltds and @ehutchins for sharing your use cases.

LLMs are a black box, so how we frame our questions (prompting) seems very important. “Summarizing scientific data at a 5th-8th grade level for sharing with the public” is a good one! It also sounds like a great idea to have ChatGPT help write a discussion. @ehutchins, are you going to write a paper for it? I’m interested in learning more, like what kinds of tricks work for getting a good discussion.

I am also curious whether there have been any discussions regarding the use of ChatGPT for MJFF data, @jgottesman?

@danieltds, you are an expert! Thank you for sharing how to activate and use it. Yes, I recently became a paid member of ChatGPT and started playing with it. Very fun!

2 Likes

I very much agree with @ehutchins and @danieltds; that matches how I have used them too!

I often use ChatGPT because apparently indentation is my worst enemy, and I either ask for snippets of code or I try “Please check for errors in this code”.

Also, like many of you, English is my second language and I really need validation while writing, so I will use prompts like “Please suggest improvements for this conclusion” or “Please highlight any grammatical errors in this text”.

I am amazed by the other possibilities, as they never occurred to me, and I am also very interested in new policies regarding its use. Thanks for starting this conversation, @hirotaka!

3 Likes

I also recently discovered that you can use the Wolfram Alpha ChatGPT plugin and combine the power of an LLM with data visualization and complex calculations done in seconds.

I have watched a couple of videos since, and I am honestly amazed.

1 Like

Thanks for adding this info, @paularp! Really glad to know that!

We’re currently playing around with using ChatGPT to help write bits of papers. I have a paper I’m writing now, and it might be a test case for getting ChatGPT’s help with a discussion draft that I will then edit, of course. I’ll let you know how it goes!

2 Likes

Please tell us how this moves forward. I also use it a lot to get an idea explained better or written in English, since, like @danieltds, I’m a non-native speaker. I don’t know if this is too obvious, but as in coding, you have to ask for things step by step rather than telling it the whole thing you want done. It gets stuck if your request is too complex, but when you give it to it in chunks, it works great.
Just be careful, and I hope none of you runs into this type of problem

5 Likes

I’ve piloted the same, but more to help paraphrase or streamline a collection of sentences into a more cohesive paragraph that then inspires me to edit further. I’m a ChatGPT stan atm

3 Likes

Awesome! Glad to know it’s going well for you @fbbriggs. I’m still messing around with it. Definitely a good use during the editing process.

Definitely a great tool during the editing process - I’ve found the output still needs to be edited but it can be helpful for brainstorming when you’re stuck. @psaffie I think you have a great point there - good reminder not to just copy/paste the output, we still need to read it. I like your perspective too, could be a great tool for non-native speakers to help with wording. Or even native speakers that are stuck!

1 Like

@hirotaka sorry I missed your earlier tag. A note for everyone: using MJFF data to train LLMs is forbidden by the data use agreements, which prohibit redistribution of individual-level data – the same goes for feeding the data to ChatGPT. While it might be possible to do this through a funded study, individual users cannot take this step of their own accord.

For MJFF-funded work, I’m not aware of (nor was I able to find) any specific projects utilizing LLMs, but we have supported broader machine learning work in the past, most recently this project: Using Machine Learning on Longitudinal Data from Multiple Cohorts to Determine Biological Subtypes and Prediction Scores for Parkinson’s Disease | Parkinson's Disease

My sense is that there is not a ton of appetite for funding this type of work at present.

3 Likes

Josh, I think I probably haven’t crossed the line or violated the agreement, but could you expand a bit on this? I want to make sure, for future research, that I won’t cross any lines!

Hi Paula, since many of these LLM/AI platforms require you to deposit a copy of the data elsewhere for processing, this would be considered redistribution of individual-level data, which is prohibited under the data use agreements for MJFF data. Further, MJFF would not be aware of this and therefore would not be able to screen to ensure the data are fully isolated and do not pose a risk to data security/privacy.

Let me know if you want to chat more separately! :slight_smile:

I just wanted to add my two pence/two cents’ worth on ChatGPT and LLMs in general. In my experience they are a good resource for gathering background information and generating draft narratives, but they are unreliable and must be used with caution, particularly in scientific work. I see four issues with them:

  1. They often give incomplete responses. For instance, if you ask for a list of all prescription drugs used to treat PD, they will likely miss a few.
  2. They sometimes give incorrect information (this is known as “AI hallucination”). Essentially they are built on probabilistic models drawing from multiple data sources and sometimes this leads to false statements.
  3. They are trained on out-of-date information; e.g., GPT-3.5 is trained on data up to January 2022.
  4. They are typically unable or unwilling (I think by design) to give estimates and opinions on things.

So I think they are a useful tool for generating supplementary information and drafting text, but aren’t a substitute for a human who actually understands the topic at hand.

Jodie

As an aside they are great for fun stuff. For example, ask ChatGPT to “Write a summary of the Michael J. Fox Foundation in the style of a US presidential inauguration speech” and you will get things like:

“…Founded by one of America’s most beloved sons, Michael J. Fox, this remarkable institution stands as a beacon of light in the fight against Parkinson’s disease—a scourge that has afflicted millions around the globe. With unwavering resolve and unwavering dedication, the Michael J. Fox Foundation has tirelessly pursued its noble mission: to accelerate breakthroughs in Parkinson’s research and empower those living with the disease to lead lives of dignity and purpose…
As we embark upon this new chapter in our nation’s history, let us draw inspiration from the example set forth by the Michael J. Fox Foundation. Let us stand together in solidarity, united in our determination to conquer Parkinson’s disease once and for all. And let us pledge ourselves to the noble cause of advancing science, fostering empathy, and building a future of boundless possibility for all.” etc.

6 Likes