Ask Me Anything: Answering Your Questions to Ensure Effective Performance of DNA Extraction for Real World Sequencing Studies

Hi @DCoP_Innovators and other community members!

I’m excited to kick off our next Ask Me Anything (AMA) for the Data Community of Practice!

I’m Dr. Paula Saffie Awad, a movement disorders neurologist and researcher focused on the genetics of Parkinson’s disease and rare neurodegenerative disorders. During my PhD, I faced a major challenge: DNA extraction. Only ~30% of our samples passed QC for short-read sequencing, and none were initially usable for long-read sequencing. Similar issues came up across other GP2 sites in Bangladesh, Southern Brazil, and Southern Chile.

:warning: I’m not a wet lab person—so I’ll be sharing my lessons learned the hard way: what went wrong, what worked, and how we got high-quality DNA for long-read sequencing in tough settings.

:date: I’ll be answering your questions on Tuesday, April 29
:spiral_calendar: Please submit your questions in this thread by Friday, April 25

Some examples of questions might include:

  • What type of biospecimen is best for a given type of analysis?
  • What would be a good place to start to ensure samples pass QC?
  • What are some common mistakes that can affect DNA quality?
  • What steps do you take to avoid losing DNA during freezing or thawing?
  • What tools or methods do you use to check DNA quality?
  • What do you consider when shipping DNA to another lab?
  • What extraction kits or methods have worked best for you, especially if you’re not a wet lab expert?

Looking forward to sharing tips, failures, and lessons learned—and learning from your experiences too!

Paula

7 Likes

Hi Paula! Thanks so much for this great post. Here are some questions that I am interested in:

  1. In field settings with limited infrastructure (potentially some GP2 sites), what biospecimen collection protocols work best and are there specific resources you recommend?
  2. Were there differences in DNA yield or quality between blood, saliva, or other biospecimen types in the regions you worked in?
  3. What advice would you give to other neurologists or clinicians interested in engaging with large-scale genetic research without a lab background?

Thanks so much for your help!

Gian

2 Likes

Hi Paula!
I’ve had a lot of experience extracting DNA, but never for long-read sequencing.
How can you tell whether your available DNA is suitable for long-read sequencing? Are there any integrity tests (like the electrophoresis we run for short reads)?

Thanks for opening this space!

1 Like

Hi Paula!

I have a very simple question, from the perspective of a site that sends DNA to be extracted and analysed in a different facility. How often have you faced problems with potential mislabelling of samples you received from those sites? And what are your suggestions for avoiding that?

1 Like

Hey Paula,

This is a great post. What was the reason that such a low percentage of samples passed? Was it due to the collection methods, shipping methods, etc.? I'm really curious to hear how these things can alter the quality of the sample. Very interested to hear about your lessons learned!

Thanks @psaffie! What advice do you give to people who don’t have training in the use of genetic data but want to integrate it into their research, particularly from existing datasets? More specifically, what bad practices should we be sure to avoid?

1 Like

Hi @psaffie,
I was wondering about your take on the most critical steps and common pitfalls in DNA extraction protocols that can impact the quality and yield of DNA for real-world sequencing studies. And how can this best be standardized across labs?

This is a great topic! Thank you for hosting an AMA. And I’m so glad you mentioned long read data too.

  1. What are some ways to optimize yield/quality for both short read and long read preps?
  2. Any suggestions for standardization methods that help reduce batch effects across isolations/sites so that downstream data can be better compared to other cohorts, etc.? Thinking of a prior discussion about what makes useful metadata.
  3. Suggestions for reducing the human error side of things in general? This is more of a meta/open-ended question, thinking about some prior discussions here about detecting sample swaps - if we can prevent them, even better! I wanted to highlight your QC comment here as I think it’s important to continue having an ongoing discussion about QC and preventing sample swaps.

Hi Gian, thanks so much for your great questions! Here are some brief answers based on my basic experience:

:test_tube: What biospecimen works best in limited-resource settings?
Fresh EDTA blood is the most reliable option — higher DNA yield, better purity, and easier to standardize than saliva. It’s especially important in PD, where saliva collection can be difficult.

:package: Any recommended protocols?
Yes! We used a simple, cost-effective protocol (one even I could do) based on the QIAamp DNA Blood Midi Kit, which works well in low-infrastructure settings. You can find the full step-by-step version here: :backhand_index_pointing_right: DNA Extraction from Whole Blood

:chart_increasing: Do blood and saliva samples differ in DNA yield or quality?
Yes — blood yields are consistently higher and more stable. Saliva often results in lower DNA quality and is more prone to pipetting loss due to small elution volumes (usually 100 µL).

:brain: What advice for clinicians without lab training?
Outsource extraction to a lab with a validated protocol, check it, and ask for QC metrics for each sample. Focus your effort on collecting the samples in an organized manner, for example collecting all the samples on the same day and ensuring blood is processed within 48–72 hours of collection.
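
If your site keeps a simple collection log, even a tiny script can flag samples that are about to miss that 48–72 hour processing window. This is just a rough sketch with made-up study codes and timestamps; adapt the cutoff and log format to your own protocol.

```python
from datetime import datetime

# Hypothetical sample log: study code -> (collection time, processing time).
# The 72-hour cutoff comes from the advice above; adjust to your site's protocol.
sample_log = {
    "GP2-0001": ("2024-03-04 09:30", "2024-03-05 14:00"),
    "GP2-0002": ("2024-03-04 09:45", "2024-03-08 10:15"),
}

FMT = "%Y-%m-%d %H:%M"
MAX_HOURS = 72  # blood should be processed within 48-72 h of collection

for code, (collected, processed) in sample_log.items():
    hours = (datetime.strptime(processed, FMT) - datetime.strptime(collected, FMT)).total_seconds() / 3600
    status = "OK" if hours <= MAX_HOURS else "FLAG: exceeded processing window"
    print(f"{code}: {hours:.1f} h -> {status}")
```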

Hi Victoria, thanks for the great question!

Honestly, it was a combination of small errors from the beginning that added up:

  1. We started with in-house extraction, using the Mini kit. The concentrations looked great on Nanodrop, but what we didn’t realize was that the elution volume was very low — around 100 µL. So while the DNA appeared highly concentrated, the total yield was very low due to the small elution volume. After shipment and handling, many samples had no DNA left at all.

  2. We didn’t do proper quality control after the first few batches. The person doing the extraction assumed everything was fine and stopped checking. We also relied on Nanodrop, which tends to overestimate concentration. Later we switched to Qubit, which is much more accurate.

  3. When we externalized extraction, we made the mistake of shipping frozen blood, and it was thawed without care. Many tubes broke, and the ones that “survived” had very bad DNA quality.

Biggest lesson: even if the concentration looks perfect, if the elution volume is too small, there may not be enough DNA to survive transport and processing. It’s much safer to elute in a larger volume and sacrifice a bit of concentration for stability.
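
To make that lesson concrete, here is a minimal sketch of the arithmetic: total yield is concentration times elution volume, so a great-looking instrument reading can still mean very little DNA in the tube. The numbers below are illustrative, roughly in line with the Mini and Midi kit figures in the table at the end of this thread.

```python
# Total yield depends on elution volume, not just the concentration you read off
# the instrument. Numbers below are hypothetical/illustrative.

def total_yield_ug(concentration_ng_per_ul: float, elution_volume_ul: float) -> float:
    """Total DNA yield in micrograms = concentration (ng/µL) x volume (µL) / 1000."""
    return concentration_ng_per_ul * elution_volume_ul / 1000

# Mini kit-style elution: looks concentrated, but little total DNA to work with.
print(total_yield_ug(75, 100))    # 7.5 µg
# Midi kit-style elution: lower concentration, far more total DNA and more margin for loss.
print(total_yield_ug(66.7, 600))  # ~40 µg
```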

Hi Paula! Great question — the requirements for long-read sequencing are definitely more demanding than for short-read protocols.

To determine if your DNA is suitable for long-read sequencing (e.g., Oxford Nanopore or PacBio), you’ll want to look at three key aspects:

1. Quantity
You need a high amount of DNA, typically >3–5 µg for a library, but it’s safest to extract ≥30 µg total so you have room for QC, repeats, and library prep losses.

2. Integrity / Fragment Length
Unlike short-read prep (where even fragmented DNA works), long-read sequencing requires high molecular weight (HMW) DNA.
You can assess this with:

  • Femto Pulse (Agilent) or TapeStation Genomic DNA kit → these tools give you a detailed DNA size profile.
    Ideally, your peak fragment size should be in the 20–30 kb range or higher.

3. Size Selection & Shearing

  • For some protocols, especially targeted applications or ultra-long reads, you may need to perform size selection (e.g., using BluePippin) to remove small fragments.
  • Conversely, if your DNA is too large (e.g., peaks >50 kb), you may need to shear it down to ensure uniform loading and efficient library prep.
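
Putting those three criteria together, here is a rough pre-flight check you could script before committing a sample to library prep. The thresholds (≥30 µg total, ~20–30 kb peak size, shearing above ~50 kb) are the rules of thumb from this post, not official vendor requirements, so treat it as a sketch.

```python
# Rough long-read suitability check based on the criteria above.
# Your sequencing provider's requirements take precedence over these defaults.

def long_read_check(total_dna_ug: float, peak_size_kb: float) -> list[str]:
    notes = []
    if total_dna_ug < 30:
        notes.append(f"Only {total_dna_ug} µg total; aim for >=30 µg to cover QC, repeats, and prep losses.")
    if peak_size_kb < 20:
        notes.append(f"Peak fragment size {peak_size_kb} kb is below the ~20-30 kb target; DNA may be too fragmented.")
    if peak_size_kb > 50:
        notes.append("Very long fragments (>50 kb); shearing may be needed for uniform loading.")
    return notes or ["Looks suitable for long-read library prep."]

print(long_read_check(total_dna_ug=40, peak_size_kb=35))
print(long_read_check(total_dna_ug=8, peak_size_kb=12))
```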

Here are some slides with our specific metrics and QC results to illustrate these points clearly.

2 Likes

Hi Mike, thanks for this question, which aligns well with the training and mentoring @TaskForce. For researchers new to using genetic data, especially from existing datasets, my main advice would be:

1. Seek Training and Mentorship:
Try to access structured training resources, such as online courses or workshops (e.g., GP2 courses), or collaborate closely with someone experienced in genetics. It helps enormously to have guidance early on.

2. Use Existing Protocols and Guidelines:
Follow established workflows, published protocols, and standardized QC guidelines. Don’t reinvent the wheel—adapting proven methods reduces mistakes significantly.

3. Understand the Data Origin and Limitations:
Always read the metadata carefully—know exactly how the genetic data was generated, the populations involved, and what QC was performed. Misunderstanding the dataset’s limitations is a common pitfall.

Bad practices to avoid include:

  • Treating genetic data as a “black box”: Always question and understand your input data rather than using it blindly.
  • Skipping thorough QC steps: Quality control is essential, not optional. Poor QC can invalidate your results (as happened to me).

Hi @AmgadDroby — excellent question!

From my experience, the most critical steps affecting DNA quality and yield in real-world sequencing studies are:

:key: Critical Steps:

  • Proper Sample Handling: Blood samples must be stored at 4°C and processed within 48–72 hours. Avoid freezing and thawing, which significantly reduces DNA quality.
  • Elution Volume: A small elution volume may give high DNA concentration but makes the sample vulnerable to pipetting errors or loss during handling.
  • Accurate Quantification: Always use Qubit instead of Nanodrop, since Nanodrop frequently overestimates DNA concentration.

:warning: Common Pitfalls:

  • Not consistently performing QC checks after initial batches, leading to unnoticed errors later.
  • Relying on a single extraction per sample—if something happens to that extraction, the sample could be irretrievably lost.
  • Protocol inconsistency across technicians or labs, causing variability in results.

:white_check_mark: Practical recommendation:

Whenever feasible, perform two separate extractions per sample, rather than simply splitting one extraction into multiple aliquots. Keep one extraction stored securely at your workplace as a backup in case the other is compromised during shipment or handling.
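
On the quantification point above: one small, concrete way to catch Nanodrop overestimation early is to record both readings and flag samples where they disagree badly. Here is a minimal sketch; the sample IDs, readings, and the 1.5× ratio cutoff are all hypothetical.

```python
# Quantification sanity check: Nanodrop readings far above the Qubit readings usually
# mean the Nanodrop value is overestimating usable DNA. All values below are made up.

readings = [
    # (sample_id, nanodrop_ng_per_ul, qubit_ng_per_ul)
    ("GP2-0001", 70.0, 62.0),
    ("GP2-0002", 95.0, 31.0),
]

RATIO_LIMIT = 1.5  # hypothetical tolerance for disagreement between the two methods

for sample_id, nanodrop, qubit in readings:
    ratio = nanodrop / qubit
    if ratio > RATIO_LIMIT:
        print(f"{sample_id}: Nanodrop/Qubit = {ratio:.1f} -> re-check; trust the Qubit value ({qubit} ng/µL)")
    else:
        print(f"{sample_id}: readings agree (ratio {ratio:.1f})")
```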

1 Like

Hi Daniel, great and very important question — especially in multi-site projects.

We’ve encountered mislabeling in about 1–2% of samples, usually detected later during sex QC after sequencing, when the reported sex didn’t match the genetic data.

To minimize this risk, we recommend:

  • Using a manifest sheet with unique study codes and verifying that all tubes match the manifest before shipment.
  • Keeping identifiers off the tubes — we avoid including personal data (like name or date of birth). Instead, we label tubes with a study code only, and keep a secure, separate file that links each code to the participant’s identifying information.
  • Cross-checking the manifest before and after DNA extraction to catch any inconsistencies.
  • Performing two extractions per sample, when possible — this provides a backup if one fails due to degradation or poor QC.

That said, the most common issue we faced wasn’t mislabeling, but blood being frozen before extraction, which often caused tube breakage and poor DNA yield and purity. Whenever possible, blood should be stored at 4°C and processed within 48–72 hours of collection.
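
To make the manifest cross-check mentioned above concrete, here is a minimal sketch of the comparison worth running before shipping: the study codes on the manifest versus the codes actually on the tubes. The codes below are made up for illustration.

```python
# Manifest cross-check before shipment: every manifest entry should have a tube,
# and every tube should have a manifest entry. Codes are hypothetical.

manifest_codes = {"GP2-0001", "GP2-0002", "GP2-0003", "GP2-0004"}
tube_codes = {"GP2-0001", "GP2-0002", "GP2-0004", "GP2-0044"}

missing_tubes = manifest_codes - tube_codes      # on the manifest, but no matching tube
unexpected_tubes = tube_codes - manifest_codes   # a tube with no manifest entry

if missing_tubes or unexpected_tubes:
    print("Resolve before shipping:")
    print("  Missing tubes:", sorted(missing_tubes))
    print("  Unexpected tubes:", sorted(unexpected_tubes))
else:
    print("Manifest and tubes match.")
```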

1 Like

Hi Elizabeth! You’re absolutely right that this conversation needs to continue, especially as more sites begin working with large datasets. I’ll use your questions as a way to summarize and close the AMA. I’ve included a summary table comparing commonly used extraction kits by sample type, input/output volumes, estimated concentration, and practical considerations like cost and risk if volume is lost.

:dna: Optimizing yield/quality for both short- and long-read sequencing:

  • Sample integrity is key: Keep blood at 4°C and extract within 48–72 hours to avoid degradation.
  • Choose elution volume wisely: Small volumes increase concentration but raise the risk of losing DNA during handling. For real-world settings, a moderate elution volume (e.g., 200–600 µL) offers a good balance.
  • Use Qubit for quantification (not Nanodrop), and Femto Pulse or TapeStation for fragment size assessment if long-read is planned.

:prohibited: Reducing human error and preventing sample swaps:

  • Pre-label tubes with unique study codes (no names or DOBs), and store the ID linkage separately.
  • Double-check the manifest before and after extraction, and flag discrepancies immediately.
  • Two extractions per sample can be a lifesaver — we now keep one in-house as a backup.
  • Regularly perform sex QC or other genomic checks to catch mismatches early.
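
To close the loop on that last point, here is a minimal sketch of what a sex QC mismatch check looks like once the genetic data comes back. Study codes and calls are hypothetical; in practice the genetic sex call would come from your sequencing QC pipeline (e.g., X-chromosome heterozygosity or coverage-based checks).

```python
# Compare the sex reported in the clinical manifest against the sex inferred from the
# genetic data, and flag mismatches as possible swaps or mislabels. Values are made up.

reported_sex = {"GP2-0001": "F", "GP2-0002": "M", "GP2-0003": "F"}
genetic_sex  = {"GP2-0001": "F", "GP2-0002": "F", "GP2-0003": "F"}

mismatches = [code for code in reported_sex
              if genetic_sex.get(code) and genetic_sex[code] != reported_sex[code]]

print("Possible sample swaps or mislabels:", mismatches)  # ['GP2-0002']
```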

Comparison of DNA Extraction Kits: Sample Types, Yield, Cost, and Suitability for Sequencing Applications

| Sample Type | Extraction Kit | Input Volume | Typical DNA Yield | Elution Volume | ~ Cost per Sample | LRS | Estimated Conc. (ng/µL) | Risk if Volume Lost | Notes |
|---|---|---|---|---|---|---|---|---|---|
| Whole Blood | QIAamp DNA Blood Midi (Qiagen) | 2 mL | 20–60 µg | 600 µL | $15.00 | :white_check_mark: | 66.7 | Low | Validated protocol with HMW DNA (~30–50 kb). |
| Whole Blood | QIAamp DNA Blood Maxi (Qiagen) | 10 mL | 100–600 µg | 1,000 µL | $30.00 | :white_check_mark: | 350.0 | Low | For applications requiring large amounts of HMW DNA. |
| Whole Blood | QIAamp DNA Blood Mini (Qiagen) | 200 µL | 3–12 µg | 100–200 µL | $4.64 | :cross_mark: | 37.5–75 | Moderate | Good for short-read; limited volume/yield for long-read. |
| Saliva | Oragene DNA OG-600 (DNA Genotek) | 2 mL | ~55–110 µg | 100 µL | $15.00 | :cross_mark: | 825.0 | High | Non-invasive; quality varies with collection method. |
| Saliva | Oragene DNA OG-610 (DNA Genotek) | 1 mL | ~40–60 µg | 100 µL | $13.00 | :cross_mark: | 500.0 | High | Lower volume; suitable for general genotyping. |
| Saliva | Oragene DNA OG-675 (DNA Genotek) | 0.75 mL | ~17.3 µg | 100 µL | $17.00 | :cross_mark: | 173.0 | High | Assisted collection; yield often too low for sequencing. |
2 Likes

This is great. Thanks for the AMA, @psaffie !

1 Like