Fragmentation Method Effects on Size Selection Needs

Fragmentation of nucleic acids prior to library construction is required for most next-generation sequencing platforms. Methods for fragmenting genomic DNA differ in how tightly they center sheared fragments around a target average size. Probe-based shearing instruments produce broadly distributed shearing profiles, whereas focused ultrasonication instruments produce much more controlled, tighter profiles. DNA-Seq chemistries that do not use enzymes to fragment DNA, including the NEXTFLEX® Rapid DNA-Seq kit 2.0, depend on these methods to pre-fragment DNA before library prep.

In recent years, enzymatic fragmentation modules have been coupled with downstream library prep to provide a more convenient, automation-friendly option for labs that need high-throughput solutions. The NEXTFLEX® Rapid XP DNA-Seq kit combines fragmentation, end repair, and adenylation in a single step, giving users a quick turnaround time and fewer touch points, which minimizes potential user error during library prep setup. Enzymatic fragmentation can be a sensitive process, and the NEXTFLEX® enzymes have been shown to be more efficient and reliable than some others on the market. Highly variable shearing profiles of starting material and limited sequencing read lengths leave the researcher with an important question: whether or not to size select NGS libraries.

Read Length Considerations

Read lengths play an important role in determining whether size selecting NGS libraries is necessary. Kits such as the NEXTFLEX® Rapid XP DNA-Seq kit offer flexibility here, accommodating a wide range of fragment sizes through adjustment of the fragmentation time. If starting with a broad shear profile (100 – 1,500 bp) and performing 2×150 reads, it is advisable to size select 300 – 400 bp or 350 – 500 bp post-ligation. This strategy ensures that the paired reads cover most inserts as completely as possible. If size selection is not employed in this scenario, many higher molecular weight molecules will not be fully sequenced, because the two 150-base reads cannot span them, resulting in non-uniform genome coverage.
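
To make the read-length reasoning concrete, the Python sketch below estimates how much of a single insert a 2×150 read pair covers, given the molecule's post-ligation size. It assumes a combined adapter length of roughly 120 bp (ADAPTER_BP, a hypothetical placeholder in the range of full-length Illumina-style adapters; check the value for the adapters actually used), and the function name insert_coverage is illustrative, not part of any kit protocol.

```python
# Sketch: how much of each insert a paired-end read pair can cover.
# ASSUMPTION: ligated adapters add ADAPTER_BP to each molecule in total
# (~120 bp is a hypothetical placeholder; use the real adapter length).
ADAPTER_BP = 120


def insert_coverage(library_bp: int, read_len: int = 150, adapter_bp: int = ADAPTER_BP) -> dict:
    """Estimate read-pair coverage of one insert, given its post-ligation size."""
    insert_bp = library_bp - adapter_bp
    if insert_bp <= 0:
        raise ValueError("library size must exceed the total adapter length")
    per_mate = min(insert_bp, read_len)          # insert bases each mate can read
    bases_read = min(insert_bp, 2 * per_mate)    # distinct insert bases read by the pair
    gap_bp = max(0, insert_bp - 2 * per_mate)    # middle of the insert no read reaches
    overlap_bp = max(0, 2 * per_mate - insert_bp)  # insert bases read by both mates
    return {
        "insert_bp": insert_bp,
        "fraction_covered": bases_read / insert_bp,
        "unsequenced_gap_bp": gap_bp,
        "mate_overlap_bp": overlap_bp,
    }


# Post-ligation sizes spanning the suggested windows and the tail of a broad shear.
for library_bp in (300, 400, 500, 900, 1500):
    r = insert_coverage(library_bp)
    print(library_bp, r["insert_bp"], round(r["fraction_covered"], 2), r["unsequenced_gap_bp"])
```

Under these assumptions, libraries in the 300 – 400 bp window carry inserts that the read pair covers completely, while a 1,500 bp library leaves most of its insert unread.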

Starting Material: Low Quality DNA

Nucleic acids from formalin-fixed, paraffin-embedded (FFPE) samples can be highly degraded and fragmented as a consequence of preservation. If starting with sub-nanogram quantities of low-quality DNA, size selection is not advised: the number of amplifiable DNA molecules going into PCR is already limited, and discarding a portion of them could greatly reduce library yield.
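
The effect of a low input can be estimated by converting mass to haploid genome equivalents. The sketch below assumes ~660 g/mol per double-stranded base pair and a ~3.1 Gb human haploid genome (about 3.4 pg per copy, so roughly 150 copies in 0.5 ng). The amplifiable and retained fractions passed in are hypothetical placeholders, not measured values for FFPE material, and the function names are illustrative.

```python
# Sketch: genome equivalents available from a low DNA input.
# ASSUMPTIONS: ~660 g/mol per dsDNA base pair, ~3.1 Gb haploid human genome.
# The amplifiable/retained fractions used below are hypothetical placeholders.
AVOGADRO = 6.022e23
G_PER_MOL_PER_BP = 660.0
HUMAN_HAPLOID_BP = 3.1e9


def haploid_genome_copies(mass_ng: float, genome_bp: float = HUMAN_HAPLOID_BP) -> float:
    """Haploid genome equivalents contained in mass_ng of double-stranded DNA."""
    genome_mass_g = genome_bp * G_PER_MOL_PER_BP / AVOGADRO  # ~3.4 pg per copy
    return mass_ng * 1e-9 / genome_mass_g


def copies_after_size_selection(mass_ng: float, amplifiable_fraction: float,
                                retained_fraction: float) -> float:
    """Genome equivalents left after discounting damaged and size-excluded DNA."""
    return haploid_genome_copies(mass_ng) * amplifiable_fraction * retained_fraction


input_ng = 0.5
print(round(haploid_genome_copies(input_ng)))                   # all copies in the input
print(round(copies_after_size_selection(input_ng, 0.2, 1.0)))   # hypothetical 20% amplifiable
print(round(copies_after_size_selection(input_ng, 0.2, 0.4)))   # ...then 40% kept by selection
```

Because each discarded fraction multiplies down an already small number of starting copies, size selection can sharply reduce the unique molecules available to PCR.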

Starting Material: cfDNA

Cell-free DNA (cfDNA) is another sample type of interest; it is naturally fragmented and in most cases does not require additional fragmentation. The NEXTFLEX® Rapid DNA-Seq kit 2.0 can be coupled with the NextPrep-Mag™ cfDNA isolation kits to enable a complete extraction-to-library prep workflow for reliable sequencing results from cfDNA. The NEXTFLEX® Cell-Free DNA-Seq kit 2.0 includes the chemistry of the Rapid DNA-Seq kit 2.0 plus an additional upstream protocol that lets users select for either the mono-nucleosome population or the mono-, di-, and tri-nucleosome populations within the sample before library prep.
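
cfDNA fragment lengths tend to cluster near multiples of the ~167 bp mono-nucleosome fragment. As an in silico illustration only, and not a description of the kit's upstream protocol, the sketch below bins fragment lengths into approximate mono-, di-, and tri-nucleosome populations. The bin edges and function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional

# Sketch: bin cfDNA fragment lengths into approximate nucleosome populations.
# ASSUMPTION: these bin edges are hypothetical, chosen around multiples of the
# ~167 bp mono-nucleosome fragment; they are not kit specifications.
NUCLEOSOME_BINS = (
    ("mono", 100, 250),
    ("di", 251, 420),
    ("tri", 421, 600),
)


def nucleosome_class(length_bp: int) -> Optional[str]:
    """Return the nucleosome bin for a fragment length, or None if outside all bins."""
    for name, lo, hi in NUCLEOSOME_BINS:
        if lo <= length_bp <= hi:
            return name
    return None


def summarize(lengths: Iterable[int]) -> Counter:
    """Count fragments per bin; the None key collects out-of-range (e.g. gDNA-sized) fragments."""
    return Counter(nucleosome_class(n) for n in lengths)


print(summarize([160, 167, 172, 330, 345, 505, 8000]))
```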

Starting Material: High-Quality DNA

If DNA is not a limiting factor and many barcoded samples are being processed in parallel, size selection is highly recommended. Size selecting a specific region of a broad shear is advisable only when starting with ≥ 10 ng of DNA. If size selection is not performed in this scenario and barcoded libraries are pooled and loaded onto the flow cell for cluster generation, libraries containing lower molecular weight DNA will be preferentially amplified during clustering. Those libraries receive more reads, and the other, longer libraries receive fewer. Size selection helps ensure that each library receives similar numbers of reads, and therefore similar coverage, when multiplexed into the sequencing pool.

In the context of cfDNA applications, extraction from plasma or serum is in most cases expected to exclude contaminating high molecular weight gDNA, so the input is already enriched for true cfDNA populations before library prep. As such, some labs choose to forgo size selection after libraries are generated.
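
Equal read distribution also depends on pooling libraries in equimolar amounts. The sketch below applies the widely used conversion from ng/µL to nM, which relies on a single mean library size; a tight size distribution makes that mean more representative of the molecules actually present. It does not model the clustering bias toward shorter libraries described above, and the library names and values are hypothetical.

```python
# Sketch: equimolar pooling of barcoded libraries.
# Conversion: nM = (ng/uL x 10^6) / (660 g/mol/bp x mean library size in bp).
# ASSUMPTION: library names, concentrations and sizes below are hypothetical.


def library_molarity_nM(conc_ng_per_ul: float, mean_size_bp: float) -> float:
    """Convert a dsDNA library concentration from ng/uL to nM."""
    return conc_ng_per_ul * 1e6 / (660.0 * mean_size_bp)


def equimolar_volumes(libraries: dict, fmol_each: float) -> dict:
    """uL of each library that contributes fmol_each to the pool (1 nM == 1 fmol/uL)."""
    return {
        name: fmol_each / library_molarity_nM(conc, size)
        for name, (conc, size) in libraries.items()
    }


libs = {
    "sample_A": (2.0, 400),   # (ng/uL, mean library size in bp)
    "sample_B": (4.0, 400),
    "sample_C": (2.0, 250),
}
for name, vol in equimolar_volumes(libs, fmol_each=50).items():
    print(name, round(vol, 2), "uL")
```

At equal mass concentration, the shorter sample_C library has a higher molarity than sample_A, so less of it is needed for the same molar contribution.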

Size Selecting Peak of Shear

If starting with a Gaussian shear profile ranging from 150 – 600 bp, size selection from 300 – 400 bp is recommended. Selecting at the peak of the shear produces the highest-yielding libraries, because this region contains the largest number of viable sequencing molecules and therefore supports the most successful ligation events. Conversely, selecting outside the peak of the shear lowers library yield because fewer viable molecules are available.
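
The yield argument can be illustrated by integrating a normal distribution over candidate selection windows. This sketch treats the 150 – 600 bp profile as a normal distribution whose range spans the mean ± 3 standard deviations (mean 375 bp, SD 75 bp). Those parameters, the assumption that the profile counts molecules rather than mass, and placing the selection windows on the same size axis as the shear profile are all hypothetical simplifications.

```python
import math

# Sketch: share of a Gaussian shear profile captured by different selection windows.
# ASSUMPTIONS (hypothetical): 150-600 bp spans mean +/- 3 SD, the profile counts
# molecules, and selection windows share the shear profile's size axis.
MEAN_BP = 375.0
SD_BP = 75.0


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def fraction_in_window(lo: float, hi: float, mean: float = MEAN_BP, sd: float = SD_BP) -> float:
    """Fraction of molecules whose size falls between lo and hi (bp)."""
    return normal_cdf(hi, mean, sd) - normal_cdf(lo, mean, sd)


for lo, hi in ((300, 400), (150, 250), (500, 600)):
    print(f"{lo}-{hi} bp: {fraction_in_window(lo, hi):.1%} of molecules")
```

Under these assumptions, the 300 – 400 bp window near the peak holds roughly ten times as many molecules as an equally wide window in either tail.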