What is library preparation?
Library preparation is the step of an NGS workflow in which DNA or RNA molecules are processed and prepared for sequencing. Poorly prepared NGS libraries can result in low-quality sequences, failed sequencing, or inaccurate results.
Depending on the type of sequencing being performed (e.g., whole genome sequencing, whole transcriptome sequencing, etc.) as well as the sequencing chemistry being used (e.g., Illumina®, Ultima Genomics®, etc.), the steps of a library prep workflow are slightly different. It’s important that you check your library prep protocol and make sure the library you generate will help you achieve your research objectives.
The major steps of a traditional NGS library prep workflow include fragmentation, adapter addition, and library quantification and pooling (Figure 1).
Fragmentation
Fragmentation takes place after the sample preparation step (i.e., nucleic acid extraction and quantification) and is the process in which nucleic acid molecules are randomly broken into fragments of similar length. Randomization during fragmentation is important to ensure that the resulting sequencing library is truly representative of the starting sample. This step is usually required for short-read sequencing instruments, and the ideal read length varies depending on the instrument being used, the starting material, and the experimental goals. Highly degraded samples such as cell-free DNA (cfDNA), DNA from formalin-fixed, paraffin-embedded (FFPE) tissue, or damaged RNA may not require fragmentation prior to sequencing because their fragments are already short.
Fragmentation can be either mechanical or enzymatic. If performing a DNA library prep (an NGS workflow that typically requires fragmentation), either method can be used. Mechanical fragmentation, or mechanical shearing, uses physical force to fragment DNA molecules. Examples of mechanical fragmentation approaches include focused acoustic shearing and hydrodynamic shearing. An alternative to mechanical fragmentation is enzymatic fragmentation, which, as the name implies, relies on enzymes to cleave DNA to the desired size.
Adapters for NGS library sequencing
Once library molecules have been fragmented, sequencing adapters need to be added to them. NGS adapters play multiple roles in sequencing, but the most important is that they are required for your library molecules to be read by the chosen sequencing platform.
To prepare library molecules for adapter addition, an end repair step is usually needed. Traditionally, during this step, double-stranded DNA (dsDNA) fragments are polished for adapter ligation. This is done either by creating blunt-ended fragments (5’ overhangs and 3’ overhangs are filled in or removed) or by adding an A-tail to the 3’ ends of fragments to prepare for T-overhang adapters. Other methods of end repair rely on sequential adapter ligation to avoid chimera formation and adapter dimers. For single-stranded DNA (ssDNA), IDT offers a library prep workflow that uses Adaptase™ technology, which simultaneously performs tailing and ligation. After end repair is completed, NGS adapters can be added to library molecules.
Although the specific structure of an NGS adapter varies depending on the type of indexes being used, adapters are added to both ends of each library fragment (labeled "insert" in Figure 2).
The distinct sections of an NGS adapter include:
- Flow cell binding sequences (Figure 2, P5 and P7). These sequences are complementary to the oligos on a sequencing flow cell and allow the adapter to anneal to the flow cell during the sequencing run.
- Sequencing primer binding sites (Figure 2, SP1 and SP2). This part of the adapter is where sequencing primers bind so that a polymerase can extend them and the sequence can be ‘read’.
- Sequencing indexes (Figure 2, i5 and i7). Indexes are sequences that are specific to each sample and enable libraries to be multiplexed. There are several different types of sequencing indexes, which are discussed in further detail below.
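The three adapter sections listed above can be pictured as fields that flank the insert. The sketch below is illustrative only: it assumes the P5, i5, SP1, insert, SP2, i7, P7 order commonly drawn for dual-indexed libraries, ignores strand orientation, and uses placeholder strings rather than real adapter sequences. The names `Adapter` and `assemble_library_molecule` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adapter:
    """One adapter, split into the three sections described in the text."""
    flow_cell_binding: str  # P5 or P7
    index: str              # i5 or i7
    primer_site: str        # SP1 or SP2


def assemble_library_molecule(left: Adapter, insert: str, right: Adapter) -> str:
    """Concatenate the sections in the assumed order P5-i5-SP1-insert-SP2-i7-P7."""
    return (
        left.flow_cell_binding + left.index + left.primer_site
        + insert
        + right.primer_site + right.index + right.flow_cell_binding
    )


# Placeholder strings for illustration; these are NOT real adapter sequences.
p5_side = Adapter(flow_cell_binding="AAAA", index="CC", primer_site="GGG")
p7_side = Adapter(flow_cell_binding="TTTT", index="AC", primer_site="GTG")
print(assemble_library_molecule(p5_side, "NNNNN", p7_side))
```

Keeping the index as its own field makes it easy to see that only this section changes from sample to sample, while the flow cell binding and primer sites stay the same.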
NGS indexes explained
Sequencing indexes are part of NGS adapters that are unique between samples (Figure 2). Indexes allow researchers to pool multiple libraries so that they can be sequenced on the same flow cell. There are two main types of indexing strategies, single and dual.
Single index (SI) sequencing involves a single index that is added to one end of the library molecule (Figure 2, i7 index). Because there is only one unique sequence in SI adapters, the level of multiplexing available for SI libraries is lower than that of libraries prepared with dual index (DI) adapters. DI libraries have adapters that contain indexes on both ends of the library molecules (Figure 2, i5 and i7), so the level of multiplexing available with DI libraries is much higher than that of SI libraries. There are two types of DI adapters: unique dual indexing (UDI) adapters and combinatorial dual indexing (CDI) adapters (Table 1).
For CDIs, the combination of the i5 and i7 indexes is unique to each library; however, the individual i5 and i7 sequences are not. For example, in a 96-well plate there could be eight unique i5 indexes and twelve unique i7 indexes, giving 8 × 12 = 96 distinct pairs. UDIs, on the other hand, use distinct sequences for all i5 and i7 indexes. Looking again at the example of a 96-well plate, there would be 96 unique i5 and 96 unique i7 index sequences. This means none of the samples in the plate would share index sequences, making them easier to identify in downstream analyses after sequencing is completed.
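The 96-well plate example can be worked through in a few lines of Python. This is a minimal sketch: the index names (`i5_1`, `i7_1`, and so on) are hypothetical labels standing in for real index sequences, and the function names are made up for illustration.

```python
from itertools import product


def combinatorial_pairs(i5_indexes, i7_indexes):
    """CDI: every i5 is paired with every i7, so pairs are unique but indexes repeat."""
    return list(product(i5_indexes, i7_indexes))


def unique_pairs(i5_indexes, i7_indexes):
    """UDI: the n-th i5 is paired only with the n-th i7, so no index is reused."""
    if len(i5_indexes) != len(i7_indexes):
        raise ValueError("UDI pairing needs the same number of i5 and i7 indexes")
    return list(zip(i5_indexes, i7_indexes))


# Hypothetical index names, not real sequences.
cdi = combinatorial_pairs(
    [f"i5_{n}" for n in range(1, 9)],    # eight i5 indexes
    [f"i7_{n}" for n in range(1, 13)],   # twelve i7 indexes
)
udi = unique_pairs(
    [f"i5_{n}" for n in range(1, 97)],   # 96 i5 indexes
    [f"i7_{n}" for n in range(1, 97)],   # 96 i7 indexes
)

print(len(cdi), len(set(cdi)))            # 96 96 (96 wells, 96 distinct pairs)
print(len({i5 for i5, _ in cdi}))         # 8   (each i5 is shared by twelve wells)
print(len({i5 for i5, _ in udi}))         # 96  (no i5 is shared between wells)
```

Both strategies label all 96 wells, but only the UDI layout guarantees that no two wells have an individual index in common.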
UDIs are recommended when a high level of multiplexing is planned for sequencing. While multiplexing can reduce costs because more samples can be sequenced on a single flow cell, this approach does come with risks. One of these is index hopping, in which sequencing reads are attributed to the wrong index. Index hopping can result in the loss of data in downstream analyses, and the risk of index hopping increases with the level of multiplexing. UDIs can reduce this risk because two distinct sequences are expected to appear as part of each read for each sample. If a read does not carry the expected pair of indexes during downstream read processing, it can be identified as affected by index hopping and filtered out of the data.
Sequencing indexes are important for multiplexing and can be considered markers that distinguish reads between libraries. During NGS library prep, it is also possible to tag individual molecules within a single library. This can be done using molecular barcodes called unique molecular identifiers (UMIs). UMIs are short sequences that incorporate a unique barcode onto each molecule within a sample library. Adding a UMI to UDI adapters can help with the identification of sequencing errors that were introduced during library prep or sequencing, the characterization of low-frequency variants, the quantification of transcripts, and the removal of PCR duplicates.
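As a rough illustration of how this filtering might work during demultiplexing, the sketch below assigns each read to a sample only when its (i5, i7) pair matches an expected pair and counts everything else as discarded. It is a simplified model rather than any specific demultiplexing tool: the function name, the sample sheet layout, and the index labels are all assumptions, and real tools typically also tolerate a small number of mismatches within an index.

```python
from collections import Counter


def demultiplex(read_index_pairs, sample_sheet):
    """Assign reads to samples by their (i5, i7) pair.

    sample_sheet maps each expected (i5, i7) pair to a sample name.
    Reads whose pair is not in the sheet are counted as discarded.
    """
    assigned = Counter()
    discarded = 0
    for pair in read_index_pairs:
        sample = sample_sheet.get(pair)
        if sample is None:
            discarded += 1       # unexpected pair, e.g. after index hopping
        else:
            assigned[sample] += 1
    return assigned, discarded


# Hypothetical UDI sample sheet: no index is shared between samples.
sheet = {("i5_A", "i7_X"): "sample_1", ("i5_B", "i7_Y"): "sample_2"}
reads = [
    ("i5_A", "i7_X"),
    ("i5_A", "i7_X"),
    ("i5_B", "i7_Y"),
    ("i5_A", "i7_Y"),   # mixed pair that matches no sample
]
assigned, discarded = demultiplex(reads, sheet)
print(dict(assigned), discarded)
```

With combinatorial indexes, a mixed pair such as `("i5_A", "i7_Y")` could itself be a valid sample, so the read would be misassigned instead of discarded. That is the case UDIs are designed to avoid.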
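To make the PCR duplicate use case concrete, here is a minimal sketch that keeps one read per combination of mapping position and UMI. The read representation (a dict with `chrom`, `pos`, `umi`, and `seq` keys) and the function name are hypothetical. The sketch uses exact UMI matching and simply keeps the first read in each group, whereas dedicated tools usually allow for sequencing errors in the UMI and may build a consensus.

```python
from collections import defaultdict


def deduplicate_by_umi(reads):
    """Keep the first read seen for each (chrom, pos, umi) group.

    Reads that share a mapping position AND a UMI are treated as PCR copies
    of the same original molecule. Reads that share a position but carry
    different UMIs are kept as separate molecules.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[(read["chrom"], read["pos"], read["umi"])].append(read)
    return [group[0] for group in groups.values()]


# Hypothetical reads for illustration.
reads = [
    {"chrom": "chr1", "pos": 100, "umi": "ACGT", "seq": "TTGACC"},
    {"chrom": "chr1", "pos": 100, "umi": "ACGT", "seq": "TTGACC"},  # PCR duplicate
    {"chrom": "chr1", "pos": 100, "umi": "GGCA", "seq": "TTGACC"},  # different molecule
    {"chrom": "chr2", "pos": 500, "umi": "ACGT", "seq": "CCATGA"},
]
unique_reads = deduplicate_by_umi(reads)
print(len(reads), "reads ->", len(unique_reads), "unique molecules")
```

Without the UMI, the first three reads would look identical by position alone and a true second molecule would be removed along with the duplicate.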
Table 1. Summary of differences between types of dual indexes.
Type of dual index | Benefits |
---|---|
Combinatorial dual index (CDI) | Offers a higher level of multiplexing than single index; best suited for workflows where index hopping is of minimal concern |
Unique dual index (UDI) | Ideal for highly multiplexed samples; useful for reducing the impact of index hopping |
UDI with unique molecular identifier (UDI-UMI) | Helps identify low-frequency variants and PCR duplicates; supports quantitative NGS approaches |
Adapters and their associated indexes are key to a successful NGS workflow and should be selected carefully.
Library quantification and pooling
Library quantification and pooling is the final step of an NGS library prep process prior to sequencing. If you used adapters that support multiplexing, you can pool multiple libraries to be sequenced on the same flow cell. When pooling libraries, it is important that they are mixed in equal (equimolar) amounts to ensure that sequencing data will be uniform across libraries. Unequal pooling could result in one library dominating the sequencing run; i.e., one sample will use more of the sequencing reads than the others. An unequal distribution of reads across samples could leave some samples without enough reads for downstream analyses. To avoid this, libraries should be quantified and normalized. Traditional library quantification determines the concentration and average fragment size of the prepped library.
The concentration of your library can be measured using fluorometric quantification approaches such as Qubit® (Thermo Fisher Scientific). It’s important to keep in mind that because fluorometric quantification measures all the DNA present in a sample, library concentrations can be overestimated when DNA without adapters is still present. Quantification using qPCR can be more accurate because it measures only library molecules that contain adapters, although it can be costly and time consuming.
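The effect of unequal pooling can be estimated with simple arithmetic. The sketch below assumes, as a simplification, that each library receives reads in proportion to its molar share of the pool. The function name and library names are hypothetical.

```python
def expected_read_share(pooled_fmol):
    """Fraction of reads each library is expected to receive.

    Assumes reads are distributed in proportion to the molar amount of
    each library in the pool (a simplification of a real sequencing run).
    """
    total = sum(pooled_fmol.values())
    if total <= 0:
        raise ValueError("the pool must contain some library")
    return {name: amount / total for name, amount in pooled_fmol.items()}


# Femtomoles of each library added to the pool (illustrative values).
balanced = expected_read_share({"lib_A": 10, "lib_B": 10, "lib_C": 10})
skewed = expected_read_share({"lib_A": 10, "lib_B": 10, "lib_C": 40})

for name in skewed:
    print(f"{name}: balanced {balanced[name]:.0%}, skewed {skewed[name]:.0%}")
```

In the skewed pool, `lib_C` is expected to take about two thirds of the reads, which leaves each of the other two libraries with roughly half of what they would receive in a balanced pool.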
In addition to concentration measurements, it is also recommended that library fragment size is measured prior to library pooling. Fragment size gives important insight into the overall quality of your prepared library: it shows whether library molecules fall within the expected size range determined during fragmentation and confirms that there are no dimers, which can negatively affect a sequencing run. Fragment size can be measured using microfluidic electrophoresis systems such as a TapeStation® or Bioanalyzer® (Agilent). Once both values (concentration and fragment size) have been obtained for all libraries and the libraries have been normalized appropriately, they are ready to be pooled.
Enzymatic library normalization is an alternative to the traditional quantification and pooling techniques. This approach uses enzymatic chemistry to generate balanced, multiplexed sequencing pools and can be directly incorporated into your library prep kit workflow. Enzymatic normalization can be especially attractive to those preparing a large number of NGS libraries, as it streamlines the quantification and pooling steps by not requiring individual library quantification, saving users valuable time. IDT offers an enzymatic library normalization module that uses xGen™ NGS Normalase™ technology. It can be used in combination with xGen Library Prep kits to prepare libraries for direct sequencing applications such as whole genome sequencing (WGS) and whole transcriptome sequencing.
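Concentration and fragment size can be combined into a molar concentration, which is what needs to be equal across libraries in the pool. The sketch below uses the commonly cited approximation of 660 g/mol per base pair of double-stranded DNA. The function names and example values are hypothetical, and your library prep or sequencing protocol may specify its own calculation.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (common approximation)


def library_molarity_nM(conc_ng_per_ul, mean_fragment_bp):
    """Convert a mass concentration (ng/µL) to molarity (nM) for dsDNA."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_fragment_bp)


def equimolar_volumes_ul(molarities_nM, fmol_per_library):
    """Volume of each library that contains the same number of femtomoles.

    1 nM equals 1 fmol/µL, so volume (µL) = fmol wanted / molarity (nM).
    """
    return {name: fmol_per_library / nM for name, nM in molarities_nM.items()}


# Illustrative measurements: (ng/µL from fluorometry, mean size in bp).
measured = {"lib_A": (6.6, 500), "lib_B": (3.3, 500), "lib_C": (6.6, 250)}

molarities = {name: library_molarity_nM(c, bp) for name, (c, bp) in measured.items()}
volumes = equimolar_volumes_ul(molarities, fmol_per_library=40)

for name in measured:
    print(f"{name}: {molarities[name]:.1f} nM, pool {volumes[name]:.2f} µL")
```

Note that `lib_A` and `lib_C` have the same mass concentration but different fragment sizes, so they differ in molarity and need different volumes. This is why both measurements are taken before pooling.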