Next Generation Sequencing (NGS): A Beginner’s Guide
The development of DNA sequencing revolutionized the biological and medical sciences 50 years ago. Next-generation sequencing is the second, modern revolution and the spearhead of an ever-accelerating field. Here is what you need to know to start using this tool.
Genetic research has a relatively short history. From the presentation of Mendel’s discoveries in the 1860s to the formulation of the Central Dogma of molecular biology in the 1950s, our understanding of genetics came a long way. However, we still could not read the information contained in DNA. Genetics as we know it only began to thrive in the 1970s, with the development of Sanger’s chain-termination method, the first generation of sequencing. After the first automated sequencing machine was developed in 1987, genetics and genomics expanded rapidly and dynamically over the next 20 years.
The development of the sequencing by synthesis (SBS) method in the 2000s marked the start of a new era. This method, central to what is called second-generation or next-generation sequencing (NGS), allows millions of DNA fragments to be sequenced simultaneously in a massively parallel fashion. As a result, sequencing costs dropped by a factor of about 10⁶, and sequencing speed increased by up to 10⁶-fold. It is now feasible to pursue, reliably and cost-efficiently, research that requires whole-genome sequencing (WGS), resequencing, de novo genome sequencing, and transcriptome and epigenetic studies, in both eukaryotic and prokaryotic organisms.
But how does NGS work?
A typical Illumina NGS workflow has four steps.
- The first step is library preparation.
DNA samples, either extracted DNA or cDNA (in RNA studies), are fragmented into pieces of 200–500 bp. Adapters are then added to both ends of the DNA fragments through reduced-cycle amplification. Each adapter contains a sequencing primer binding site, an index, and an oligo at its end that binds to the flow cell. The adapters are specific to the 5′ and 3′ ends and differ in their primer binding sites and terminal oligos, while the index identifies the sample. Because each sample in a run carries its own unique index, every fragment can be unambiguously assigned to the correct sample even when many samples are sequenced together, a strategy known as multiplexing.
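To make the structure of a library molecule concrete, here is a minimal Python sketch that cuts a sequence into 200–500 bp pieces and flanks each piece with a 5′ and a 3′ adapter. It is only a toy model: the adapter, primer, and index sequences are hypothetical placeholders (not real Illumina sequences), the random cut points stand in for physical or enzymatic shearing, and, for simplicity, the single sample index is placed in the 3′ adapter. The names `fragment` and `add_adapters` are illustrative.

```python
import random

# Hypothetical adapter pieces: made-up placeholder sequences for illustration,
# NOT real Illumina flow-cell, primer, or index sequences.
FLOWCELL_OLIGO_1 = "AATGC"  # end of the 5' adapter, binds flow-cell oligo type 1
FLOWCELL_OLIGO_2 = "GGCAT"  # end of the 3' adapter, binds flow-cell oligo type 2
PRIMER_SITE_1 = "ACAC"      # sequencing primer binding site in the 5' adapter
PRIMER_SITE_2 = "TCTC"      # sequencing primer binding site in the 3' adapter


def fragment(genome: str, min_len: int = 200, max_len: int = 500, seed: int = 0) -> list[str]:
    """Cut a sequence into consecutive pieces of random length in [min_len, max_len].

    A trailing piece shorter than min_len is discarded, loosely mimicking size selection.
    """
    rng = random.Random(seed)
    pieces, start = [], 0
    while start < len(genome):
        length = rng.randint(min_len, max_len)  # inclusive on both ends
        piece = genome[start:start + length]
        if len(piece) >= min_len:
            pieces.append(piece)
        start += length
    return pieces


def add_adapters(insert: str, index: str) -> str:
    """Flank an insert with a 5' and a 3' adapter; the sample index sits in the 3' adapter."""
    five_prime_adapter = FLOWCELL_OLIGO_1 + PRIMER_SITE_1
    three_prime_adapter = PRIMER_SITE_2 + index + FLOWCELL_OLIGO_2
    return five_prime_adapter + insert + three_prime_adapter


# Usage: a random 5 kb "genome" for one sample, tagged with the index "ACGTAC".
genome = "".join(random.Random(1).choice("ACGT") for _ in range(5000))
library = [add_adapters(f, index="ACGTAC") for f in fragment(genome)]
print(len(library), "library molecules")
```

Pooling the output of `add_adapters` for several samples, each with a different `index` value, gives a toy multiplexed library.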
- The second step in the workflow is cluster generation, which clonally amplifies the libraries through bridge amplification.
In this process, the single-stranded library is loaded onto the flow cell. The oligo at one end of each library fragment hybridizes to its complementary oligo, one of two types immobilized on the flow cell surface. These anchored fragments are called seed DNA templates. A polymerase then synthesizes the complementary strand of each template, extending from the flow-cell oligo so that the new strand is attached to the flow cell and carries the second oligo type at its end. The double-stranded DNA is denatured, and the original template is washed away. The newly synthesized strand then bends over and hybridizes to a nearby oligo of the second type, forming a bridge. A complementary strand is synthesized, and the double-stranded bridge is denatured, resulting in two single-stranded copies tethered to the flow cell. This process is repeated many times for millions of templates simultaneously, generating millions of clonal clusters, each derived from a single seed template. Finally, the reverse strands are cleaved and washed away, and the 3′ ends are blocked to prevent unwanted priming.
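The growth of a single cluster can be sketched in a few lines of Python. This is an idealized model that assumes every tethered strand is copied in every amplification cycle; real amplification is not perfectly efficient. Strands are tracked only by orientation ("F" for forward, "R" for reverse), and the function names are illustrative.

```python
def bridge_amplify(cycles: int) -> list[str]:
    """Idealized bridge amplification of one cluster.

    Starts from the single strand left tethered after the seed template is washed
    away ("F" = forward). In every cycle each tethered strand bends over, hybridizes
    to a nearby flow-cell oligo, and templates one complementary copy ("R" for a
    forward template, "F" for a reverse one), so the cluster doubles.
    """
    cluster = ["F"]
    for _ in range(cycles):
        copies = ["R" if strand == "F" else "F" for strand in cluster]
        cluster = cluster + copies
    return cluster


def cleave_reverse_strands(cluster: list[str]) -> list[str]:
    """Remove the reverse strands, leaving only forward strands for sequencing."""
    return [strand for strand in cluster if strand == "F"]


cluster = bridge_amplify(cycles=10)
print(len(cluster), len(cleave_reverse_strands(cluster)))  # 1024 512
```

Under these assumptions, a cluster holds 2^n strands after n cycles, half of them forward, so cleaving the reverse strands leaves 2^(n−1) identical forward templates.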
- The third step is sequencing.
A DNA polymerase synthesizes the complementary strand of each template by adding one fluorescently labeled dNTP per cycle, and the base that was added is identified from the color of the detected fluorescence. Because each nucleotide carries a reversible terminator at its 3′ end, which stops synthesis until it is removed, only one base is added per cycle. The number of cycles therefore determines the read length. Since this happens for millions of fragments simultaneously, the approach is also known as massively parallel sequencing. Once the forward strands have been sequenced, the 3′ ends are unblocked and another round of bridge amplification regenerates the reverse strands. The forward strands are then cleaved and removed, and the reverse strands are sequenced in the same way. If sequencing stops after the forward strand, it is called single-end (SE) sequencing; if the reverse strand is also sequenced, it is called paired-end (PE) sequencing. The results are then stored in FASTQ files.
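The logic of this step can be mimicked with a short Python sketch: one base is read per cycle, so read length is capped by the number of cycles; paired-end mode adds a second read taken from the other end of the insert, on the opposite strand; and each read is written as a four-line FASTQ record. Fluorescence imaging is not modeled, the function names are illustrative, and the quality scores in the usage lines are made up. The qualities are encoded as Phred+33, the encoding used in current Illumina FASTQ output.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def read_strand(strand: str, cycles: int) -> str:
    """Read one base per cycle, as with reversible terminators.

    `strand` is the sequence the polymerase reports (the complement of the
    tethered template), written 5'->3'. Reading stops after `cycles` cycles
    or when the strand runs out, so read length is at most the cycle count.
    """
    bases = []
    for cycle in range(cycles):
        if cycle >= len(strand):
            break
        bases.append(strand[cycle])  # incorporate, image, then remove the terminator
    return "".join(bases)


def sequence_insert(insert: str, cycles: int, paired_end: bool = False) -> list[str]:
    """Single-end: read 1 only. Paired-end: also read 2, from the opposite end and strand."""
    reads = [read_strand(insert, cycles)]
    if paired_end:
        reads.append(read_strand(reverse_complement(insert), cycles))
    return reads


def to_fastq(read_id: str, seq: str, phred_scores: list[int]) -> str:
    """Format one four-line FASTQ record using Phred+33 quality encoding."""
    if len(seq) != len(phred_scores):
        raise ValueError("sequence and quality lengths differ")
    quality = "".join(chr(q + 33) for q in phred_scores)
    return f"@{read_id}\n{seq}\n+\n{quality}\n"


r1, r2 = sequence_insert("AACCGGTTAC", cycles=4, paired_end=True)
print(r1, r2)  # AACC GTAA
print(to_fastq("read_1", r1, [30, 30, 40, 2]), end="")
```

The last line prints `@read_1`, `AACC`, `+`, and `??I#` on four lines; Q30 and Q40 become `?` and `I`, and Q2 becomes `#`.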
- The final step is data analysis.
First, a quality control check removes reads that contain all or part of an adapter sequence, reads in which more than 10% of the bases are uncertain (N), and reads in which more than 50% of the bases have a Q score below 5. Then, reads are assigned to the correct library by their unique indices. For de novo assembly, reads with overlapping stretches of base calls are clustered, and forward and reverse reads (if available) are paired to build contiguous sequences (contigs); for resequencing, reads are aligned and mapped back to the reference genome. From here, different analyses can be applied to answer the research questions of the study.
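The first two analysis steps, quality filtering and index-based assignment (demultiplexing), can be sketched in Python as shown below. This sketch encodes the three filters described above. The `Read` class, the function names, and the `min_overlap` threshold for partial adapters are illustrative assumptions. The adapter string is assumed to be the commonly used Illumina adapter prefix and should be replaced with your kit's adapter. Demultiplexing here requires an exact index match, whereas production demultiplexing software typically tolerates a small number of index mismatches.

```python
from dataclasses import dataclass

# Assumption: the common Illumina adapter prefix; substitute your kit's adapter.
ADAPTER = "AGATCGGAAGAGC"


@dataclass
class Read:
    read_id: str
    seq: str
    qual: list[int]  # Phred quality score per base
    index: str       # index sequence read for this cluster


def has_adapter(seq: str, adapter: str = ADAPTER, min_overlap: int = 5) -> bool:
    """True if the read contains the whole adapter, or ends with at least
    `min_overlap` bases of its start (a partial adapter at the 3' end)."""
    if adapter in seq:
        return True
    return any(seq.endswith(adapter[:k]) for k in range(min_overlap, len(adapter)))


def passes_qc(read: Read) -> bool:
    """Apply the three filters: adapter content, >10% N, >50% of bases with Q < 5."""
    n = len(read.seq)
    if has_adapter(read.seq):
        return False
    if read.seq.count("N") > 0.10 * n:
        return False
    low_quality = sum(1 for q in read.qual if q < 5)
    return low_quality <= 0.50 * n


def demultiplex(reads: list[Read], sample_sheet: dict[str, str]) -> dict[str, list[Read]]:
    """Assign reads to samples by exact index match; unknown indices go to 'undetermined'."""
    bins: dict[str, list[Read]] = {sample: [] for sample in sample_sheet.values()}
    bins["undetermined"] = []
    for read in reads:
        bins[sample_sheet.get(read.index, "undetermined")].append(read)
    return bins


reads = [
    Read("r1", "ACGTACGTAC", [30] * 10, "AAAA"),           # passes
    Read("r2", "ACGTNNACGT", [30] * 10, "AAAA"),           # 20% N: removed
    Read("r3", "ACGTACGTAC", [2] * 6 + [30] * 4, "CCCC"),  # 60% of bases Q < 5: removed
    Read("r4", "ACGTAGATCGG", [30] * 11, "CCCC"),          # ends in partial adapter: removed
]
clean = [r for r in reads if passes_qc(r)]
samples = demultiplex(clean, {"AAAA": "sample_A", "CCCC": "sample_B"})
print({name: [r.read_id for r in rs] for name, rs in samples.items()})
# {'sample_A': ['r1'], 'sample_B': [], 'undetermined': []}
```

Here filtering runs before demultiplexing, following the order described in the text; many pipelines demultiplex first, and the two steps commute as long as the index is stored alongside each read.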
Novogene can help you with a range of NGS applications. Please contact your regional sales manager to learn more.