Expanding Horizons in Genomic Research with Long-Read Sequencing

Introduction

The development of long-read sequencing began in 2009, and thirteen years later, in 2022, the journal Nature Methods named it “Method of the Year”. In the rapidly evolving landscape of genomics, third-generation sequencing technologies have emerged as transformative tools with the potential to change how we decipher and understand genetic information.

Third-generation sequencing technologies were developed to sequence long DNA molecules at single-molecule resolution and to eliminate the need for amplification. Instead of fragmenting the DNA into small pieces, long-read sequencing methods read long stretches of DNA directly.

Two technologies currently dominate the long-read sequencing space:

– Single-Molecule Real-Time (SMRT) Sequencing (PacBio)

– Oxford Nanopore Technologies (ONT)

Novogene, a global genome sequencing service provider, offers cutting-edge services using the latest sequencing technologies from PacBio, Oxford Nanopore, and Illumina. Novogene’s expertise spans platforms such as PacBio Sequel II/IIe, the newest PacBio Revio, and the ultra-high throughput ONT PromethION platform. Whether you’re interested in long-read sequencing, metagenomics, or other genomic applications, Novogene is a valuable partner in genomics research.

In this blog, let’s take a closer look at the applications of long-read sequencing and the advantages of combining short- and long-read data in genomic research.

Different applications of long-read sequencing

Long-read sequencing has many genomics applications across the full spectrum of biological disciplines, as outlined below:

– Detecting structural variation: short reads perform well for identifying single nucleotide variants and small insertions and deletions, but they are poorly suited to detecting larger sequence changes, known as structural variants. These include large deletions, insertions, duplications, repeat expansions, inversions and translocations, many of which can only be resolved reliably with long-read sequencing.

– De novo assembly: De novo assembly of a genome is comparable with solving a jigsaw puzzle, where the challenge increases with smaller pieces. Long-read sequencing provides large overlapping reads, simplifying the puzzle and reducing ambiguity in de novo genome assembly.

– Pangenome sequencing: A pangenome represents the entire set of genes within a species and consists of a core genome (the sequences shared between all individuals of the species) and the “dispensable” genome (the sequences present in only some individuals). Assembling and studying pangenomes has shown that relying on a single reference genome for a species is a disadvantage. For example, many genes that are important for field-crop production in plant species are found in the dispensable genome.

– Haplotype determination or phasing: hundreds of loci in the genome have alleles that are differentially methylated according to their parent of origin, known as imprinted differentially methylated regions (iDMRs). Long-read sequencing, combined with the measurement of DNA methylation patterns, can be used to determine the parent of origin without the need for parental DNA. This can improve the diagnosis of many genetic diseases.

– Bacterial and plasmid genome sequencing: long-read sequencing technologies enable full-length sequencing of small genomes, such as bacterial and plasmid genomes. Compared with the short fragments produced by next-generation metagenomic sequencing, which typically yield only incomplete draft genomes, long-read sequencing supports more complete gene prediction and therefore more accurate functional annotation, thus assisting in the development of new resources.

– Full-length transcriptome: long-read methods enable end-to-end sequencing of full-length transcripts, addressing the limitations of short reads. Longer reads can improve isoform detection, detect alternative splicing events more accurately, and better characterize novel and fusion transcripts.
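The core/dispensable split described in the pangenome item above can be illustrated with simple set operations. The following is a minimal Python sketch, not a pangenome pipeline: the accession names, gene identifiers and the `split_pangenome` helper are all hypothetical, and a real analysis would first have to cluster genes from each assembly into shared gene families.

```python
def split_pangenome(gene_sets):
    """Split per-accession gene sets into pangenome, core and dispensable genes.

    gene_sets: dict mapping an accession name to the set of genes found in it.
    """
    all_sets = list(gene_sets.values())
    if not all_sets:
        return set(), set(), set()
    pangenome = set().union(*all_sets)       # every gene seen in any accession
    core = set.intersection(*all_sets)       # genes shared by all accessions
    dispensable = pangenome - core           # genes missing from at least one
    return pangenome, core, dispensable


# Hypothetical gene content of three accessions of one species.
example = {
    "accession_A": {"gene1", "gene2", "gene3", "gene4"},
    "accession_B": {"gene1", "gene2", "gene4", "gene5"},
    "accession_C": {"gene1", "gene2", "gene6"},
}

pangenome, core, dispensable = split_pangenome(example)
print("core:", sorted(core))                # ['gene1', 'gene2']
print("dispensable:", sorted(dispensable))  # ['gene3', 'gene4', 'gene5', 'gene6']
```

In this toy example, a reference built from accession_C alone would miss gene3, gene4 and gene5 entirely, which is the disadvantage of a single reference genome in miniature.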

The best of both worlds: mixing short- and long-read data

The debate between short-read and long-read sequencing is ongoing, since both technologies have their own unique benefits. To achieve the most comprehensive results, an increasing number of researchers are opting for a combined approach that builds on the complementarity of long- and short-read sequencing technologies and applications. This strategy gives the most complete picture of the dataset, benefiting from the speed and lower cost of short-read sequencing while harnessing the unique benefits of long-read sequencing.

For instance, next-generation sequencing is extensively used to study microbial communities in diverse habitats like soil, blood, gut and water, eliminating the need for prior cultivation of individual organisms. The two most common methods are 16S/18S/ITS amplicon metagenomic sequencing and shotgun metagenomic sequencing. The cost-effective 16S/18S/ITS amplicon metagenomic sequencing method, a long-standing standard for microbiota profiling, targets a highly variable region among species. This allows for the detection of many common species, but with limited taxonomic resolution. Shotgun metagenomic sequencing involves sequencing random DNA fragments. However, it does not readily allow for de novo assembly: even with deeper short-read shotgun metagenomic sequencing, de novo assembly remains challenging because of genomic regions shared between strains and repetitive regions.
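To see why sequencing deeper with short reads does not by itself resolve repetitive regions, consider whether a single read can bridge a repeat with unique sequence on both sides. The sketch below is a simplified illustration only: the `spans_repeat` helper, the repeat coordinates, the 20-base anchor and the 150 bp and 15 kb read lengths are assumptions chosen for the example, and real assemblers work with overlap or de Bruijn graphs rather than this single check.

```python
def spans_repeat(read_start, read_length, repeat_start, repeat_length, min_anchor=20):
    """Return True if a read covers the whole repeat plus `min_anchor`
    bases of unique flanking sequence on each side."""
    read_end = read_start + read_length          # exclusive end coordinate
    repeat_end = repeat_start + repeat_length    # exclusive end coordinate
    return (read_start <= repeat_start - min_anchor
            and read_end >= repeat_end + min_anchor)


# Hypothetical 2,000 bp repeat starting at position 10,000 of a genome.
REPEAT_START, REPEAT_LENGTH = 10_000, 2_000

short_read_ok = spans_repeat(9_900, 150, REPEAT_START, REPEAT_LENGTH)
long_read_ok = spans_repeat(5_000, 15_000, REPEAT_START, REPEAT_LENGTH)

print("150 bp read spans repeat:", short_read_ok)  # False
print("15 kb read spans repeat:", long_read_ok)    # True
```

Under these assumptions, no read shorter than the repeat plus both anchors (2,040 bases here) can bridge it, however many such reads are sequenced, whereas a sufficiently long read places the repeat unambiguously between its flanks.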

Long-read sequencing overcomes these limitations by sequencing the full-length rRNA gene, even including the internal transcribed spacer (ITS) region, to provide greater taxonomic resolution. Until recently, the high error rates of long-read sequencing limited its usability in metagenomics, but PacBio’s high-fidelity circular consensus sequencing (HiFi/CCS) has changed this. However, the lower sequencing depth limits the detection of low-abundance species. Hence, combining long- and short-read sequencing is an effective solution that draws on the strengths of both systems.
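The gain in taxonomic resolution from full-length marker sequencing can be shown with a toy comparison. In the sketch below, the two aligned sequences, the strain names, the amplicon window and the `differing_positions` helper are all made up for illustration; real classification relies on curated reference databases and alignment tools rather than position-by-position comparison.

```python
def differing_positions(seq_a, seq_b, start=0, end=None):
    """Positions in [start, end) where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    end = len(seq_a) if end is None else end
    return [i for i in range(start, end) if seq_a[i] != seq_b[i]]


# Toy, made-up aligned marker sequences for two hypothetical strains.
strain_x = "ACGTACGTACGGATTCAGCTAGCTAACG"
strain_y = "ACGTACGTACGGATTCAGCTAGGTAACG"

# A short amplicon that covers only positions 8-15 of the marker.
amplicon_diffs = differing_positions(strain_x, strain_y, start=8, end=16)
# A long read that covers the full-length marker.
full_length_diffs = differing_positions(strain_x, strain_y)

print("differences seen by the amplicon:", amplicon_diffs)          # []
print("differences seen by the full length:", full_length_diffs)    # [22]
```

Here the two strains look identical within the amplicon window and can only be told apart when the whole marker is read, which is the situation the full-length approach is meant to address.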

Summary

Long-read sequencing is exceptionally valuable for a wide range of applications, and the introduction of PacBio’s high-fidelity circular consensus sequencing (HiFi/CCS) has effectively mitigated the historically high error rates associated with long-read sequencing. However, cost per sample remains higher and sequencing depth comparatively lower than with next-generation sequencing. By combining short- and long-read sequencing approaches, researchers can therefore have the best of both worlds.

Novogene offers pre-made library services, short- or long-read sequencing and subsequent bioinformatic analysis to help researchers obtain genomic information with efficiency and accuracy. If you have personalized requirements, please leave a comment on the request for quote form, and we can discuss them in detail.

References:

1. Hayden EC. Genome sequencing: the third generation. Nature. 2009;1133.

2. Tang L. Method of the Year 2022: Long-read sequencing. Nat Methods. 2023.

3. Gehrig JL, Portik DM, Driscoll MD, Jackson E, Chakraborty S, Gratalo D, et al. Finding the right fit: evaluation of short-read and long-read sequencing approaches to maximize the utility of clinical microbiome data. Microb Genom. 2022;8(3).

4. PacBio. (2024, January 8). Sequencing technologies – PacBio. https://www.pacb.com/technology/
