Using long-read sequencing technology to explore the complex structure of transcripts

Long-read sequencing (also known as third-generation sequencing) can directly capture full-length mRNA sequences and their structural information without assembly, and is the foundation of full-length transcriptome analysis. It can offer valuable insights into a number of disease-related questions. Because its reads are significantly longer than those of short-read second-generation sequencing, long-read sequencing has clear advantages in analysing complex transcript structures. These advantages include the direct acquisition of full-length transcript sequences, the discovery of new genes and new transcripts, and the identification of fusion genes. Building on these advantages, full-length transcriptome analyses can be carried out across numerous samples to identify complex transcript information and alternative splicing variants.

In this blog, we will delve into the potential of long-read sequencing and its role in unravelling the complex structures of transcripts.

Table 1. Main applications of long-read sequencing in unravelling transcript structures

Application	Description
Identification of genetic mutations	Identify disease-causing genetic mutations by detecting variants in gene sequences; identify changes in gene expression.
Alternative splicing	Identify alternative splicing events, which may contribute to the emergence of particular diseases.
Gene fusion events	Identify gene fusion events, which happen when two distinct genes merge.
Transcript isoform identification	Identify and characterize multiple isoforms of transcripts, which can provide insights into the function of genes and the regulation of gene expression.
Non-coding RNA identification	Identify and characterize non-coding RNAs, which are involved in gene regulation.

Using a long-read sequencing strategy, Mao Zhang et al. found CaMKII-δ9 to be a major CaMKII-δ isoform that is more prevalent than CaMKII-δ2 in the human heart. CaMKII-δ9 carries a characteristic exon 13-16-17 combination; it interacts with UBE2T and promotes its phosphorylation and degradation, thereby disrupting the Fanconi anaemia (FA) DNA repair signalling pathway. This leads to cardiac DNA instability, cell death, and heart failure. These findings identify the CaMKII-UBE2T-DNA damage signalling pathway as a cause of CaMKII-induced cardiomyocyte death and cardiac pathology, and reveal a previously unknown function of CaMKII-δ9 in genome stability and cell fate regulation. This mechanism points to CaMKII-δ9 as a potentially significant therapeutic target for cardiac damage and heart failure in mammals, particularly humans.^[1]
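To illustrate why full-length reads help with an isoform defined by an exon 13-16-17 combination, here is a minimal sketch in Python. It assumes each read has already been reduced to an ordered list of exon numbers. The read names, the exon chains, and the helper `has_adjacent_exons` are hypothetical and are not taken from the study.

```python
from typing import Sequence


def has_adjacent_exons(exon_chain: Sequence[int], combo: Sequence[int]) -> bool:
    """Return True if `combo` occurs as consecutive exons within `exon_chain`."""
    n = len(combo)
    return any(
        tuple(exon_chain[i:i + n]) == tuple(combo)
        for i in range(len(exon_chain) - n + 1)
    )


# Hypothetical exon chains for three full-length reads (illustrative only).
reads = {
    "read_a": [11, 12, 13, 16, 17, 18],  # exons 13, 16, 17 are adjacent
    "read_b": [11, 12, 13, 14, 17, 18],  # exon 14 sits between 13 and 17
    "read_c": [11, 12, 13, 15, 16, 17],  # exon 15 sits between 13 and 16
}

combo = (13, 16, 17)
matching = sorted(
    name for name, chain in reads.items() if has_adjacent_exons(chain, combo)
)
print(matching)
```

A single long read can span both junctions of the combination, so the check is made on one molecule. With short reads, the two junctions are usually observed on separate fragments and have to be linked by inference.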

Mueller et al. present the first reference genome assembly of a non-model pochard species. By combining the strengths of RNA-Seq and Iso-Seq technologies, their annotation method produces a merged transcriptome with functional annotation and expression profiles, offering insights into gene expression. Owing to alternative splicing, a single gene can have multiple variants (isoforms) and can consequently be translated into proteins with different functions. In full-length transcript isoform sequencing (Pacific Biosciences [PacBio] Iso-Seq) of messenger RNA, 80.57% (3.84%) of reads were retained as full-length non-chimeric (FLNC) reads after error correction, and Minimap2 mapped 97.39% (1.34%) of the long reads to the reference genome. Comparing the reference transcriptomes built from long and short reads informs the choice between these sequencing technologies. Although short-read data were sufficient for annotating protein-coding genes in this study, long-read data retrieved more transcripts per gene and possibly more protein-coding genes, although the latter could not be annotated. The researchers expect that, as base-calling accuracy in long-read sequencing improves, high-coverage long-read sequencing will be used to reconstruct the transcriptome, and short-read transcriptome sequencing may eventually become dispensable. In conclusion, the genome and transcriptome annotations of the tufted duck (Aythya fuligula) provide a basis for further research, such as studies of disease response, and the high quality of this dataset for a non-model species enables highly accurate resolution. When researching zoonotic pathogen reservoirs, it is crucial to consider genetic variations and similarities among closely related species.^[2]
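The "transcripts per gene" comparison mentioned above can be sketched in a few lines of Python. This is a minimal illustration, assuming each annotation has already been reduced to (gene, transcript) pairs. The gene and transcript identifiers are invented toy data, not values from the study.

```python
from collections import defaultdict
from statistics import mean


def transcripts_per_gene(pairs):
    """Count distinct transcript IDs per gene from (gene_id, transcript_id) pairs."""
    genes = defaultdict(set)
    for gene_id, transcript_id in pairs:
        genes[gene_id].add(transcript_id)
    return {gene_id: len(ids) for gene_id, ids in genes.items()}


# Toy annotations (hypothetical identifiers, for illustration only).
short_read_annotation = [("geneA", "A.1"), ("geneB", "B.1")]
long_read_annotation = [
    ("geneA", "A.1"), ("geneA", "A.2"), ("geneA", "A.3"),
    ("geneB", "B.1"), ("geneB", "B.2"),
]

short_counts = transcripts_per_gene(short_read_annotation)
long_counts = transcripts_per_gene(long_read_annotation)

# Mean number of isoforms per gene in each annotation.
short_mean = mean(list(short_counts.values()))
long_mean = mean(list(long_counts.values()))
print(short_mean, long_mean)
```

In practice the pairs would be extracted from the annotation files produced by each pipeline. The summary statistic itself stays this simple.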

Tumors display widespread transcriptome changes, but the full picture of transcript-level splicing in cancer remains unclear. Long-read sequencing platforms, such as the PacBio system, can be used to locate and annotate full-length transcripts and to infer tumor-specific splicing events. Applying a long-read sequencing strategy to breast cancer samples, Veiga et al. identified thousands of previously unannotated transcripts; approximately 30% of these novel transcripts affected protein-coding exons and were predicted to alter protein localization and function. To support the transcription and translation of the novel transcripts, the authors extensively cross-validated them against omics datasets. They identified 3059 breast tumor-specific splicing events, 35 of which were significantly associated with patient survival. Of these, 21 were absent from GENCODE and 10 were enriched in specific breast cancer subtypes. Taken together, the findings demonstrate the complexity, cancer specificity, and clinical relevance of previously unidentified breast cancer transcripts and splicing events, which can only be annotated by long-read RNA studies, and provide a wealth of potential immuno-oncology therapeutic targets.^[3]
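One common way to decide whether a full-length transcript is "previously unannotated" is to compare its chain of introns with the chains in a reference annotation. The sketch below shows that idea in Python. It is not the pipeline used in the study, and the coordinates, transcript names, and helper functions are hypothetical.

```python
def intron_chain(exons):
    """Return the introns (exon end, next exon start) implied by a list of exons."""
    exons = sorted(exons)
    return tuple(
        (exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)
    )


def classify(transcripts, reference):
    """Label each transcript 'annotated' if its intron chain is in the reference."""
    known = {intron_chain(exons) for exons in reference.values()}
    return {
        tid: "annotated" if intron_chain(exons) in known else "novel"
        for tid, exons in transcripts.items()
    }


# Toy reference with one three-exon transcript (hypothetical coordinates).
reference = {"ref1": [(100, 200), (300, 400), (500, 600)]}

# Toy long-read transcripts.
transcripts = {
    "t1": [(90, 200), (300, 400), (500, 620)],  # same introns, different ends
    "t2": [(100, 200), (500, 600)],             # skips the middle exon
}

labels = classify(transcripts, reference)
print(labels)
```

Comparing intron chains rather than exact exon coordinates tolerates variation at transcript ends, which is typical of real reads. Single-exon transcripts have an empty chain and would need separate handling in a real analysis.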

The studies above give us a better understanding of multi-sample transcriptome research. Multi-sample third-generation sequencing yields clearer transcripts for complex isoforms, complex transcripts, and disease- and cancer-specific transcripts. As a trusted global sequencing provider, Novogene is a pioneer in applying cutting-edge technology to deliver the latest genomics services and solutions. To date, Novogene has built world-leading long-read sequencing laboratory capabilities, including the PacBio Sequel II, Sequel IIe, and latest Revio systems, as well as the Oxford Nanopore PromethION platform, to respond to sequencing needs around the world.

References

[1] Zhang, Mao, et al. "CaMKII-δ9 promotes cardiomyopathy through disrupting UBE2T-dependent DNA repair." Nature Cell Biology 21.9 (2019): 1152-1163.
[2] Mueller, Ralf C., et al. "A high-quality genome and comparison of short- versus long-read transcriptome of the palaearctic duck Aythya fuligula (tufted duck)." GigaScience 10.12 (2021): giab081.
[3] Veiga, Diogo FT, et al. "A comprehensive long-read isoform analysis platform and sequencing resource for breast cancer." Science Advances 8.3 (2022): eabg6711.
