Gene Set Enrichment Analysis (GSEA) is an important tool in genetic research because it helps researchers identify the key biological pathways and processes associated with a particular phenotype or disease. GSEA is usually employed in genetic research in the following ways:
First, the statistical methods commonly used in enrichment analysis include the cumulative hypergeometric test and Fisher’s exact test. Because enrichment analysis usually performs a large number of tests simultaneously, the resulting p-values need to be adjusted with a multiple testing correction to limit false positives. Common choices are the Bonferroni correction, which counteracts the multiple comparisons problem, and the Benjamini-Hochberg procedure, which controls the false discovery rate. Applying enrichment analysis to gene annotation databases has produced many tools, such as the DAVID online analysis tool, the R package clusterProfiler, and Metascape. These tools play an important role in analyzing gene function and in interpreting the biological data generated by high-throughput sequencing technologies.
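The statistics described above can be illustrated with a minimal sketch in plain Python. It is an illustration under stated assumptions, not the implementation used by DAVID, clusterProfiler, or Metascape: the function names and the example p-values are hypothetical, and the upper-tail hypergeometric probability is used as the one-sided enrichment p-value.

```python
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k): chance of seeing at least k set members when n genes
    are drawn without replacement from N genes, K of which are in the set."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical example: 3 of 3 selected genes fall in a 5-gene set,
# out of a background of 10 genes.
p_single = hypergeom_upper_tail(k=3, K=5, n=3, N=10)

# Hypothetical raw p-values from four gene-set tests.
raw = [0.01, 0.04, 0.03, 0.005]
p_bonf = bonferroni(raw)
p_bh = benjamini_hochberg(raw)
```

Bonferroni is the more conservative of the two corrections; Benjamini-Hochberg is usually preferred when many gene sets are tested and some false discoveries are acceptable.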
The most common GSEA methods currently in use are based on enrichment analysis of Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG). First, various techniques are used to obtain a large set of genes of interest, such as differentially expressed genes, gene co-expression network modules, or protein complex gene clusters. Next, the GO nodes or KEGG pathways that are significantly enriched in these genes of interest are identified. The results guide further in-depth experimental studies. In summary, enrichment analysis is used to decipher the biological knowledge expressed in a set of genes and reveal their roles inside or outside the cell.
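The two-step workflow above (collect genes of interest, then look for annotation sets they are concentrated in) can be sketched as follows. Everything here is a toy assumption: the gene names, the pathway names, and the `overlap_table` helper are hypothetical, and the sketch only computes the overlap counts and fold enrichment that a significance test would then take as input.

```python
def overlap_table(genes_of_interest, gene_sets, background):
    """For each gene set, count its overlap with the genes of interest and
    compute fold enrichment (observed overlap / overlap expected by chance)."""
    background = set(background)
    interest = set(genes_of_interest) & background
    rows = []
    for name, members in gene_sets.items():
        members = set(members) & background
        overlap = interest & members
        expected = len(interest) * len(members) / len(background)
        fold = len(overlap) / expected if expected > 0 else 0.0
        rows.append({
            "set": name,
            "k": len(overlap),     # genes of interest in the set
            "K": len(members),     # set size within the background
            "n": len(interest),    # number of genes of interest
            "N": len(background),  # background size
            "fold_enrichment": fold,
        })
    # Most enriched sets first.
    rows.sort(key=lambda r: r["fold_enrichment"], reverse=True)
    return rows

# Toy data: 20 background genes, 5 genes of interest, 3 made-up pathways.
background = [f"g{i}" for i in range(1, 21)]
interest = ["g1", "g2", "g3", "g4", "g5"]
toy_sets = {
    "pathway_A": ["g1", "g2", "g3", "g4", "g10"],
    "pathway_B": ["g5", "g11", "g12", "g13", "g14"],
    "pathway_C": ["g15", "g16", "g17", "g18"],
}
rows = overlap_table(interest, toy_sets, background)
```

The counts `k`, `K`, `n`, and `N` in each row are the quantities that a hypergeometric or Fisher’s exact test operates on, so the choice of background gene list directly affects the result.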
Gene Ontology (GO)
The Gene Ontology (GO) database is a structured, standardized biological model built by the Gene Ontology Consortium in 2000. The model describes our knowledge of the biological domain in three aspects: cellular component, molecular function, and biological process. It is one of the most widely used gene annotation systems. Each node in the annotation system is a term that describes a property of a gene or protein. A strict “parent-child” relationship is maintained between the nodes, so a more specific term always sits below the more general terms it belongs to. A gene or protein can therefore be annotated in each of the three aspects.
Fig 1: GO flowchart
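The parent-child structure of GO can be sketched as a small directed graph. The term names below are hypothetical placeholders rather than real GO identifiers, and the sketch collapses GO's different relation types into a single "parent" link. It shows one consequence that matters for enrichment analysis: a gene annotated to a specific term is implicitly annotated to every ancestor of that term.

```python
def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def propagate(annotations, parents):
    """Extend each gene's direct annotations with all ancestor terms."""
    full = {}
    for gene, terms in annotations.items():
        expanded = set(terms)
        for term in terms:
            expanded |= ancestors(term, parents)
        full[gene] = expanded
    return full

# Toy ontology (hypothetical terms): child -> list of parents.
toy_parents = {
    "root": [],
    "term_a": ["root"],
    "term_b": ["root"],
    "term_c": ["term_a", "term_b"],  # a term may have more than one parent
    "term_d": ["term_c"],
}
# Direct annotations as they might appear in an annotation file.
toy_annotations = {"gene1": ["term_d"], "gene2": ["term_b"]}

full_annotations = propagate(toy_annotations, toy_parents)
```

Because annotations propagate upward, general terms near the root collect many genes while specific terms collect few, which is one reason enrichment results often list several related terms together.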
Kyoto Encyclopedia of Genes and Genomes (KEGG)
KEGG is a database for the systematic analysis of gene function and genomic information, integrating genomic, biochemical, and phylogenetic information. It is used to understand the high-level functions and utilities of biological systems. The database helps researchers study genes and their expression information as a whole. At present, KEGG contains 19 sub-databases. Enrichment analysis most commonly uses KEGG Pathway, a collection of manually drawn pathway maps that represent knowledge of molecular interaction, reaction, and relation networks. These pathways cover a wide range of biochemical processes.
Fig 2: KEGG flowchart
In conclusion, GO and KEGG enrichment are the types of GSEA most frequently used for functional analysis. They are typically the first choice because of their long-standing curation and their availability for a wide range of species. Both can be processed through Novomagic’s online tools with just a click.