De novo sequencing determines the genome of a species without relying on any reference genome. The reads are assembled using bioinformatic methods to produce a genome sequence map of the species, which supports follow-up research and provides a reference assembly for rarely studied species. For microbes, the genomic information obtained by de novo sequencing offers a starting point for exploring genetic structure and function, studying the evolutionary origin of microbial populations, and developing potential applications of these abundant organisms in medicine, disease research, agriculture, and the environment.
Novogene offers de novo sequencing services using both PacBio and Illumina platforms. We provide multifaceted sequencing services including genome survey, draft map, complete map, and fine map tailored to different research needs. For each project, our scientists will design the best sequencing strategy utilizing an optimal combination of short reads and long-range sequencing information to achieve the most comprehensive de novo assembly results for your genome of interest.
For individual research:
For population research:
* NC/QC: NanoDrop concentration/Qubit concentration
Coding gene annotation
The project workflow begins with sample quality control (Sample QC) to ensure that your samples meet the requirements for Microbial De novo Sequencing. An appropriate library is then prepared according to your target organism and application and tested for quality (Library QC). After sequencing, the resulting data are also checked for quality (Data QC). Finally, bioinformatic analyses are performed and publication-ready results are provided. The following flowchart outlines the step-by-step workflow of our Microbial De novo Sequencing service.
Continuous Genomic Surveillance Monitored the In Vivo Evolutionary Trajectories of Vibrio parahaemolyticus and Identified a New Virulent Genotype
mSystems. Date: 19 January 2021; IF: 6.663; DOI: https://doi.org/10.1128/mSystems.01254-20
Whole genome sequence of Diaporthe capsici, a new pathogen of walnut blight
Genomics. Date: 23 February 2021; IF: 6.205; DOI: https://doi.org/10.1016/j.ygeno.2020.04.018
Horizontal Plasmid Transfer Promotes the Dissemination of Asian Acute Hepatopancreatic Necrosis Disease and Provides a Novel Mechanism for Genetic Exchange and Environmental Adaptation
mSystems. Date: 17 March 2020; IF: 6.519; DOI: https://doi.org/10.1128/mSystems.00799-19
A novel bacterial thiosulfate oxidation pathway provides a new clue about the formation of zero-valent sulfur in deep sea
ISME. Date: 14 September 2020; IF: 9.493; DOI: https://doi.org/10.1038/s41396-020-0684-5
Genomic and transcriptomic analyses reveal differential regulation of diverse terpenoid and polyketides secondary metabolites in Hericium erinaceus
Scientific Reports. Date: 31 August 2017; IF: 5.228; DOI: https://doi.org/10.1038/s41598-017-10376-0
Polymerase reads are mostly used for quality control of the instrument run. Polymerase read metrics primarily reflect movie length and other run parameters rather than insert size distribution. Polymerase reads are trimmed to include only the high-quality region. Note: Sample quality is a major factor to be considered in polymerase read metrics.
Polymerase read length distribution
The horizontal axis shows polymerase read length, and the vertical axis shows the number of reads of each length.
Each polymerase read is partitioned to form one or more subreads, which contain a sequence from a single pass of a polymerase on a single strand of an insert within a SMRTbell template and no adapter sequences. The subreads contain the full set of quality values and kinetic measurements. Subreads are useful for applications such as de novo assembly, resequencing, and base modification analysis.
Subread length distribution
The horizontal axis shows subread length, and the vertical axis shows the number of subreads of each length.
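As an illustration of how a length distribution like this can be tabulated, here is a minimal Python sketch. It assumes the read lengths have already been extracted into a list; the function name `length_histogram`, the 1,000-base bin size, and the example values are hypothetical and not part of the Novogene pipeline.

```python
from collections import Counter

def length_histogram(read_lengths, bin_size=1000):
    """Count reads per length bin; each key is the lower bound of a bin."""
    counts = Counter((length // bin_size) * bin_size for length in read_lengths)
    return dict(sorted(counts.items()))

# Hypothetical subread lengths in bases (illustrative values only).
example_lengths = [850, 1200, 1900, 4300, 4800, 12050]
histogram = length_histogram(example_lengths)
```

Each key would be plotted on the horizontal axis and each count on the vertical axis.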
A total of 19 G of PacBio raw reads were generated and used for assembly. The statistics of the assembled genome sequence are summarized in the table below.
Assembled Genome Statistics
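For readers unfamiliar with assembly summaries, the sketch below shows how a few widely used statistics can be derived from contig lengths. The choice of metrics (contig count, total length, longest contig, N50) is an assumption about what such a table typically reports, and the function names and example lengths are hypothetical.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly

def assembly_stats(contig_lengths):
    """Summarize an assembly from its contig lengths (in bases)."""
    return {
        "contigs": len(contig_lengths),
        "total_length": sum(contig_lengths),
        "longest": max(contig_lengths, default=0),
        "n50": n50(contig_lengths),
    }

# Hypothetical contig lengths (illustrative values only).
stats = assembly_stats([100, 200, 300, 400, 500])
```

A larger N50 relative to the total length generally indicates a more contiguous assembly.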
GO stands for Gene Ontology. The Gene Ontology (GO) project aims to provide consistent descriptions of gene products across databases. GO vocabularies (ontologies) describe gene products in terms of their associated biological processes, molecular functions, and cellular components in a species-independent manner. GO annotation is only available for identified novel genes and isoforms.
Genes were assigned to one or more GO classes according to their functions. The GO annotation results can then be used to infer gene functions.
The horizontal axis shows the GO functional classes of the annotated genes, the right vertical axis shows the number of genes, and the left vertical axis shows the percentage of all coding genes that are annotated in each class.
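The two vertical axes of this chart can be reproduced with a simple tally. The following Python sketch assumes a mapping from gene IDs to GO class names is already available; the function name `go_class_summary`, the gene IDs, and the class assignments are hypothetical examples rather than output from the actual annotation pipeline.

```python
from collections import Counter

def go_class_summary(gene_to_classes, total_coding_genes):
    """Return {go_class: (gene_count, percent_of_all_coding_genes)}."""
    counts = Counter()
    for classes in gene_to_classes.values():
        counts.update(set(classes))  # count each gene once per class
    return {
        go_class: (n, 100 * n / total_coding_genes)
        for go_class, n in counts.items()
    }

# Hypothetical annotations; one of the four coding genes has no GO class.
annotations = {
    "gene1": ["metabolic process", "catalytic activity"],
    "gene2": ["metabolic process"],
    "gene3": ["membrane"],
}
summary = go_class_summary(annotations, total_coding_genes=4)
```

Because a gene can belong to several classes, the percentages across classes do not need to sum to 100.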
KEGG stands for Kyoto Encyclopedia of Genes and Genomes. It is a database resource for understanding high-level functions and utilities of biological systems, such as the cell, the organism, and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies. KEGG annotation makes it easy to find genes that are related to an annotated gene.
The horizontal axis is the KEGG pathway type, and the vertical axis shows the number of annotated genes.
COG stands for Clusters of Orthologous Groups of proteins. It is a protein database created and maintained by NCBI, based on the evolutionary relationships of proteins among bacteria, archaea, and eukaryotes. COG annotations are commonly used to assign proteins to families. Proteins classified into the same COG share homologous sequences, which can be used to infer the function of a protein.
The horizontal axis is the COG function type, and the vertical axis is the number of annotated genes.
NR stands for the Non-Redundant Protein Database, a protein database without duplicate entries that is created and maintained by NCBI. The database is comprehensive, and its annotation results include species information that can be used for species classification.
The horizontal axis is species ID, and the vertical axis is the number of annotated genes.
Sequenced reads were mapped to the assembly, and the sequencing depth and GC content of the assembly were then calculated to show how the two are distributed.
The x-axis shows GC content and the y-axis shows average depth. The panel on the right shows the distribution of sequencing depth, and the panel on top shows the distribution of GC content.
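To make the plotted points concrete, here is a minimal Python sketch that computes one (GC content, average depth) pair per window. It assumes a per-base depth list is already available from the read mapping. The non-overlapping window scheme, the function name `gc_depth_windows`, and the toy inputs are assumptions for illustration, not the parameters used in the report.

```python
def gc_depth_windows(sequence, depths, window=500):
    """Return (gc_percent, mean_depth) for each full non-overlapping window."""
    if len(sequence) != len(depths):
        raise ValueError("sequence and depths must have the same length")
    points = []
    for start in range(0, len(sequence) - window + 1, window):
        chunk = sequence[start:start + window].upper()
        gc_percent = sum(base in "GC" for base in chunk) / window * 100
        mean_depth = sum(depths[start:start + window]) / window
        points.append((gc_percent, mean_depth))
    return points

# Toy 12-base sequence with hypothetical per-base depths.
seq = "GGCC" + "ATAT" + "GCAT"
depth = [10, 10, 10, 10, 20, 20, 20, 20, 30, 30, 30, 30]
points = gc_depth_windows(seq, depth, window=4)
```

In a plot of this kind, points from a single uncontaminated genome tend to form one cluster, while separate clusters can hint at contamination or plasmids with a different GC content.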
In general, microbial genomes are rich in functional regions, which can account for more than 90% of the genome. In addition to the coding regions, non-coding regions are likely to affect the regulation of transcription, post-transcriptional modification, translation, and epigenetic modification. Some of the functional regions are related to the diversity of microbial evolution.
The composition of the sequenced genome was characterized using several methods, including gene prediction, repeat prediction, and non-coding RNA prediction. The x-axis shows gene length and the y-axis shows the number of genes.
*Please contact us to get the full demo report.